# Streaming

To enable streaming, set the `stream` parameter to `true` in your request. The model then sends the response to the client in chunks as it is generated, rather than returning the entire response at once.

### Examples <a href="#examples" id="examples"></a>

The following examples stream a response and process each chunk as it arrives:

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

question = "How would you build the tallest building ever?"

API_KEY = "YOUR-API-KEY"  # replace with your API key

url = "https://llm.onerouter.pro/v1/chat/completions"
headers = {
  "Authorization": f"Bearer {API_KEY}",
  "Content-Type": "application/json"
}

payload = {
  "model": "google/gemini-2.5-flash",
  "messages": [{"role": "user", "content": question}],
  "stream": True
}

buffer = ""
done = False
with requests.post(url, headers=headers, json=payload, stream=True) as r:
  r.raise_for_status()
  r.encoding = "utf-8"  # SSE is UTF-8; ensures iter_content yields correctly decoded str
  for chunk in r.iter_content(chunk_size=1024, decode_unicode=True):
    buffer += chunk
    # Process every complete line currently in the buffer
    while "\n" in buffer:
      line, buffer = buffer.split("\n", 1)
      line = line.strip()

      # Skip blank lines, SSE comments, and any non-data fields
      if not line.startswith("data: "):
        continue

      data = line[6:]
      if data == "[DONE]":
        done = True
        break

      try:
        data_obj = json.loads(data)
        content = data_obj["choices"][0]["delta"].get("content")
        if content:
          print(content, end="", flush=True)
      except (json.JSONDecodeError, KeyError, IndexError):
        pass  # ignore payloads that are not a well-formed chunk

    if done:
      break  # stop reading once the stream signals completion
```

{% endtab %}

{% tab title="Typescript" %}

{% endtab %}

{% tab title="cURL" %}

```bash
curl -sN \
  -H "Authorization: Bearer YOUR-API-KEY" \
  -H "Content-Type: application/json" \
  -X POST "https://llm.onerouter.pro/v1/chat/completions" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "How would you build the tallest building ever?"}],
    "stream": true
  }' | while IFS= read -r line; do
      if [[ "$line" == data:* ]]; then
          data="${line#data: }"

          if [[ "$data" == "[DONE]" ]]; then
              break
          fi

          content=$(echo "$data" | jq -r '.choices[0].delta.content // empty' 2>/dev/null)
          if [[ -n "$content" ]]; then
              printf "%s" "$content"
          fi
      fi
  done
```

{% endtab %}
{% endtabs %}
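
The examples above print each delta as it arrives. When the complete reply is also needed, the deltas can be concatenated. The following is a minimal sketch, assuming the chunk shape used in the examples (`choices[0].delta.content`) and the `[DONE]` terminator; `collect_stream_text` and the sample lines are hypothetical, introduced only for illustration.

```python
import json

def collect_stream_text(sse_lines):
  """Concatenate delta.content values from an iterable of SSE lines (hypothetical helper)."""
  parts = []
  for raw in sse_lines:
    line = raw.strip()
    if not line.startswith("data:"):
      continue  # blank lines, comments, and other fields carry no content
    data = line[len("data:"):].strip()
    if data == "[DONE]":
      break  # end-of-stream marker
    try:
      chunk = json.loads(data)
    except json.JSONDecodeError:
      continue  # skip payloads that are not valid JSON
    if not isinstance(chunk, dict):
      continue
    choices = chunk.get("choices") or []
    if not choices:
      continue
    content = choices[0].get("delta", {}).get("content")
    if content:
      parts.append(content)
  return "".join(parts)

# Made-up sample lines in the shape the examples above expect
sample = [
  'data: {"choices": [{"delta": {"role": "assistant"}}]}',
  'data: {"choices": [{"delta": {"content": "Hello"}}]}',
  '',
  'data: {"choices": [{"delta": {"content": ", world"}}]}',
  'data: [DONE]',
  'data: {"choices": [{"delta": {"content": "ignored"}}]}',
]
print(collect_stream_text(sample))  # prints: Hello, world
```

In a real client, the lines would come from the HTTP response (for example `r.iter_lines(decode_unicode=True)` in `requests`) instead of a list.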

### Additional Information <a href="#additional-information" id="additional-information"></a>

For SSE (Server-Sent Events) streams, Infron occasionally sends comments to prevent connection timeouts. These comments look like:

```
: INFRONAI PROCESSING
```

Comment payloads can be safely ignored per the [SSE specification](https://html.spec.whatwg.org/multipage/server-sent-events.html#event-stream-interpretation). However, you can use them to improve UX as needed, e.g. by showing a dynamic loading indicator while no content has arrived yet.
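
For clients that parse the stream by hand, one way to use these keep-alives is to classify each line before touching its payload. The sketch below assumes the keep-alive arrives as a spec-compliant comment line, i.e. one that starts with a colon, and that the stream ends with `data: [DONE]` as in the examples above. `classify_sse_line` is a hypothetical helper, not part of any SDK.

```python
def classify_sse_line(line):
  """Return (kind, value); kind is 'comment', 'data', 'done', or 'other'."""
  line = line.rstrip("\r\n")
  if line.startswith(":"):
    # Per the SSE specification, a line starting with a colon is a comment
    return ("comment", line[1:].strip())
  if line.startswith("data:"):
    value = line[len("data:"):].strip()
    if value == "[DONE]":
      return ("done", None)
    return ("data", value)
  # Blank separators and fields this sketch does not handle
  return ("other", line)

# Made-up sample stream; the comment text is assumed to carry a leading colon
sample = [": INFRONAI PROCESSING", "", 'data: {"choices": []}', "data: [DONE]"]
for raw in sample:
  kind, value = classify_sse_line(raw)
  if kind == "comment":
    print("still working...")  # e.g. refresh a loading indicator
  elif kind == "data":
    print("payload:", value)   # hand the JSON string to the chunk parser
  elif kind == "done":
    print("finished")
```

Only lines classified as `data` are passed on for JSON parsing, so a keep-alive never reaches the JSON decoder.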

Some SSE client implementations might not parse the payload according to the spec, which leads to an uncaught error when you `JSON.parse` the non-JSON payloads. We recommend the following clients:

* [eventsource-parser](https://github.com/rexxars/eventsource-parser)
* [OpenAI SDK](https://www.npmjs.com/package/openai)
* [Vercel AI SDK](https://www.npmjs.com/package/ai)

