# Streaming

To enable streaming, set the `stream` parameter to `true` in your request. The model will then stream the response to the client in chunks, rather than returning the entire response at once.
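The request body is otherwise identical to a non-streaming call; the only change is the flag. For example:

```json
{
  "model": "google/gemini-2.5-flash",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": true
}
```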

### Examples <a href="#examples" id="examples"></a>

Here is an example of how to stream a response and process it:

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

API_KEY = "YOUR-API-KEY"  # replace with your key

question = "How would you build the tallest building ever?"

url = "https://llm.onerouter.pro/v1/chat/completions"
headers = {
  "Authorization": f"Bearer {API_KEY}",
  "Content-Type": "application/json"
}

payload = {
  "model": "google/gemini-2.5-flash",
  "messages": [{"role": "user", "content": question}],
  "stream": True
}

buffer = ""
done = False
with requests.post(url, headers=headers, json=payload, stream=True) as r:
  for chunk in r.iter_content(chunk_size=1024, decode_unicode=True):
    buffer += chunk
    # Process every complete SSE line currently in the buffer
    while True:
      line_end = buffer.find('\n')
      if line_end == -1:
        break  # no complete line yet; wait for the next chunk

      line = buffer[:line_end].strip()
      buffer = buffer[line_end + 1:]

      if line.startswith('data: '):
        data = line[6:]
        if data == '[DONE]':
          done = True  # exit both loops once the stream ends
          break

        try:
          data_obj = json.loads(data)
          content = data_obj["choices"][0]["delta"].get("content")
          if content:
            print(content, end="", flush=True)
        except json.JSONDecodeError:
          # Non-JSON payloads (e.g. keep-alive comments) are safe to skip
          pass
    if done:
      break
```

{% endtab %}

{% tab title="Typescript" %}
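A minimal TypeScript sketch of the same request, assuming a runtime with `fetch` and Node-style `process.stdout` (e.g. Node 18+); replace `YOUR-API-KEY` with your key. The parsing mirrors the Python tab: buffer the decoded bytes, split on newlines, read `data:` lines, and stop at `[DONE]`.

```typescript
// Extract the text delta from one SSE line, or null if there is none.
// Non-JSON payloads (e.g. keep-alive comments) are skipped.
function contentFromLine(line: string): string | null {
  if (!line.startsWith("data: ")) return null;
  const data = line.slice("data: ".length);
  if (data === "[DONE]") return null;
  try {
    const parsed = JSON.parse(data);
    return parsed.choices?.[0]?.delta?.content ?? null;
  } catch {
    return null;
  }
}

async function streamChat(question: string): Promise<void> {
  const response = await fetch("https://llm.onerouter.pro/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR-API-KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "google/gemini-2.5-flash",
      messages: [{ role: "user", content: question }],
      stream: true,
    }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done || !value) break;
    buffer += decoder.decode(value, { stream: true });

    // Process every complete SSE line currently in the buffer
    let newlineIndex: number;
    while ((newlineIndex = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newlineIndex).trim();
      buffer = buffer.slice(newlineIndex + 1);
      const content = contentFromLine(line);
      if (content) process.stdout.write(content);
    }
  }
}

// streamChat("How would you build the tallest building ever?");
```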

{% endtab %}

{% tab title="cURL" %}

```bash
curl -sN \
  -H "Authorization: Bearer YOUR-API-KEY" \
  -H "Content-Type: application/json" \
  -X POST "https://llm.onerouter.pro/v1/chat/completions" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "How would you build the tallest building ever?"}],
    "stream": true
  }' | while IFS= read -r line; do
      if [[ "$line" == data:* ]]; then
          data="${line#data: }"

          if [[ "$data" == "[DONE]" ]]; then
              break
          fi

          content=$(echo "$data" | jq -r '.choices[0].delta.content // empty' 2>/dev/null)
          if [[ -n "$content" ]]; then
              printf "%s" "$content"
          fi
      fi
  done
```

{% endtab %}
{% endtabs %}

### Additional Information <a href="#additional-information" id="additional-information"></a>

For SSE (Server-Sent Events) streams, Infron occasionally sends comments to prevent connection timeouts. These comments look like:

```
INFRONAI PROCESSING
```

Comment payloads can be safely ignored per the [SSE specs](https://html.spec.whatwg.org/multipage/server-sent-events.html#event-stream-interpretation). However, you can leverage them to improve UX as needed, e.g. by showing a dynamic loading indicator.
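One way to leverage this (a sketch, assuming comments arrive as lines beginning with `:` as the SSE spec defines them; the `SSELine` type and `classifyLine` helper are illustrative, not part of any SDK):

```typescript
// Classify an incoming SSE line so the UI can react to keep-alive
// comments (e.g. keep a spinner visible) instead of discarding them.
type SSELine =
  | { kind: "comment"; text: string }
  | { kind: "data"; payload: string }
  | { kind: "other" };

function classifyLine(line: string): SSELine {
  if (line.startsWith(":")) {
    // Per the SSE spec, a comment is a line starting with a colon
    return { kind: "comment", text: line.slice(1).trim() };
  }
  if (line.startsWith("data: ")) {
    return { kind: "data", payload: line.slice("data: ".length) };
  }
  return { kind: "other" };
}
```

A UI could show a loading indicator while only `comment` lines are arriving and hide it on the first `data` payload.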

Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you `JSON.parse` the non-JSON payloads. We recommend the following clients:

* [eventsource-parser](https://github.com/rexxars/eventsource-parser)
* [OpenAI SDK](https://www.npmjs.com/package/openai)
* [Vercel AI SDK](https://www.npmjs.com/package/ai)
