# Streaming

The Infron AI API allows streaming responses from *any model*. This is useful for building chat interfaces or other applications where the UI should update as the model generates the response.

To enable streaming, you can set the `stream` parameter to `true` in your request. The model will then stream the response to the client in chunks, rather than returning the entire response at once.
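
For example, a streaming request body differs from a non-streaming one only by that flag (the `{{MODEL}}` placeholder stands in for a real model ID):

```json
{
  "model": "{{MODEL}}",
  "messages": [{ "role": "user", "content": "Hello" }],
  "stream": true
}
```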

## Examples

Here is an example of how to stream a response and process it as it arrives:

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

question = "How would you build the tallest building ever?"

url = "https://llm.onerouter.pro/v1/chat/completions"
headers = {
  "Authorization": f"Bearer {{API_KEY}}",
  "Content-Type": "application/json"
}

payload = {
  "model": "{{MODEL}}",
  "messages": [{"role": "user", "content": question}],
  "stream": True
}

buffer = ""
with requests.post(url, headers=headers, json=payload, stream=True) as r:
  r.encoding = "utf-8"  # SSE payloads are UTF-8; requests may otherwise fall back to latin-1
  for chunk in r.iter_content(chunk_size=1024, decode_unicode=True):
    buffer += chunk

    # Process every complete SSE line currently in the buffer
    while True:
      line_end = buffer.find('\n')
      if line_end == -1:
        break  # No complete line yet; wait for the next chunk

      line = buffer[:line_end].strip()
      buffer = buffer[line_end + 1:]

      if line.startswith('data: '):
        data = line[6:]
        if data == '[DONE]':
          break

        try:
          data_obj = json.loads(data)
          content = data_obj["choices"][0]["delta"].get("content")
          if content:
            print(content, end="", flush=True)
        except json.JSONDecodeError:
          # Ignore non-JSON payloads such as keep-alive comments
          pass
```

{% endtab %}

{% tab title="Typescript" %}

```typescript
const question = 'How would you build the tallest building ever?';
const response = await fetch('https://llm.onerouter.pro/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    messages: [{ role: 'user', content: question }],
    stream: true,
  }),
});

const reader = response.body?.getReader();
if (!reader) {
  throw new Error('Response body is not readable');
}

const decoder = new TextDecoder();
let buffer = '';

try {
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Append new chunk to buffer
    buffer += decoder.decode(value, { stream: true });

    // Process complete lines from buffer
    while (true) {
      const lineEnd = buffer.indexOf('\n');
      if (lineEnd === -1) break;

      const line = buffer.slice(0, lineEnd).trim();
      buffer = buffer.slice(lineEnd + 1);

      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') break;

        try {
          const parsed = JSON.parse(data);
          const content = parsed.choices[0].delta.content;
          if (content) {
            console.log(content);
          }
        } catch (e) {
          // Ignore invalid JSON
        }
      }
    }
  }
} finally {
  reader.cancel();
}
```

{% endtab %}
{% endtabs %}
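
Under the hood, the stream is a sequence of SSE `data:` lines: each carries a JSON chunk whose `choices[0].delta` holds the newly generated tokens, and a final `[DONE]` sentinel marks the end. Abridged (with other fields elided for brevity), the wire format looks like:

```
data: {"choices":[{"delta":{"content":"Hello"}}]}

data: {"choices":[{"delta":{"content":" world"}}]}

data: [DONE]
```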

## Additional Information

For SSE (Server-Sent Events) streams, Infron AI occasionally sends comments to prevent connection timeouts. These comments look like:

```
: Infron AI PROCESSING
```

Comment payloads can be safely ignored per the [SSE spec](https://html.spec.whatwg.org/multipage/server-sent-events.html#event-stream-interpretation). However, you can leverage them to improve UX as needed, e.g. by showing a dynamic loading indicator.
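
In the line-based parsers above, these comments are the lines that begin with a colon. A minimal sketch of routing them to a UI hook (the `on_keepalive` callback is a hypothetical name, not part of the API):

```python
def handle_sse_line(line: str, on_keepalive=None):
    """Route one SSE line; returns a data payload or None."""
    if line.startswith(':'):
        # Keep-alive comment such as ": Infron AI PROCESSING"
        if on_keepalive:
            on_keepalive()  # e.g. advance a loading spinner
        return None
    if line.startswith('data: '):
        return line[6:]  # JSON payload, or the "[DONE]" sentinel
    return None
```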

Some SSE client implementations might not parse the payload according to spec, which leads to an uncaught error when you `JSON.parse` the non-JSON payloads. We recommend the following clients; a usage sketch follows the list:

* [eventsource-parser](https://github.com/rexxars/eventsource-parser)
* [OpenAI SDK](https://www.npmjs.com/package/openai)
* [Vercel AI SDK](https://www.npmjs.com/package/ai)
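
For example, because the endpoint follows the OpenAI chat-completions format, the OpenAI SDK handles SSE parsing, keep-alive comments, and the `[DONE]` sentinel for you. A sketch with its Python package (the list above links the Node counterpart; both behave equivalently), reusing the `API_KEY` and `{{MODEL}}` placeholders from the examples above:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key=API_KEY,  # your Infron AI key
)

stream = client.chat.completions.create(
    model="{{MODEL}}",
    messages=[{"role": "user", "content": "How would you build the tallest building ever?"}],
    stream=True,
)

for chunk in stream:
    # Each chunk mirrors the "delta" objects parsed by hand above
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```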

## Stream Cancellation

Streaming requests can be cancelled by aborting the connection. For supported providers, this immediately stops model processing and billing.

<details>

<summary>Provider Support</summary>

**Supported**

* OpenAI, Azure, Anthropic
* Fireworks, Mancer, Recursal
* AnyScale, Lepton, OctoAI
* Novita, DeepInfra, Together
* Cohere, Hyperbolic, Infermatic
* Avian, XAI, Cloudflare
* SFCompute, Nineteen, Liquid
* Friendli, Chutes, DeepSeek

**Not Currently Supported**

* AWS Bedrock, Groq, Modal
* Google, Google AI Studio, Minimax
* HuggingFace, Replicate, Perplexity
* Mistral, AI21, Featherless
* Lynn, Lambda, Reflection
* SambaNova, Inflection, ZeroOneAI
* AionLabs, Alibaba, Nebius
* Kluster, Targon, InferenceNet

</details>

To implement stream cancellation:

{% tabs %}
{% tab title="Python" %}

```python
import requests
from threading import Event, Thread

def stream_with_cancellation(prompt: str, cancel_event: Event):
    with requests.Session() as session:
        response = session.post(
            "https://llm.onerouter.pro/v1/chat/completions",
            headers={"Authorization": f"Bearer {{API_KEY}}"},
            json={"model": "{{MODEL}}", "messages": [{"role": "user", "content": prompt}], "stream": True},
            stream=True
        )

        try:
            for line in response.iter_lines():
                if cancel_event.is_set():
                    response.close()
                    return
                if line:
                    print(line.decode(), end="", flush=True)
        finally:
            response.close()

# Example usage:
cancel_event = Event()
stream_thread = Thread(target=lambda: stream_with_cancellation("Write a story", cancel_event))
stream_thread.start()

# To cancel the stream:
cancel_event.set()
```

{% endtab %}

{% tab title="Typescript" %}

```typescript
const controller = new AbortController();

try {
  const response = await fetch(
    'https://llm.onerouter.pro/v1/chat/completions',
    {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: '{{MODEL}}',
        messages: [{ role: 'user', content: 'Write a story' }],
        stream: true,
      }),
      signal: controller.signal,
    },
  );

  // Process the stream...
} catch (error) {
  if (error.name === 'AbortError') {
    console.log('Stream cancelled');
  } else {
    throw error;
  }
}

// To cancel the stream:
controller.abort();
```

{% endtab %}
{% endtabs %}

Cancellation only works for streaming requests with supported providers. For non-streaming requests or unsupported providers, the model will continue processing and you will be billed for the complete response.
