Chat Completions

Create chat completions using various AI models available through Infron.

Endpoint

POST /chat/completions

Basic chat completion

Create a non-streaming chat completion.

Example request

from openai import OpenAI

# Point the OpenAI SDK at Infron's OpenAI-compatible endpoint
client = OpenAI(
    api_key='<API_KEY>',
    base_url='https://llm.onerouter.pro/v1'
)

completion = client.chat.completions.create(
    model='claude-3-5-sonnet@20240620',
    messages=[
        {
            'role': 'user',
            'content': 'What is the meaning of life?'
        }
    ],
    stream=False,
)

print('Assistant:', completion.choices[0].message.content)
print('Tokens used:', completion.usage)

Streaming chat completion

Create a chat completion that streams tokens back as they are generated.

Example request
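
A minimal sketch mirroring the basic example above, with stream=True (it assumes the same placeholder credentials):

from openai import OpenAI

client = OpenAI(
    api_key='<API_KEY>',
    base_url='https://llm.onerouter.pro/v1'
)

stream = client.chat.completions.create(
    model='claude-3-5-sonnet@20240620',
    messages=[
        {
            'role': 'user',
            'content': 'What is the meaning of life?'
        }
    ],
    stream=True,
)

# Each chunk carries an incremental delta; print content as it arrives
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end='', flush=True)
print()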

Streaming response format

Streaming responses are sent as Server-Sent Events (SSE), a web standard for real-time data streaming over HTTP. Each event contains a JSON object with the partial response data.

The response format follows the OpenAI streaming specification:
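
For illustration, an abridged stream might look like this (the IDs and most fields are elided placeholders; only the shape matters):

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"}}]}

data: [DONE]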

Key characteristics:

  • Each line starts with data: followed by JSON

  • Content is delivered incrementally in the delta.content field

  • The stream ends with data: [DONE]

  • Empty lines separate events

SSE Parsing Libraries:

If you're building custom SSE parsing (instead of using the OpenAI SDK), a dedicated SSE client library can handle the event framing for you, or you can parse the stream by hand as sketched below.
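
A minimal hand-rolled parser using the requests library (a sketch assuming the chunk shape shown earlier and standard Bearer authentication; error handling is omitted):

import json
import requests

resp = requests.post(
    'https://llm.onerouter.pro/v1/chat/completions',
    headers={'Authorization': 'Bearer <API_KEY>'},
    json={
        'model': 'claude-3-5-sonnet@20240620',
        'messages': [{'role': 'user', 'content': 'What is the meaning of life?'}],
        'stream': True,
    },
    stream=True,
)
resp.raise_for_status()

for line in resp.iter_lines(decode_unicode=True):
    if not line:                      # blank lines separate SSE events
        continue
    if not line.startswith('data: '):
        continue
    payload = line[len('data: '):]
    if payload == '[DONE]':           # end-of-stream sentinel
        break
    chunk = json.loads(payload)
    delta = chunk['choices'][0]['delta']
    if delta.get('content'):
        print(delta['content'], end='', flush=True)
print()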

For more details about the SSE specification, see the W3C specification.

Image attachments

Send images as part of your chat completion request.

Images Inputs

PDF attachments

Send PDF documents as part of your chat completion request.

PDF Inputs

Audio attachments

Send audio files as part of your chat completion request.

Audio Inputs

Video attachments

Send video files as part of your chat completion request.

Video Inputs

Parameters

The chat completions endpoint supports the following parameters:

Required parameters

  • model (string): The model to use for the completion (e.g., anthropic/claude-sonnet-4)

  • messages (array): Array of message objects with role and content fields

Optional parameters

  • stream (boolean): Whether to stream the response. Defaults to false

  • temperature (number): Controls randomness in the output. Range: 0-2

  • max_tokens (integer): Maximum number of tokens to generate

  • top_p (number): Nucleus sampling parameter. Range: 0-1

  • frequency_penalty (number): Penalty for frequent tokens. Range: -2 to 2

  • presence_penalty (number): Penalty for present tokens. Range: -2 to 2

  • stop (string or array): Stop sequences for the generation

  • tools (array): Array of tool definitions for function calling

  • tool_choice (string or object): Controls which tools are called (auto, none, or specific function)

  • response_format (object): Controls the format of the model's response

    • For OpenAI standard format: { type: "json_schema", json_schema: { name, schema, strict?, description? } }

    • For legacy format: { type: "json", schema?, name?, description? }

    • For plain text: { type: "text" }

    • See Structured outputs for detailed examples
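
For illustration, a request combining several of the optional parameters (the values are arbitrary; client is the one constructed in the basic example above):

completion = client.chat.completions.create(
    model='claude-3-5-sonnet@20240620',
    messages=[
        {'role': 'user', 'content': 'Write a one-line tagline for a coffee shop.'}
    ],
    temperature=0.7,   # moderate randomness (range 0-2)
    max_tokens=64,     # cap on generated tokens
    top_p=0.9,         # nucleus sampling (range 0-1)
    stop=['\n'],       # stop at the first newline
)
print(completion.choices[0].message.content)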

Message format

Messages support different content types:

Text messages
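
In the simplest case, content is a plain string, exactly as in the basic example:

{
    'role': 'user',
    'content': 'What is the meaning of life?'
}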

Multimodal messages
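
A sketch of a message mixing text and an image, assuming the OpenAI-compatible content-parts format (see Images Inputs above for the full details):

{
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'What is in this image?'},
        {'type': 'image_url', 'image_url': {'url': 'https://example.com/photo.png'}}
    ]
}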

File messages
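
A sketch of a message carrying a document, assuming the OpenAI-compatible file content part (the PDF Inputs page above is authoritative):

{
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'Summarize this document.'},
        {
            'type': 'file',
            'file': {
                'filename': 'report.pdf',
                'file_data': 'data:application/pdf;base64,<BASE64_DATA>'
            }
        }
    ]
}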
