Infron AI's request and response schemas are very similar to the OpenAI Chat API, with a few small differences. At a high level, Infron AI normalizes the schema across models and providers so you only need to learn one.
Quick start
Using the OpenAI SDK
from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="claude-3-5-sonnet@20240620",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ]
)

print(completion.choices[0].message.content)
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'claude-3-5-sonnet@20240620',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  });

  console.log(completion.choices[0].message);
}

main();
Using the Infron AI API directly
Requests
Completions Request Format
Here is the request schema as a TypeScript type. This will be the body of your POST request to the /v1/chat/completions endpoint (see the quick start above for an example).
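The sketch below covers only the core fields and is illustrative rather than exhaustive; the optional sampling parameters shown are assumptions based on the OpenAI-compatible schema.

// Illustrative sketch only; not the exhaustive schema.
type Request = {
  model: string; // e.g. "claude-3-5-sonnet@20240620"
  messages: Message[];

  // Common optional sampling parameters (support varies by model/provider)
  max_tokens?: number;
  temperature?: number;
  top_p?: number;
  stop?: string | string[];

  // Enable server-sent-event streaming (see Streaming below)
  stream?: boolean;
};

type Message = {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
};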
For a complete list of parameters, see the Parameters documentation.
Streaming
To stream responses, simply send stream: true in your request body. The SSE stream will occasionally contain a "comment" payload, which you should ignore.
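For example, a minimal streaming client using fetch might look like the sketch below. It assumes an OpenAI-compatible SSE stream that sends "data: {...}" lines and terminates with "data: [DONE]"; SSE comment lines (starting with ":") are skipped.

// Sketch: stream a completion and print delta content as it arrives.
async function streamCompletion(apiKey: string): Promise<void> {
  const response = await fetch('https://llm.onerouter.pro/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'claude-3-5-sonnet@20240620',
      stream: true,
      messages: [{ role: 'user', content: 'What is the meaning of life?' }],
    }),
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Process complete lines; keep any trailing partial line in the buffer
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';

    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed || trimmed.startsWith(':')) continue; // ignore SSE comments
      if (!trimmed.startsWith('data: ')) continue;
      const data = trimmed.slice('data: '.length);
      if (data === '[DONE]') return;
      const chunk = JSON.parse(data);
      process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
    }
  }
}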
Non-standard parameters
If the chosen model doesn't support a request parameter (such as logit_bias for non-OpenAI models, or top_k for OpenAI models), then the parameter is ignored.
The rest are forwarded to the underlying model API.
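For instance, the hypothetical request below passes top_k; models that support it will honor it, while others (such as OpenAI models) will simply ignore it.

// top_k is forwarded to providers that support it and ignored otherwise
fetch('https://llm.onerouter.pro/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-3-5-sonnet@20240620',
    top_k: 40, // Non-OpenAI sampling parameter
    messages: [{ role: 'user', content: 'What is the meaning of life?' }],
  }),
});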
Assistant Prefill
Infron AI supports asking models to complete a partial response. This can be useful for guiding models to respond in a certain way.
To use this feature, simply include a message with role: "assistant" at the end of your messages array.
Responses
Completions Response Format
Infron AI normalizes the schema across models and providers to comply with the OpenAI Chat API.
This means that choices is always an array, even if the model only returns one completion. Each choice will contain a delta property if a stream was requested and a message property otherwise. This makes it easier to use the same code for all models.
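For example, a helper that extracts the generated text in either mode might look like this hypothetical sketch (using the NonStreamingChoice and StreamingChoice types defined below):

// Works for both streaming chunks (delta) and full responses (message)
function extractText(choice: NonStreamingChoice | StreamingChoice): string {
  if ('message' in choice) {
    return choice.message.content ?? '';
  }
  return choice.delta.content ?? '';
}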
Here's the response schema as a TypeScript type:
Here's an example:
Finish Reason
Infron AI normalizes each model's finish_reason to one of the following values: tool_calls, stop, length, content_filter, error.
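As a hypothetical example of reacting to the normalized value:

// "completion" is a non-streaming response as in the quick start above
const choice = completion.choices[0];
if (choice.finish_reason === 'length') {
  console.warn('Response was truncated; consider raising max_tokens.');
} else if (choice.finish_reason === 'tool_calls') {
  // Execute the requested tools, then send the results back as "tool" messages
} else if (choice.finish_reason === 'error') {
  console.error(choice.error?.message);
}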
Querying Cost and Stats
The token counts returned in the completions API response are not computed with the model's native tokenizer. Instead, they are a normalized, model-agnostic count (computed with the GPT-4o tokenizer), because some providers do not reliably return native token counts. This behavior is becoming rarer, however, and we may add native token counts to the response object in the future.
Credit usage and model pricing are based on the native token counts (not the 'normalized' token counts returned in the API response).
Note that token counts are also available in the usage field of the response body for non-streaming completions.
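For example, reading the normalized counts from the SDK response in the quick start above:

// usage is present on non-streaming responses
const { prompt_tokens, completion_tokens, total_tokens } = completion.usage!;
console.log(`prompt: ${prompt_tokens}, completion: ${completion_tokens}, total: ${total_tokens}`);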
import requests
import json

response = requests.post(
  url="https://llm.onerouter.pro/v1/chat/completions",
  headers={
    "Authorization": "Bearer <API_KEY>",
    "Content-Type": "application/json"
  },
  data=json.dumps({
    "model": "claude-3-5-sonnet@20240620",
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ]
  })
)

print(response.json()["choices"][0]["message"]["content"])
fetch('https://llm.onerouter.pro/v1/chat/completions', {
method: 'POST',
headers: {
Authorization: 'Bearer <API_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: 'claude-3-5-sonnet@20240620',
messages: [
{
role: 'user',
content: 'What is the meaning of life?',
},
],
}),
});
curl https://llm.onerouter.pro/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "claude-3-5-sonnet@20240620",
"messages": [
{
"role": "user",
"content": "What is the meaning of life?"
}
]
}'
fetch('https://llm.onerouter.pro/v1/chat/completions', {
method: 'POST',
headers: {
Authorization: 'Bearer <API_KEY>',
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: '{{MODEL}}',
messages: [
{ role: 'user', content: 'What is the meaning of life?' },
{ role: 'assistant', content: "I'm not sure, but my best guess is" },
],
}),
});
// Definitions of subtypes are below
type Response = {
id: string;
// Depending on whether you set "stream" to "true" and
// whether you passed in "messages" or a "prompt", you
// will get a different output shape
choices: (NonStreamingChoice | StreamingChoice | NonChatChoice)[];
created: number; // Unix timestamp
model: string;
object: 'chat.completion' | 'chat.completion.chunk';
system_fingerprint?: string; // Only present if the provider supports it
// Usage data is always returned for non-streaming.
// When streaming, you will get one usage object at
// the end accompanied by an empty choices array.
usage?: ResponseUsage;
};
// If the provider returns usage, we pass it down
// as-is. Otherwise, we count using the GPT-4 tokenizer.
type ResponseUsage = {
/** Including images and tools if any */
prompt_tokens: number;
/** The tokens generated */
completion_tokens: number;
/** Sum of the above two fields */
total_tokens: number;
};
// Subtypes:
type NonChatChoice = {
finish_reason: string | null;
text: string;
error?: ErrorResponse;
};
type NonStreamingChoice = {
finish_reason: string | null;
native_finish_reason: string | null;
message: {
content: string | null;
role: string;
tool_calls?: ToolCall[];
};
error?: ErrorResponse;
};
type StreamingChoice = {
finish_reason: string | null;
native_finish_reason: string | null;
delta: {
content: string | null;
role?: string;
tool_calls?: ToolCall[];
};
error?: ErrorResponse;
};
type ErrorResponse = {
code: number; // See "Error Handling" section
message: string;
metadata?: Record<string, unknown>; // Contains additional error information such as provider details, the raw error message, etc.
};
type ToolCall = {
  id: string;
  type: 'function';
  function: FunctionCall;
};
// Per the OpenAI schema; "arguments" is a JSON-encoded string
type FunctionCall = {
  name: string;
  arguments: string;
};
{
"id": "xxx-xxxxxxxxxxxxxx",
"choices": [
{
"finish_reason": "stop", // Normalized finish_reason
"native_finish_reason": "stop", // The raw finish_reason from the provider
"message": {
// will be "delta" if streaming
"role": "assistant",
"content": "Hello there!"
}
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 4,
"total_tokens": 4
},
"model": "gpt-3.5-turbo" // Could also be "anthropic/claude-2.1", etc, depending on the "model" that ends up being used
}