# Overview

Infron AI's request and response schemas are very similar to the OpenAI Chat API, with a few small differences. At a high level, **Infron AI normalizes the schema across models and providers** so you only need to learn one.

## Quick start

### Using the OpenAI SDK

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="claude-3-5-sonnet@20240620",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ]
)

print(completion.choices[0].message.content)
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'claude-3-5-sonnet@20240620',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  });

  console.log(completion.choices[0].message);
}

main();
```

{% endtab %}
{% endtabs %}

### Using the Infron AI API directly

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

response = requests.post(
  url="https://llm.onerouter.pro/v1/chat/completions",
  headers={
    "Authorization": "Bearer <API_KEY>",
    "Content-Type": "application/json"
  },
  data=json.dumps({
    "model": "claude-3-5-sonnet@20240620", 
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ]
  })
)
print(response.json()["choices"][0]["message"]["content"])
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
fetch('https://llm.onerouter.pro/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-3-5-sonnet@20240620',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  }),
});
```

{% endtab %}

{% tab title="Curl" %}

```sh
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
  "model": "claude-3-5-sonnet@20240620",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ]
}'
```

{% endtab %}
{% endtabs %}

## Requests

### Completions Request Format

Here is the request schema as a TypeScript type. This will be the body of your `POST` request to the `/v1/chat/completions` endpoint (see the [quick start](https://infronai.gitbook.io/docs/overview/quickstart) above for an example).
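
The following sketch is illustrative rather than exhaustive; it covers only the fields used in this guide, and everything else follows the OpenAI Chat API:

```typescript
// A partial sketch of the request body, not the complete schema.
type Request = {
  model: string; // e.g. "claude-3-5-sonnet@20240620"
  messages: Message[];
  stream?: boolean; // Enable SSE streaming (defaults to false)
  // Additional OpenAI-compatible parameters (temperature,
  // max_tokens, tools, ...) are listed in the Parameters reference.
};

type Message = {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
};
```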

For a complete list of parameters, see the [Parameters](https://infronai.gitbook.io/docs/api-reference/llm-model-api/parameters) reference.

### Streaming and Non-standard Parameters

{% hint style="info" %}
**Streaming**

[Server-Sent Events (SSE)](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events#event_stream_format) are supported for *all models*, enabling streamed responses.

Simply send `stream: true` in your request body, as shown in the example below. The SSE stream will occasionally contain a comment payload (a line beginning with `:`), which you should ignore.
{% endhint %}
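
For example, with the OpenAI SDK client from the quick start, streaming is a one-parameter change. A minimal sketch (the SDK parses the SSE stream for you):

```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  // With stream: true, each chunk carries a "delta" instead of a
  // "message" (see the Responses section below).
  const stream = await openai.chat.completions.create({
    model: 'claude-3-5-sonnet@20240620',
    messages: [{ role: 'user', content: 'What is the meaning of life?' }],
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
}

main();
```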

{% hint style="info" %}
**Non-standard parameters**

If the chosen model doesn't support a request parameter (such as `logit_bias` for non-OpenAI models, or `top_k` for OpenAI models), the parameter is ignored, as in the example below. The rest are forwarded to the underlying model API.
{% endhint %}
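
For instance, the request below (a sketch reusing the fetch call from the quick start) is safe to send to any model; whichever parameter the underlying provider does not support is simply dropped:

```typescript
fetch('https://llm.onerouter.pro/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-3-5-sonnet@20240620',
    messages: [{ role: 'user', content: 'What is the meaning of life?' }],
    top_k: 40, // Ignored by OpenAI models
    logit_bias: { '50256': -100 }, // Ignored by non-OpenAI models
  }),
});
```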

### Assistant Prefill

Infron AI supports asking models to complete a partial response. This can be useful for guiding models to respond in a certain way.

To use this feature, simply include a message with `role: "assistant"` at the end of your `messages` array.

```typescript
fetch('https://llm.onerouter.pro/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    messages: [
      { role: 'user', content: 'What is the meaning of life?' },
      { role: 'assistant', content: "I'm not sure, but my best guess is" },
    ],
  }),
});
```

## Responses

### Completions Response Format

Infron AI normalizes the schema across models and providers to comply with the [OpenAI Chat API](https://platform.openai.com/docs/api-reference/chat).

This means that `choices` is always an array, even if the model only returns one completion. Each choice will contain a `delta` property if a stream was requested and a `message` property otherwise. This makes it easier to use the same code for all models.

Here's the response schema as a TypeScript type:

```typescript
// Definitions of subtypes are below
type Response = {
  id: string;
  // Depending on whether you set "stream" to "true" and
  // whether you passed in "messages" or a "prompt", you
  // will get a different output shape
  choices: (NonStreamingChoice | StreamingChoice | NonChatChoice)[];
  created: number; // Unix timestamp
  model: string;
  object: 'chat.completion' | 'chat.completion.chunk';

  system_fingerprint?: string; // Only present if the provider supports it

  // Usage data is always returned for non-streaming.
  // When streaming, you will get one usage object at
  // the end accompanied by an empty choices array.
  usage?: ResponseUsage;
};
```

```typescript
// If the provider returns usage, we pass it down
// as-is. Otherwise, we count using the GPT-4o tokenizer.

type ResponseUsage = {
  /** Including images and tools if any */
  prompt_tokens: number;
  /** The tokens generated */
  completion_tokens: number;
  /** Sum of the above two fields */
  total_tokens: number;
};
```

```typescript
// Subtypes:
type NonChatChoice = {
  finish_reason: string | null;
  text: string;
  error?: ErrorResponse;
};

type NonStreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  message: {
    content: string | null;
    role: string;
    tool_calls?: ToolCall[];
  };
  error?: ErrorResponse;
};

type StreamingChoice = {
  finish_reason: string | null;
  native_finish_reason: string | null;
  delta: {
    content: string | null;
    role?: string;
    tool_calls?: ToolCall[];
  };
  error?: ErrorResponse;
};

type ErrorResponse = {
  code: number; // See "Error Handling" section
  message: string;
  metadata?: Record<string, unknown>; // Contains additional error information such as provider details, the raw error message, etc.
};

type ToolCall = {
  id: string;
  type: 'function';
  function: FunctionCall;
};

type FunctionCall = {
  name: string;
  // JSON-encoded arguments string, per the OpenAI Chat API
  arguments: string;
};
```
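
Because streaming and non-streaming choices differ only in `delta` versus `message`, a small helper (a sketch built on the types above) can normalize them:

```typescript
// Works for both streamed chunks and complete responses.
function contentOf(choice: NonStreamingChoice | StreamingChoice): string | null {
  return 'message' in choice ? choice.message.content : choice.delta.content;
}
```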

Here's an example:

```json
{
  "id": "xxx-xxxxxxxxxxxxxx",
  "choices": [
    {
      "finish_reason": "stop", // Normalized finish_reason
      "native_finish_reason": "stop", // The raw finish_reason from the provider
      "message": {
        // will be "delta" if streaming
        "role": "assistant",
        "content": "Hello there!"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 4,
    "total_tokens": 4
  },
  "model": "gpt-3.5-turbo" // Could also be "anthropic/claude-2.1", etc, depending on the "model" that ends up being used
}
```

### Finish Reason

Infron AI normalizes each model's `finish_reason` to one of the following values: `tool_calls`, `stop`, `length`, `content_filter`, `error`.
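
A sketch of branching on the normalized value, assuming the `completion` object from the quick start (the handling in each case is illustrative):

```typescript
const choice = completion.choices[0];

switch (choice.finish_reason) {
  case 'tool_calls': // The model wants to call one or more tools
    break;
  case 'stop': // The model finished naturally
    break;
  case 'length': // The completion hit a token limit
    break;
  case 'content_filter': // The output was flagged by a content filter
    break;
  case 'error': // Something went wrong; inspect choice.error
    break;
}
```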

### Querying Cost and Stats

The token counts returned in the completions API response are **not** counted with the model's native tokenizer. Instead, they are a normalized, model-agnostic count computed with the GPT-4o tokenizer, because some providers do not reliably return native token counts. This behavior is becoming rarer, however, and we may add native token counts to the response object in the future.

Credit usage and model pricing are based on the **native** token counts (not the 'normalized' token counts returned in the API response).

Note that token counts are also available in the `usage` field of the response body for non-streaming completions, and in the final chunk when streaming.
