# Quickstart

Get started with Infron

[Infron](https://infron.ai/) provides a **single API** that connects you to thousands of **open-source models**, **commercial models**, and **search agents**, all powered by the Infron Open Model Protocol.

As the world’s first ***AI Model Marketplace*** and ***Inference Provider Routing Platform***, Infron delivers cross-provider high availability, seamless developer workflows, and ultra-low-cost scalability through the Infron Routing Stack.

<table data-view="cards"><thead><tr><th></th><th data-hidden data-card-target data-type="content-ref"></th></tr></thead><tbody><tr><td><p><strong>Text Generation</strong><br></p><p>Generate and stream text with GPT-5.2, Claude Sonnet 4.6, Gemini 3 Flash, Llama 4, and 300+ more models.<br></p><p>AI SDK · OpenAI-compatible · Anthropic-compatible · OpenAI-Responses</p></td><td><a href="/pages/HYwJ9ECRWrwWZf9Iv4cb">/pages/HYwJ9ECRWrwWZf9Iv4cb</a></td></tr><tr><td><p><strong>Image Generation</strong><br></p><p>Create images from text prompts or edit existing images with Flux 2 Flex, Recraft V3, Imagen, and more.</p><p>AI SDK · OpenAI-compatible</p></td><td><a href="/pages/1bAdbFWqJy8GTNfKwF8p">/pages/1bAdbFWqJy8GTNfKwF8p</a></td></tr><tr><td><p><strong>Video Generation</strong><br></p><p>Create videos from text prompts, images, or video input with Veo 3.1, KlingAI, Wan, Grok Imagine Video, and more.</p></td><td><a href="/pages/CqjyssSCfkdus5B6pIcG">/pages/CqjyssSCfkdus5B6pIcG</a></td></tr><tr><td><p><strong>Audio Generation</strong><br></p><p>Create audio from text prompts, images, or video input with gpt-4o-mini-tts, tts-1, and more.</p></td><td><a href="/pages/cvdJ3cyZw0OlUuJzh4CZ">/pages/cvdJ3cyZw0OlUuJzh4CZ</a></td></tr><tr><td><p><strong>Search, Deepsearch &#x26; Extract</strong><br></p><p>Run searches from text prompts with Tavily, Exa, Jina, Perplexity, and more.</p></td><td><a href="/pages/1fuCFLPtcMUjVcHlNnFR">/pages/1fuCFLPtcMUjVcHlNnFR</a></td></tr><tr><td><p><strong>Embedding &#x26; Reranker</strong><br></p><p>Generate embeddings and rerank results from text inputs with GPT, Qwen, and more.</p></td><td><a href="/pages/dPJ3lH8ZZVPUTMqlzPTQ">/pages/dPJ3lH8ZZVPUTMqlzPTQ</a></td></tr><tr><td><p><strong>Batch Generation</strong><br></p><p>Create batch completions from text prompts with AWS, Google, Azure, and more.</p></td><td><a href="/pages/sQF9BTt8eG2vZIQoEGf5">/pages/sQF9BTt8eG2vZIQoEGf5</a></td></tr></tbody></table>

### Quick Start Guide

Get up and running with Infron in under 5 minutes. Follow our step-by-step guide to set up your account and start calling AI models.

<details>

<summary><strong>Step 1: </strong><a href="https://infron.ai/login"><strong>Log in to your account</strong></a></summary>

Log in using your email or sign in with Google. If you don’t have an account yet, you can create one at <https://infron.ai/login>.

<figure><img src="/files/oJSj9OFjvVhVTLG2hInv" alt=""><figcaption></figcaption></figure>

If you need more support or would like to talk to our experts, you can [book a meeting](https://infron.ai/contact), join our [Discord](https://discord.com/invite/9WZfxfzjb8), or follow us on [X](https://x.com/BagelPayment).

</details>

<details>

<summary><strong>Step 2: </strong><a href="https://infron.ai/dashboard/apiKeys"><strong>Create your API key</strong></a></summary>

API Keys are secret tokens used to authenticate your requests. They are unique to your account and should be kept confidential.

* **Navigate to the “**[**API Key**](https://infron.ai/dashboard/apiKeys)**” section.**

<figure><img src="/files/W434Tu9jji3w3YoseFSm" alt=""><figcaption></figcaption></figure>

* **Click on the "Create api key" button in the "API Keys" section.**

<figure><img src="/files/P04hpYetxt3oBRyfWtTO" alt=""><figcaption></figcaption></figure>

* **Copy your API key and keep it safe.**

After that, explore our API reference for more details, or jump straight into the quickstart examples later on this page.
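
One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes the variable is named `INFRON_API_KEY` (the name used in the Codex instructions on this page); `load_api_key` and `auth_headers` are illustrative helper names, not part of any Infron SDK.

```python
import os


def load_api_key(env=None, var_name="INFRON_API_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before calling the API")
    return key


def auth_headers(api_key):
    """Build the headers used by the text examples in this quickstart."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

With this in place, `auth_headers(load_api_key())` can replace the hard-coded `"Bearer <API_KEY>"` strings in the examples that follow.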

</details>

<details>

<summary><strong>Step 3: </strong><a href="https://infron.ai/dashboard/credits"><strong>Set up billing</strong></a></summary>

Some features will only be available to you once you've set up billing.

* **Navigate to the “**[**Billing**](https://infron.ai/dashboard/credits)**” section.**

<figure><img src="/files/482SdR8yOP4P74YkYxxt" alt=""><figcaption></figcaption></figure>

* Click "**Add Payment Method**".
* Enable "**Low Balance Alert**" to get notified by email when your account balance drops below your configured threshold.

</details>

<details>

<summary><strong>Step 4: Make your first AI model call 💐</strong></summary>

Our HTTP API can be used with any programming language, but there are also client libraries for Python, JavaScript, and other languages that make it easier to use the API.

Explore the [`LLM`](https://infron.ai/models?category=LLM), [`VLM`](https://infron.ai/models?category=LLM), [`Image Generation`](https://infron.ai/models?category=Text+to+Image), [`Video Generation`](https://infron.ai/models?category=Text+to+Video), [`Audio Generation`](https://infron.ai/models?category=Text+To+Speech), and [`Search`](https://infron.ai/models?category=Search) models in the [AI Model Marketplace](https://infron.ai/models).

</details>

<details>

<summary><strong>Step 5: Receipt Reimbursement</strong></summary>

After topping up your credits, go to the [Billing page](https://infron.ai/dashboard/credits) to download receipts for your past orders.

<figure><img src="/files/6JKZqDANBmhbl2SzZWKD" alt=""><figcaption></figcaption></figure>

</details>

### For agent builders

#### For Codex

*<mark style="color:$info;">**Paste into your Codex agent and follow the steps**</mark>*

```bash
codex plugin marketplace add InfronAI/infron-codex-plugin
  After installation, enable the infron plugin
  Set INFRON_API_KEY in your shell
  Tell me how to use Infron
```


# Text

Text Generation Quickstart

This quickstart walks you through making your first text generation request with Infron.

### Using the Infron API directly

{% tabs %}
{% tab title="cURL" %}

```sh
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ]
}'
```

{% endtab %}

{% tab title="Python" %}

```python
import requests
import json

response = requests.post(
  url="https://llm.onerouter.pro/v1/chat/completions",
  headers={
    "Authorization": "Bearer <API_KEY>",
    "Content-Type": "application/json"
  },
  data=json.dumps({
    "model": "deepseek/deepseek-v3.2", 
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ]
  })
)
print(response.json()["choices"][0]["message"]["content"])
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
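const response = await fetch('https://llm.onerouter.pro/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'deepseek/deepseek-v3.2',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  }),
});

// Parse the JSON body and print the assistant's reply
const data = await response.json();
console.log(data.choices[0].message.content);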
```

{% endtab %}
{% endtabs %}

The API also supports [streaming](broken://pages/zaGkdltLztTFWVpP9DNN).
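
As a rough sketch of what consuming a streamed response could look like: the snippet below assumes the endpoint emits OpenAI-style server-sent events (`data: {...}` lines, ending with `data: [DONE]`) when `"stream": true` is set. That format is an assumption, not something this page confirms, and `extract_delta` and `stream_chat` are hypothetical helper names.

```python
import json

API_URL = "https://llm.onerouter.pro/v1/chat/completions"


def extract_delta(line):
    """Return the text fragment carried by one SSE line, or None.

    Assumes OpenAI-style chunks such as
    'data: {"choices":[{"delta":{"content":"Hi"}}]}'.
    """
    if not line.startswith("data:"):
        return None  # comments, keep-alives, blank lines
    payload = line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if not choices:
        return None
    return choices[0].get("delta", {}).get("content")


def stream_chat(api_key, prompt, model="deepseek/deepseek-v3.2"):
    """Yield text fragments as they arrive from a streamed chat completion."""
    import requests  # imported lazily so extract_delta works without it

    body = {
        "model": model,
        "stream": True,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {"Authorization": f"Bearer {api_key}"}
    with requests.post(API_URL, headers=headers, json=body,
                       stream=True, timeout=60) as response:
        response.raise_for_status()
        for raw in response.iter_lines():
            text = extract_delta(raw.decode("utf-8"))
            if text:
                yield text
```

Iterating over `stream_chat(api_key, "Say hello")` and printing each piece would show the reply as it is generated, provided the stream format matches the assumption above.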

### Using the OpenAI SDK

Get started with just a few lines of code using your preferred SDK or framework.

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="deepseek/deepseek-v3.2",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ]
)

print(completion.choices[0].message.content)
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'deepseek/deepseek-v3.2',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  });

  console.log(completion.choices[0].message.content);
}

main();
```

{% endtab %}
{% endtabs %}

### Using third-party SDKs

For information about using third-party SDKs and frameworks with Infron, please see our [frameworks documentation](/docs/frameworks-and-integrations/overview).

### Next steps

* Learn about [provider and model routing with fallbacks](/docs/routing-and-gateway/inference-provider-routing)
* Try other APIs: [OpenAI-compatible](/docs/llm-apis/openai-compatible-api/overview), [Anthropic-compatible](/docs/llm-apis/anthropic-compatible-api/overview), or [OpenResponses](/docs/llm-apis/openresponses-api/overview)


# Image

Image Generation Quickstart

This quickstart walks you through generating your first image with Infron.

### Image generation

{% tabs %}
{% tab title="cURL" %}

<pre class="language-sh"><code class="lang-sh">curl https://media.onerouter.pro/v1/images/generations \
<strong>    -H "Content-Type: application/json" \
</strong>    -H "Authorization: &#x3C;API_KEY>" \
    -d '{
    "model": "openai/gpt-image-2/text-to-image",
    "prompt": "A cute baby sea otter"
  }'
</code></pre>

{% endtab %}
{% endtabs %}

### Next steps

* Learn how to [submit a media task and poll for its status](/docs/media-apis).


# Video

Video Generation Quickstart

This quickstart walks you through generating your first video with Infron.

### Video generation

{% tabs %}
{% tab title="cURL" %}

<pre class="language-sh"><code class="lang-sh">curl https://media.onerouter.pro/v1/videos/generations \
<strong>    -H "Content-Type: application/json" \
</strong>    -H "Authorization: &#x3C;API_KEY>" \
    -d '{
    "model": "google/veo3.1/text-to-video",
    "prompt": "A cute baby sea otter"
  }'
</code></pre>

{% endtab %}
{% endtabs %}

### Next steps

* Learn how to [submit a media task and poll for its status](/docs/media-apis).


# Audio

Audio Generation Quickstart

This quickstart walks you through generating your first audio clip with Infron.

### Audio generation

{% tabs %}
{% tab title="cURL" %}

<pre class="language-sh"><code class="lang-sh">curl https://media.onerouter.pro/v1/audios/generations \
<strong>    -H "Content-Type: application/json" \
</strong>    -H "Authorization: &#x3C;API_KEY>" \
    -d '{
    "model": "openai/gpt-4o-mini-tts/text-to-audio",
    "prompt": "A cute baby sea otter"
  }'
</code></pre>

{% endtab %}
{% endtabs %}

### Next steps

* Learn how to [submit a media task and poll for its status](/docs/media-apis).


# Search

Search, Deepsearch & Extract Quickstart

This quickstart walks you through running your first search with Infron.

### Tavily

{% tabs %}
{% tab title="Tavily-Search-SDK" %}

```python
from tavily import TavilyClient

tavily_client = TavilyClient(
    api_key="<API_KEY>",
    api_base_url="https://search.onerouter.pro/v1/tavily"
)
response = tavily_client.search("Who is Leo Messi?")

print(response)
```

{% endtab %}
{% endtabs %}

### Jina

{% tabs %}
{% tab title="Python" %}

```python
import json
import requests

response = requests.post(
    "https://search.onerouter.pro/v1/chat/completions",
    headers={"Authorization": "Bearer <API_KEY>", "Content-Type": "application/json"},
    data=json.dumps({
      "model": "jina-deepsearch-v1",
      "messages": [
        {
          "role": "user",
          "content": "Hi!"
        },
        {
          "role": "assistant",
          "content": "Hi, how can I help you?"
        },
        {
          "role": "user",
          "content": "what's the latest blog post from jina ai?"
        }
      ],
      "stream": False  # with True, read response.iter_lines() instead of response.json()
    })
)

data = response.json()
```

{% endtab %}
{% endtabs %}

### Firecrawl

{% tabs %}
{% tab title="Python" %}

```python
import json
import requests

response = requests.post(
    "https://search.onerouter.pro/v1/firecrawl",
    headers={"Authorization": "Bearer <API_KEY>", "Content-Type": "application/json"},
    data=json.dumps({
      "model": "firecrawl-search",
      "query": "who is Leo Messi?",
    })
)

data = response.json()
```

{% endtab %}
{% endtabs %}

### Perplexity

{% tabs %}
{% tab title="Python" %}

```python
import json
import requests

response = requests.post(
    "https://search.onerouter.pro/v1/perplexity",
    headers={"Authorization": "Bearer <API_KEY>", "Content-Type": "application/json"},
    data=json.dumps({
      "model": "perplexity-search",
      "query": "latest AI developments 2024",
      "max_results": 10,
      "search_domain_filter": [
        "science.org",
        "pnas.org",
        "cell.com"
      ],
      "max_tokens_per_page": 1024,
      "country": "US",
      "search_recency_filter": "week",
      "search_after_date": "10/15/2025",
      "search_before_date": "10/16/2025"
    })
)

data = response.json()
```

{% endtab %}
{% endtabs %}

### Exa

{% tabs %}
{% tab title="Python" %}

```python
import json
import requests

response = requests.post(
    "https://search.onerouter.pro/v1/exa",
    headers={"Authorization": "Bearer <API_KEY>", "Content-Type": "application/json"},
    data=json.dumps({
      "model": "exa-search",
      "query": "Latest research in LLMs",
      "additionalQueries": [
        "LLM advancements",
        "large language model progress"
      ],
      "type": "auto",
      "category": "news",
      "userLocation": "US",
      "numResults": 100,
      "excludeDomains": [
        "baidu.com"
      ],
      "startCrawlDate": "2023-01-01T00:00:00.000Z",
      "endCrawlDate": "2023-12-31T00:00:00.000Z",
      "startPublishedDate": "2023-01-01T00:00:00.000Z",
      "endPublishedDate": "2023-12-31T00:00:00.000Z",
      "includeText": [
        "large language model"
      ],
      "excludeText": [
        "course"
      ],
      "context": True,
      "moderation": False
    })
)

data = response.json()
```

{% endtab %}
{% endtabs %}

### Cloudsway

{% tabs %}
{% tab title="Python" %}

```python
import json
import requests

response = requests.post(
    "https://search.onerouter.pro/v1/cloudsway",
    headers={"Authorization": "Bearer <API_KEY>", "Content-Type": "application/json"},
    data=json.dumps({
      "model": "cloudsway-smart-search",
      "q": "Latest research in LLMs",
      "count": 10,
      "offset": 0,
      "freshness": "Month",
      "sites": "baidu.com, google.com",
      "enableContent": True,
      "contentType": "TEXT",
      "contentTimeout": 3,
      "mainText": False
    })
)

data = response.json()
```

{% endtab %}
{% endtabs %}

### Next steps

{% content-ref url="/spaces/hxFf4d69JwgF5K3uZuiz/pages/UwADHoicsh5VqKzixbsD" %}
[Overview](/docs/search-apis)
{% endcontent-ref %}


# Embedding

Embedding & Reranker Quickstart

This quickstart walks you through generating your first embedding with Infron.

### Embedding quickstart

#### Basic Request

To generate embeddings, send a POST request to `/embeddings` with your text input and chosen model:

{% tabs %}
{% tab title="Python" %}

```python
import requests

response = requests.post(
  "https://llm.onerouter.pro/v1/embeddings",
  headers={
    "Authorization": "Bearer <API_KEY>",
    "Content-Type": "application/json",
  },
  json={
    "model": "qwen/qwen3-embedding-0.6b",
    "input": "The quick brown fox jumps over the lazy dog"
  }
)

data = response.json()
embedding = data["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")
print(f"Embedding: {embedding}")
```

{% endtab %}

{% tab title="TypeScript (fetch)" %}

```typescript
const response = await fetch('https://llm.onerouter.pro/v1/embeddings', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'qwen/qwen3-embedding-0.6b',
    input: 'The quick brown fox jumps over the lazy dog'
  }),
});

const data = await response.json();
const embedding = data.data[0].embedding;
console.log(`Embedding dimension: ${embedding.length}`);
```

{% endtab %}

{% tab title="cURL" %}

```shellscript
curl https://llm.onerouter.pro/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "qwen/qwen3-embedding-0.6b",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

{% endtab %}
{% endtabs %}

#### Batch Processing

You can generate embeddings for multiple texts in a single request by passing an array of strings:

{% tabs %}
{% tab title="Python" %}

```python
import requests

response = requests.post(
  "https://llm.onerouter.pro/v1/embeddings",
  headers={
    "Authorization": "Bearer <API_KEY>",
    "Content-Type": "application/json",
  },
  json={
    "model": "qwen/qwen3-embedding-0.6b",
    "input": [
      "Machine learning is a subset of artificial intelligence",
      "Deep learning uses neural networks with multiple layers",
      "Natural language processing enables computers to understand text"
    ]
  }
)

data = response.json()
for i, item in enumerate(data["data"]):
  print(f"Embedding {i}: {len(item['embedding'])} dimensions")
```

{% endtab %}

{% tab title="TypeScript (fetch)" %}

```typescript
const response = await fetch('https://llm.onerouter.pro/v1/embeddings', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <API_KEY>',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'qwen/qwen3-embedding-0.6b',
    input: [
      'Machine learning is a subset of artificial intelligence',
      'Deep learning uses neural networks with multiple layers',
      'Natural language processing enables computers to understand text'
    ]
  }),
});

const data = await response.json();
data.data.forEach((item, index) => {
  console.log(`Embedding ${index}: ${item.embedding.length} dimensions`);
});
```

{% endtab %}

{% tab title="cURL" %}

```shellscript
curl https://llm.onerouter.pro/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "qwen/qwen3-embedding-0.6b",
    "input": [
      "Machine learning is a subset of artificial intelligence",
      "Deep learning uses neural networks with multiple layers",
      "Natural language processing enables computers to understand text"
    ]
  }'
```

{% endtab %}
{% endtabs %}

#### Semantic Search

Here's a complete example of building a semantic search system using embeddings:

{% tabs %}
{% tab title="Python" %}

```python
import requests
import numpy as np

API_KEY = "<API_KEY>"

# Sample documents
documents = [
  "The cat sat on the mat",
  "Dogs are loyal companions",
  "Python is a programming language",
  "Machine learning models require training data",
  "The weather is sunny today"
]

def cosine_similarity(a, b):
  """Calculate cosine similarity between two vectors"""
  dot_product = np.dot(a, b)
  magnitude_a = np.linalg.norm(a)
  magnitude_b = np.linalg.norm(b)
  return dot_product / (magnitude_a * magnitude_b)

def semantic_search(query, documents):
  """Perform semantic search using embeddings"""
  # Generate embeddings for query and all documents
  response = requests.post(
    "https://llm.onerouter.pro/v1/embeddings",
    headers={
      "Authorization": f"Bearer {API_KEY}",
      "Content-Type": "application/json",
    },
    json={
      "model": "qwen/qwen3-embedding-0.6b",
      "input": [query] + documents
    }
  )
  
  data = response.json()
  query_embedding = np.array(data["data"][0]["embedding"])
  doc_embeddings = [np.array(item["embedding"]) for item in data["data"][1:]]
  
  # Calculate similarity scores
  results = []
  for i, doc in enumerate(documents):
    similarity = cosine_similarity(query_embedding, doc_embeddings[i])
    results.append({"document": doc, "similarity": similarity})
  
  # Sort by similarity (highest first)
  results.sort(key=lambda x: x["similarity"], reverse=True)
  
  return results

# Search for documents related to pets
results = semantic_search("pets and animals", documents)
print("Search results:")
for i, result in enumerate(results):
  print(f"{i + 1}. {result['document']} (similarity: {result['similarity']:.4f})")
```

{% endtab %}
{% endtabs %}

Example output (the exact scores depend on the model and will differ):

```
Search results:
1. Dogs are loyal companions (similarity: 0.8234)
2. The cat sat on the mat (similarity: 0.7891)
3. The weather is sunny today (similarity: 0.3456)
4. Machine learning models require training data (similarity: 0.2987)
5. Python is a programming language (similarity: 0.2654)
```
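
To see the ranking step in isolation, without calling the API, here is a small worked example on hand-written 3-dimensional vectors. The vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors, invented for this example
query = [1.0, 0.0, 0.0]
docs = {
    "same direction": [2.0, 0.0, 0.0],  # cosine = 1.0 (length does not matter)
    "orthogonal": [0.0, 3.0, 0.0],      # cosine = 0.0
    "in between": [1.0, 1.0, 0.0],      # cosine = 1/sqrt(2), about 0.7071
}

for doc_id, score in rank(query, docs):
    print(f"{doc_id}: {score:.4f}")
```

The first vector is twice as long as the query yet scores 1.0, which shows that cosine similarity compares direction only, not magnitude.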

### Reranker quickstart

In the example below, we use the Rerank API endpoint to rank the list of `documents` from most to least relevant to the query `"What is the capital of the United States?"`.

#### Example with Texts <a href="#example-with-texts" id="example-with-texts"></a>

**Request**

In this example, the documents being passed in are a list of strings:

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

url = "https://llm.onerouter.pro/v1/rerank"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <API_KEY>"
}
data = {
    "model": "voyage-rerank-2.5",
    "query": "What is the capital of the United States?",
    "top_n": 5,
    "documents": [
        "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
        "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
        "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.",
        "Capital punishment has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.",
    ]        
}

response = requests.post(url, headers=headers, data=json.dumps(data))

print(response.json())
```

{% endtab %}

{% tab title="cURL" %}

```shellscript
curl --request POST \
  --url https://llm.onerouter.pro/v1/rerank \
  --header 'content-type: application/json' \
  --header "Authorization: Bearer $API_KEY" \
  --data '{
    "model": "voyage-rerank-2.5",
    "query": "What is the capital of the United States?",
    "documents": [
      "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",
      "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
      "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
      "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.",
      "Capital punishment has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment."
    ],
    "top_n": 5
  }'
```

{% endtab %}
{% endtabs %}

**Response**

```json
{
  "id": "26c8ad0bb95011f0a5edda799fbd82e9",
  "results": [
    {
      "index": 3, // "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) ..."
      "relevance_score": 0.9990564
    },
    {
      "index": 4, // "Capital punishment has existed in the United States since before the United States was a country. As of 2017 ..."
      "relevance_score": 0.7516481
    },
    {
      "index": 1, // "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division ..."
      "relevance_score": 0.08882029
    },
    {
      "index": 0, // "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a ..."
      "relevance_score": 0.058238626
    },
    {
      "index": 2, // "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people ..."
      "relevance_score": 0.019946935
    }
  ]
}
```
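
The response refers to documents by their position in the request's `documents` list, so a small helper can join the two back together. This sketch assumes the response shape shown above (`results` entries with `index` and `relevance_score`); `attach_documents` is a hypothetical name, and the document strings are shortened stand-ins.

```python
def attach_documents(rerank_response, documents):
    """Pair each reranked result with the document text it refers to."""
    return [
        {
            "document": documents[result["index"]],
            "relevance_score": result["relevance_score"],
        }
        for result in rerank_response["results"]
    ]


# Shortened stand-ins for the request documents and the response above
documents = [
    "Carson City ...",
    "Northern Mariana Islands ...",
    "Charlotte Amalie ...",
    "Washington, D.C. ...",
    "Capital punishment ...",
]
response = {
    "results": [
        {"index": 3, "relevance_score": 0.9990564},
        {"index": 4, "relevance_score": 0.7516481},
    ]
}

for row in attach_documents(response, documents):
    print(f"{row['relevance_score']:.4f}  {row['document']}")
```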

### Next steps

{% content-ref url="/spaces/oWo5LeOZTLqTSLX7mP7F/pages/QzcYmUMBv5zGWGF5Ftn9" %}
[Embeddings API](/docs/llm-apis/embeddings-api/overview)
{% endcontent-ref %}

{% content-ref url="/spaces/oWo5LeOZTLqTSLX7mP7F/pages/w8pFsmBxI0zq19jjWr62" %}
[Rerank API](/docs/llm-apis/rerank-api/overview)
{% endcontent-ref %}


# Batch

Batch API Quickstart

This quickstart walks you through creating your first batch of completions with Infron.

### Create a message batch

> Create a batch of messages for asynchronous processing. All usage is charged at 50% of the standard API prices.

A batch is a list of requests. Each request consists of:

* A unique `custom_id` for identifying the Messages request
* A `params` object with the standard Messages API parameters

You can create a batch by passing this list into the `requests` parameter:

{% tabs %}
{% tab title="Python" %}
{% code overflow="wrap" %}

```python
import requests
import json

headers = {
    "Authorization": "Bearer <API_KEY>",
    "Content-Type": "application/json"
}

data = {
  "requests": [
    {
      "custom_id": "my-request-01",
      "params": {
        "model": "openai/gpt-4o-mini-batch",
        "max_tokens": 1024,
        "messages": [
          {
            "role": "user",
            "content": "How to learn nestjs?"
          }
        ],
        "metadata": {
          "ANY_ADDITIONAL_PROPERTY": "text"
        },
        "stop_sequences": [
          "text"
        ],
        "system": "text",
        "temperature": 1,
        "tool_choice": None,
        "tools": [],
        "top_k": 1,
        "top_p": 1,
        "thinking": {
          "budget_tokens": 1024,
          "type": "enabled"
        }
      }
    },
    {
      "custom_id": "my-request-02",
      "params": {
        "model": "openai/gpt-4o-mini-batch",
        "max_tokens": 1024,
        "messages": [
          {
            "role": "user",
            "content": "How to learn Reactjs?"
          }
        ],
        "metadata": {
          "ANY_ADDITIONAL_PROPERTY": "text"
        },
        "stop_sequences": [
          "text"
        ],
        "system": "text",
        "temperature": 1,
        "tool_choice": None,
        "tools": [],
        "top_k": 1,
        "top_p": 1,
        "thinking": {
          "budget_tokens": 1024,
          "type": "enabled"
        }
      }
    },
    {
      "custom_id": "my-request-03",
      "params": {
        "model": "openai/gpt-4o-mini-batch",
        "max_tokens": 1024,
        "messages": [
          {
            "role": "user",
            "content": "How to learn Nextjs?"
          }
        ],
        "metadata": {
          "ANY_ADDITIONAL_PROPERTY": "text"
        },
        "stop_sequences": [
          "text"
        ],
        "system": "text",
        "temperature": 1,
        "tool_choice": None,
        "tools": [],
        "top_k": 1,
        "top_p": 1,
        "thinking": {
          "budget_tokens": 1024,
          "type": "enabled"
        }
      }
    }
  ]
}

response = requests.post("https://llm.onerouter.pro/v1/batches", headers=headers, data=json.dumps(data))

data = response.json()
print("Batch created:", json.dumps(data, indent=2, ensure_ascii=False))
```

{% endcode %}
{% endtab %}
{% endtabs %}

In this example, three separate requests are batched together for asynchronous processing. Each request has a unique `custom_id` and contains the standard parameters you'd use for a Messages API call. A successful call returns a response like the following:

```json
{
  "batch": {
    "cancelled_at": null,
    "cancelling_at": null,
    "completed_at": null,
    "completion_window": "24h",
    "created_at": 1765972352,
    "endpoint": "",
    "error_file_id": "",
    "errors": null,
    "expired_at": null,
    "expires_at": 1766058749,
    "failed_at": null,
    "finalizing_at": null,
    "id": "batch_a34c321b-ed4b-4e91-ae29-7f02939d8962",
    "in_progress_at": null,
    "input_file_id": "file-142b17fbff7d4a06a88ec9205ae143c9",
    "metadata": null,
    "object": "batch",
    "output_file_id": "",
    "request_counts": {
      "completed": 0,
      "failed": 0,
      "total": 0
    },
    "status": "validating"
  },
  "batch_id": "batch_a34c321b-ed4b-4e91-ae29-7f02939d8962",
  "file": {
    "bytes": 802,
    "created_at": 1765972347,
    "filename": "batch.jsonl",
    "id": "file-142b17fbff7d4a06a88ec9205ae143c9",
    "object": "file",
    "purpose": "batch",
    "status": "processed"
  },
  "file_id": "file-142b17fbff7d4a06a88ec9205ae143c9",
  "task_id": 2,
  "task_status": "NOT_START"
}
```
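
The three request entries in the Python example above differ only in `custom_id` and prompt, so the list can also be generated. A minimal sketch, assuming the `custom_id`/`params` shape shown above; `build_batch_requests` and the `my-request` ID prefix are illustrative choices, not part of the API.

```python
def build_batch_requests(prompts, model="openai/gpt-4o-mini-batch",
                         max_tokens=1024, id_prefix="my-request"):
    """Build the `requests` list for a batch from a list of prompts."""
    batch = []
    for number, prompt in enumerate(prompts, start=1):
        batch.append({
            # custom_id must be unique within the batch
            "custom_id": f"{id_prefix}-{number:02d}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        })
    return batch


payload = {
    "requests": build_batch_requests([
        "How to learn nestjs?",
        "How to learn Reactjs?",
        "How to learn Nextjs?",
    ])
}
print([request["custom_id"] for request in payload["requests"]])
```

The resulting `payload` can be posted to `/v1/batches` in place of the hand-written `data` dictionary.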

### Get status or results of a specific message batch

> Returns the batch status while the batch is in progress, or streams the results in JSONL format once it has completed.

{% tabs %}
{% tab title="Python" %}
{% code overflow="wrap" %}

```python
import requests
import json

# Insert your batch_id here
batch_id = "batch_a34c321b-ed4b-4e91-ae29-7f02939d8962"

headers = {
    "Authorization": "Bearer <API_KEY>",
    "Content-Type": "application/json"
}

response = requests.get(f"https://llm.onerouter.pro/v1/batches/{batch_id}", headers=headers)

print("Raw response:\n", response.text[:500])  

try:
    data = [json.loads(line) for line in response.text.splitlines() if line.strip()]
    print("\n✅ Parsed JSONL:")
    print(json.dumps(data, indent=2))
except json.JSONDecodeError:
    try:
        data = response.json()
        print("\n✅ Parsed JSON:")
        print(json.dumps(data, indent=2))
    except Exception as e:
        print("\n⚠️ Could not parse response:", e)
```

{% endcode %}
{% endtab %}
{% endtabs %}
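
Because batches are processed asynchronously, a client typically polls this endpoint until the batch reaches a final state. The sketch below rests on assumptions: it treats `completed`, `failed`, `expired`, and `cancelled` as final values of a `status` field (the creation response shows `"status": "validating"`, but the full list of states is not documented on this page), and `wait_for_batch` and `fetch_status` are hypothetical names.

```python
import time

# Assumed terminal states; check the Batch API reference for the real list
FINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(fetch_status, interval=30.0, timeout=24 * 3600,
                   sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a final state or time runs out.

    fetch_status is any zero-argument callable that returns the current
    status string, for example a function that GETs /v1/batches/{batch_id}
    and extracts the status from the response.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in FINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"batch still '{status}' after {timeout} seconds")
        sleep(interval)
```

Passing the status lookup in as a callable keeps the loop independent of the exact response format, and the injectable `sleep` and `clock` make it possible to test without waiting.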

### Cancel a specific batch

You can cancel a Batch that is currently processing using the cancel endpoint. Immediately after cancellation, a batch's `processing_status` will be `canceling`. Canceled batches end up with a status of `ended` and may contain partial results for requests that were processed before cancellation.

{% tabs %}
{% tab title="Python" %}

```python
import requests 
import json

batch_id = "batch_a34c321b-ed4b-4e91-ae29-7f02939d8962"
headers = {
    "Authorization": "Bearer <API_KEY>",
    "Content-Type": "application/json" 
}

response = requests.post(
    f"https://llm.onerouter.pro/v1/batches/{batch_id}/cancel", 
    headers=headers
)
if response.status_code == 200: 
    print("Batch canceled successfully:") 
    data = response.json() 
    print(json.dumps(data, indent=2, ensure_ascii=False)) 
else: 
    print(f"Failed to cancel batch ({response.status_code}):") 
    data = response.json() 
    print(json.dumps(data, indent=2, ensure_ascii=False))
```

{% endtab %}
{% endtabs %}

### Next steps

{% content-ref url="/spaces/pSFCOMEUY0HvEY4SE6P3/pages/XELCqOCQZk5pKoNFKgim" %}
[Overview](/docs/batch-apis)
{% endcontent-ref %}


# Platform Overview

Infron - The world’s first AI Model Marketplace and Inference Provider Routing Platform

*Infron’s Inference Providers Routing Platform gives developers access to thousands of AI models, powered by world-class inference providers.*&#x20;

*Infron's APIs are also integrated into the OpenAI, Claude, and Google SDKs, making it easy to explore serverless inference of models on your favorite providers.*

*Infron helps developers source and optimize AI usage. We believe the future is multi-model and multi-provider.*

<figure><img src="/files/QDtMwHzDpzf8n6qgcMwL" alt=""><figcaption></figcaption></figure>

### Who Uses Infron?

#### Startups & Enterprises <a href="#startups-and-enterprises" id="startups-and-enterprises"></a>

* **AI Startups** building and scaling models without infrastructure overhead
* **Enterprise Teams** running production workloads with reliability requirements
* **ML Engineers** needing flexible compute for training and experimentation

#### Researchers & Academia <a href="#researchers-and-academia" id="researchers-and-academia"></a>

* **Research Groups** pushing state-of-the-art with budget constraints
* **PhD Students & Professors** accessing cutting-edge AI models for research
* **University Students** completing coursework and projects

#### Developers & Hobbyists <a href="#developers-and-hobbyists" id="developers-and-hobbyists"></a>

* **Solo Developers** prototyping and launching AI applications
* **Hobbyists** experimenting with the latest models
* **Open Source Contributors** testing community projects

### Core principles and values of Infron

* **Price and Performance**.&#x20;

Infron scouts for the best prices, the lowest latencies, and the highest throughput across dozens of providers, and lets you choose how to [prioritize](broken://pages/ZXxfwYY21Yz9PCv9X7ja) them.

* **Standardized API**.&#x20;

No need to change code when switching between models or providers. One API gives you access to thousands of open-source models, commercial models, and search agents.

* **Real-World Insights**.&#x20;

Be the first to take advantage of new models.&#x20;

Infron will continue to add more AI models.

* **Consolidated Billing**.&#x20;

Simple and transparent billing, regardless of how many providers you use.

* **Higher Availability**.&#x20;

Fallback providers and automatic, smart routing mean your requests still work even when providers go down.

* **Higher Rate Limits**.&#x20;

Infron works directly with providers to provide better rate limits and more throughput.

### Why Choose Infron? <a href="#why-openrouter" id="why-openrouter"></a>

When you build AI applications, managing multiple provider APIs, comparing model performance, and dealing with varying reliability is tough. Infron solves these challenges by offering:

* **Instant Access to Cutting-Edge Models**: Go beyond mainstream providers to access thousands of specialized models across multiple AI tasks. Whether you need the latest language models, state-of-the-art image generators, or domain-specific embeddings, you'll find them here.
* **Zero Vendor Lock-in**: Instead of being tied to a single provider's model catalog, you get access to models from Cerebras, Groq, Together AI, Replicate, and more, all through one consistent interface.
* **Production-Ready Performance**: Built for enterprise workloads with the reliability your applications demand.

Here's what you can build:

* **Text Generation**: Use large language models with tool-calling capabilities for chatbots, content generation, and code assistance
* **Image and Video Generation**: Create custom images and videos, including support for LoRAs and style customization
* **Search & Retrieval**: State-of-the-art embeddings for semantic search, RAG systems, and recommendation engines

### Key Features

* **🎯 All-in-One API**: A single API for text generation, image generation, document embeddings, search, deep search, summarization, image classification, and more.
* **🔀 Multi-Provider Support**: Easily run models from top-tier providers like Cerebras, Replicate, SambaNova, Together AI, and others.
* **🚀 Scalable & Reliable**: Built for high availability and low-latency performance in production environments.
* **🔧 Developer-Friendly**: Simple requests, fast responses, and a consistent developer experience across OpenAI and Claude clients.
* **👷 Easy to integrate**: Drop-in replacement for the OpenAI chat completions API.
* **💰 Cost-Effective**: No extra markup on provider rates.


# FAQ

Common questions about Infron AI.

## Getting started <a href="#getting-started" id="getting-started"></a>

<details>

<summary><strong>Why should I use Infron?</strong></summary>

Infron provides a unified API to access all the major LLM models on the market. It also allows users to aggregate their billing in one place and keep track of all of their usage using our analytics.

Infron passes through the pricing of the underlying providers, while pooling their uptime, so you get the same pricing you’d get from the provider directly, with a unified API and fallbacks so that you get much better uptime.

</details>

<details>

<summary><strong>What makes Infron unique?</strong></summary>

Infron AI stands out as a unified routing layer that connects multiple AI model providers through a single, consistent API. Instead of integrating separately with different LLM or embedding services, developers can use Infron to simplify model management, request routing, and version control.&#x20;

Infron offers flexible configuration options—such as automatic provider selection, fallback routing, and performance optimization—which help ensure reliability and cost-efficiency. In short, Infron AI makes it easier to build and scale AI applications by abstracting away provider complexity while maintaining full transparency and control.

</details>

<details>

<summary><strong>What's the story behind Infron?</strong></summary>

Infron was created to solve a growing pain in the AI development world: managing multiple model providers efficiently. As the ecosystem of large language models and embeddings expanded, developers often found themselves juggling different APIs, authentication methods, and data formats for each provider. This added unnecessary friction and slowed down innovation.

Seeing this challenge, the creators of Infron envisioned a single, unified routing layer that could abstract away these complexities—allowing developers to focus on what matters most: building great products powered by AI. The idea was to give teams the flexibility to mix and match providers, experiment seamlessly, and improve reliability through smart routing and fallbacks.

From that vision, Infron emerged as an infrastructure solution designed to make multi‑provider AI development as simple, scalable, and transparent as possible. It reflects the broader effort to move from fragmented model integrations toward a cohesive, provider‑agnostic AI ecosystem.

</details>

<details>

<summary><strong>Why should a person choose Infron over its competitors?</strong></summary>

Infron offers a flexible and developer‑friendly way to manage multiple AI model providers through one unified API. Unlike tools that tie you to a single vendor, Infron AI lets you easily switch or combine models from different sources without changing your application code.&#x20;

Infron provides built‑in routing logic, fallback mechanisms, and usage tracking so you can optimize cost, latency, and reliability automatically. In addition, its configuration‑based approach and detailed observability tools simplify scaling and debugging. In short, Infron AI helps teams focus on building AI‑powered features rather than maintaining complex provider integrations.

</details>

<details>

<summary><strong>Who is the primary audience of Infron?</strong></summary>

The primary audience of Infron includes developers, product teams, and organizations building applications that rely on AI models or large language models (LLMs).&#x20;

Infron is designed for engineers who need to integrate, manage, and optimize access to multiple AI providers without maintaining separate APIs. Startups, enterprise AI teams, and platform builders can all benefit from its unified routing system—especially those seeking flexibility, scalability, and cost control in multi‑provider environments. In essence, Infron AI serves anyone who wants to simplify AI infrastructure while maintaining high performance and reliability.

</details>

<details>

<summary><strong>How do I get started with Infron AI?</strong></summary>

To get started, create an account and add credits on the [Credits page](https://infron.ai/dashboard/credits). Credits are simply deposits on Infron that you use for LLM inference. When you use the API or chat interface, we deduct the request cost from your credits. Each model and provider has a different price per million tokens.

Once you have credits you can create API keys and start using the API. You can read our [quickstart](/docs) guide for code samples and more.

</details>

<details>

<summary><strong>How do I get support?</strong></summary>

[Support](/docs/support/privacy-logging)

</details>

<details>

<summary><strong>How do I get billed for my usage on Infron?</strong></summary>

For each model we have the pricing displayed per million tokens. There is usually a different price for prompt and completion tokens. There are also models that charge per request, for images and for reasoning tokens. All of these details will be visible on the [Logs page](https://infron.ai/dashboard/logs).

When you make a request to Infron, we receive the total number of tokens processed by the provider. We then calculate the corresponding cost and deduct it from your credits. You can review your complete usage history on the [Activity page](https://infron.ai/dashboard/activity).

You can also add the `usage: {include: true}` parameter to your chat request to [get the billing information in the response](/docs/observability/billing-tracking).
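
As a rough sketch, a request body with this parameter might be built as follows. The `usage` parameter is taken from the text above. The helper name `build_chat_payload` and the model slug are hypothetical placeholders, and the assumption that the response carries a top-level `usage` object (as in the OpenAI format) should be checked against the billing-tracking page.

```python
import json


def build_chat_payload(model: str, prompt: str, include_usage: bool = True) -> dict:
    """Build a chat completions body that asks for billing info in the response."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if include_usage:
        # Parameter named in the docs: usage: {include: true}
        payload["usage"] = {"include": True}
    return payload


# "your/model-slug" is a placeholder; pick a real slug from the Models page.
payload = build_chat_payload("your/model-slug", "Hello!")
print(json.dumps(payload, indent=2))
```

The payload can then be sent with `requests.post(..., json=payload)` to the chat completions endpoint, and the billing details read from the parsed response.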

We offer different discounts up to 35% based on the pricing of underlying providers.

</details>

## Pricing <a href="#pricing-and-fees" id="pricing-and-fees"></a>

<details>

<summary><strong>What are the prices for using Infron?</strong></summary>

Infron charges a [$0.35 + 5% fee](/docs/overview/pricing-and-fee-structure) when you purchase credits. We pass through the pricing of the underlying model providers without any markup, so you pay the same rate as you would directly with the provider.

For more details on model prices, please see our [Models page](https://infron.ai/models).

For the cost of each request, please see our [Logs page](https://infron.ai/dashboard/logs).

</details>

<details>

<summary><strong>How is the billing calculated when Prompt Cache is enabled?</strong></summary>

Regardless of whether the cached result is used or a new prompt is processed, billing will follow the Prompt Cache rate as defined in our pricing documentation. This applies to every request, since Prompt Cache is always active in Infron.

<figure><img src="/files/KlxXCjjO9y21yjD1kXBz" alt=""><figcaption></figcaption></figure>

</details>

## Models and Providers <a href="#models-and-providers" id="models-and-providers"></a>

<details>

<summary><strong>What LLM models does Infron support?</strong></summary>

Infron AI provides access to a wide variety of LLM models, including frontier models from major AI labs.&#x20;

For a complete list of models you can visit the [Models page](https://infron.ai/models) or fetch the list through the [models api](broken://pages/26321336de5752d195b610e2f1b8368d30221605).

</details>

<details>

<summary><strong>How frequently are new models added?</strong></summary>

We work on adding models as quickly as we can. We often have partnerships with the labs releasing models and can release models as soon as they are available.

</details>

<details>

<summary><strong>I am an inference provider, how can I get listed on Infron?</strong></summary>

If you would like to contact us, the best place to reach us is over email.

[Support](/docs/support/privacy-logging)

</details>

<details>

<summary><strong>How does model fallback work if a provider is unavailable?</strong></summary>

If a provider returns an error, Infron AI will automatically fall back to the next provider. This happens transparently to the user and allows production apps to be much more resilient.

</details>

## API Technical Specifications <a href="#api-technical-specifications" id="api-technical-specifications"></a>

<details>

<summary><strong>What authentication methods are supported?</strong></summary>

Infron AI uses the following authentication method:

1. API keys (passed as Bearer tokens) for accessing the completions API and other core endpoints

</details>

<details>

<summary><strong>What API endpoints are available?</strong></summary>

Infron implements the OpenAI API specification for /completions and /chat/completions endpoints, allowing you to use any model with the same request/response format.&#x20;

</details>

<details>

<summary><strong>Which are the primary technologies used for building Infron AI?</strong></summary>

Infron AI is typically built using modern, cloud‑native web technologies optimized for performance, scalability, and integration with AI services. At its core, Infron AI relies on:

1. **TypeScript and Node.js** – for the main API logic, routing, and configuration management. These enable a robust developer experience and compatibility with diverse model providers.
2. **Cloud infrastructure (e.g., AWS, GCP, or similar)** – to support distributed routing, load balancing, and secure service deployment across regions.
3. **Database and caching systems** – often using PostgreSQL or similar for persistent data, and Redis or in‑memory stores for high‑speed routing decisions.
4. **API and network layer technologies** – including REST and WebSocket interfaces, authentication systems, and observability tooling to track provider usage and latency.
5. **Integration SDKs and AI provider APIs** – connectors built for leading LLM and AI platforms (such as OpenAI, Anthropic, Google, etc.) to enable seamless model switching.

Together, these technologies provide a flexible foundation that allows Infron AI to route, monitor, and optimize traffic across multiple AI services effectively.

</details>

<details>

<summary><strong>What are the supported formats?</strong></summary>

The API supports text and images. [Images](/docs/features/multimodal-input/images-inputs) can be passed as URLs or base64 encoded images. PDF and other file types are coming soon.
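
A minimal sketch of both options, assuming Infron accepts OpenAI-style `image_url` content parts. That follows from the OpenAI compatibility described in this FAQ but is not confirmed here, so treat the linked Images page as authoritative. The helper names are hypothetical.

```python
import base64


def image_part_from_bytes(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as a base64 data URL content part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def image_part_from_url(url: str) -> dict:
    """Reference an image that is already hosted at a reachable URL."""
    return {"type": "image_url", "image_url": {"url": url}}


# A user message mixing text with one image (placeholder bytes, not a real PNG).
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        image_part_from_bytes(b"\x89PNG..."),
    ],
}
```

Passing a URL avoids inflating the request body, while base64 works for images that are not publicly reachable.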

</details>

<details>

<summary><strong>How does streaming work?</strong></summary>

Streaming uses server-sent events (SSE) for real-time token delivery.&#x20;

Set `stream: true` in your request to enable streaming responses.
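
The sketch below shows one way to read such a stream on the client. It assumes OpenAI-style chunks (a `choices[].delta.content` field and a `[DONE]` sentinel), which is an assumption based on the OpenAI compatibility described above. `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines."""
    for line in lines:
        line = line.strip()
        # SSE payload lines start with "data:"; skip blanks and comments.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # OpenAI-style end-of-stream sentinel
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints "Hello"
```

With the `requests` library, the lines would come from a call made with `stream=True` and be read through `response.iter_lines(decode_unicode=True)`.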

</details>

<details>

<summary><strong>What SDK support is available?</strong></summary>

Infron AI is a drop-in replacement for OpenAI. Therefore, any SDKs that support OpenAI by default also support Infron AI. Check out our [docs](/docs/frameworks-and-integrations/overview) for more details.

</details>

<details>

<summary><strong>Can I mix different modalities in one request?</strong></summary>

Yes! You can send text, images, PDFs, and audio in the same request. The model will process all inputs together.

</details>

<details>

<summary><strong>Does Infron use Prompt Cache by default?</strong></summary>

Yes. Prompt Cache is enabled by default in Infron AI for all API calls. This means that whenever you send a request, Infron AI will attempt to use the cached prompt/response if applicable.

</details>

<details>

<summary><strong>Will using Prompt Cache change my token usage or latency?</strong></summary>

* **Token usage:** When a cached response is served, actual model inference may be skipped, which can reduce token consumption.
* **Latency:** Cached responses are generally faster to return compared to generating new responses from the model.
* **Billing:** The cost per request is based on the Prompt Cache price tier, regardless of cache hit or miss

</details>

<details>

<summary><strong>Can I disable Prompt Cache?</strong></summary>

At this time, Prompt Cache is permanently enabled in Infron AI and cannot be turned off. The design ensures consistent performance optimization and uniform billing.

</details>

## Privacy and Data Logging <a href="#privacy-and-data-logging" id="privacy-and-data-logging"></a>

Please see our [Terms of Service](https://infron.ai/terms-of-use) and [Privacy Policy](https://infron.ai/privacy-policy).

<details>

<summary><strong>What data is logged during API use?</strong></summary>

We log basic request metadata (timestamps, model used, token counts). Prompt and completion are not logged by default. We do zero logging of your prompts/completions, even if an error occurs.

</details>

## Credit and Billing Systems <a href="#credit-and-billing-systems" id="credit-and-billing-systems"></a>

<details>

<summary><strong>What purchase options exist?</strong></summary>

Infron AI uses a credit system where the base currency is US dollars.&#x20;

All of the pricing on our site and API is denoted in dollars. Users can top up their balance manually.

</details>

<details>

<summary><strong>My credits haven't shown up in my account</strong></summary>

If you paid using Stripe, an issue with the Stripe integration can sometimes delay credits from showing up on your account. Please allow up to one hour. If your credits still have not appeared after an hour, contact us by email and we will look into it.

</details>

<details>

<summary><strong>How to monitor credit usage?</strong></summary>

The [Activity page](https://infron.ai/dashboard/activity) allows users to view their historical usage and filter it by model, provider, and API key.

We also provide a [Logs page](https://infron.ai/dashboard/logs) that has live information about the balance and remaining credits for the account.

</details>

<details>

<summary><strong>How do volume discounts work?</strong></summary>

Infron AI does not currently offer volume discounts, but you can reach out to us over email if you think you have an exceptional use case.

</details>

<details>

<summary><strong>What payment methods are accepted?</strong></summary>

We accept all major credit cards, Alipay, PayPal, WeChat Pay, and invoicing.

</details>

## Account Management <a href="#account-management" id="account-management"></a>

<details>

<summary><strong>What analytics are available?</strong></summary>

Our [Activity dashboard](https://infron.ai/dashboard/activity) provides real-time usage metrics. If you would like any specific reports or metrics please contact us.

</details>

## Input Format Support <a href="#input-format-support" id="input-format-support"></a>

<details>

<summary><strong>What about video support?</strong></summary>

Video modality support is coming soon! We’re working on adding video processing capabilities to expand our multimodal offerings.

</details>


# Pricing and Fee Structure

Understanding Infron’s usage-based pricing model.

### Pricing Model

* **Usage-based pricing** — You pay only for what you use
* **Per-model pricing** — Each model has its own price
* **No subscription required** — Top up credits and use as needed

### Pricing Plans

|                                                                                      | Pay-as-you-go                                                                                                                                                                                                                                                                                                            | Enterprise                                                                                                                                                                                                                                                                            |
| ------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Platform Fees                                                                        | `5% + $0.35` per transaction                                                                                                                                                                                                                                                                                             | `3%` per transaction                                                                                                                                                                                                                                                                  |
| Model Prices                                                                         | <p><code>0%</code> markup</p><p></p><p>Infron AI passes through the pricing of the underlying model providers without any markup, so you pay the same rate as you would directly with the provider.</p><p></p><p>Model price details are available in the <a href="https://infron.ai/models">model marketplace</a>.</p>  | <p>Up to <code>30%</code> off</p><p></p><p>For our enterprise customers, we offer tiered discounts based on your actual monthly usage – with savings of up to <code>30%</code> off regular pricing!</p><p></p><p><a href="https://infron.ai/contact">Bulk discounts available</a></p> |
| <p>Models</p><p><a href="https://infron.ai/models">Explore all models →</a></p>      | <p>400+ models & agents</p><ul><li>LLMs</li><li>Media Models</li><li>Embeddings</li><li>Reranker</li><li>Search & Deepsearch & Research</li><li>Batch API</li></ul>                                                                                                                                                      | <p>400+ models & agents</p><ul><li>LLMs</li><li>Media Models</li><li>Embeddings</li><li>Reranker</li><li>Search & Deepsearch & Research</li><li>Batch API</li></ul>                                                                                                                   |
| Chat and API Access                                                                  | ✅                                                                                                                                                                                                                                                                                                                        | ✅                                                                                                                                                                                                                                                                                     |
| Activity Logs & Export                                                               | ✅                                                                                                                                                                                                                                                                                                                        | ✅                                                                                                                                                                                                                                                                                     |
| <p>Provider Routing</p><p><a href="/pages/u0Iopmf5CGGFEpLtjuv7">Learn more →</a></p> | ✅                                                                                                                                                                                                                                                                                                                        | ✅                                                                                                                                                                                                                                                                                     |
| <p>Model Fallbacks</p><p><a href="/pages/R5PIsfZ2g6r7ALlVi8Lv">Learn more →</a></p>  | ✅                                                                                                                                                                                                                                                                                                                        | ✅                                                                                                                                                                                                                                                                                     |
| Budgets & Spend Controls                                                             | ✅                                                                                                                                                                                                                                                                                                                        | ✅                                                                                                                                                                                                                                                                                     |
| <p>Prompt Caching</p><p><a href="/pages/vgnIccAmDw8Hw0R3rrvO">Learn more →</a></p>   | ✅                                                                                                                                                                                                                                                                                                                        | ✅                                                                                                                                                                                                                                                                                     |
| Contractual SLAs                                                                     | Shared public resource pool                                                                                                                                                                                                                                                                                              | Provisioned Throughput                                                                                                                                                                                                                                                                |
| Payment options                                                                      | <ul><li>Credit card, Alipay, Bank, crypto & more</li></ul>                                                                                                                                                                                                                                                               | <ul><li>Credit card, Alipay, Bank, crypto & more</li><li>Invoicing options</li></ul>                                                                                                                                                                                                  |
| <p>BYOK Limits</p><p><a href="/pages/YrlOtdZA8cyZueuTNgvi">Learn more →</a></p>      | `0%` fee                                                                                                                                                                                                                                                                                                                 | `0%` fee                                                                                                                                                                                                                                                                              |
| Rate limits                                                                          | Default global limits                                                                                                                                                                                                                                                                                                    | Optional dedicated limits                                                                                                                                                                                                                                                             |
| Support                                                                              | Email Support                                                                                                                                                                                                                                                                                                            | Support SLA with dedicated Slack Channel or Whatsapp Group                                                                                                                                                                                                                            |
| Refund FeesC                                                                         | Non-refundable                                                                                                                                                                                                                                                                                                           | For business users, unused account balance is eligible for a `full refund` with `no processing fees`.                                                                                                                                                                                 |
| Credits                                                                              | Credits never expire                                                                                                                                                                                                                                                                                                     | Credits never expire                                                                                                                                                                                                                                                                  |

## **Important Notes** <a href="#important-notes" id="important-notes"></a>

1. If you have **special requirements for pricing and models** reach out to us at <support@infron.ai>.
2. When you add funds to your Infron account, we take a small fee, known as the `“Services Fees”`, to cover the costs of credit card transaction fees, currency conversion fees, taxes etc. The net funds will be topped up your balance account.

For example, for a customer buying `$50` in credits, the detailed pricing breakdown is:

| Description                                               | Amount |
| --------------------------------------------------------- | ------ |
| **Credits Price**                                         | $50    |
| **Platform Fees** (5%+$0.35)                              | $2.85  |
| **Total**                                                 | $52.85 |
| **Net Funds** (will be topped up to your balance account) | $50    |
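
The breakdown above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the fee is always 5% of the credits price plus a flat $0.35, as in the example table; the function name `top_up_breakdown` is illustrative and not part of any Infron API.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: rate and flat fee are taken from the example table above.
FEE_RATE = Decimal("0.05")
FLAT_FEE = Decimal("0.35")


def top_up_breakdown(credits: str) -> dict:
    """Return the platform fee, total charge, and net funds for a top-up."""
    price = Decimal(credits)
    fee = (price * FEE_RATE + FLAT_FEE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {"credits": price, "platform_fee": fee, "total": price + fee, "net_funds": price}


breakdown = top_up_breakdown("50")
print(breakdown["platform_fee"], breakdown["total"])  # 2.85 52.85
```

`Decimal` is used instead of floats so that currency amounts round predictably to cents.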


# Inference Provider Routing

Route requests to the best inference provider

Infron AI routes requests to the best available providers for your model.&#x20;

<figure><img src="/files/4Pi8M0J3bGFIF1RbHbyc" alt=""><figcaption></figcaption></figure>

**By default, requests are load-balanced across the top providers to maximize uptime and get the best price**.

You can customize how your requests are routed using the `provider` object in the request body for Chat Completions and Completions.

The `provider` object can contain the following fields:

<table><thead><tr><th width="248.0242919921875">Field</th><th width="166">Type</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><a href="#ordering-specific-providers-order"><code>order</code></a></td><td>string[]</td><td>-</td><td>List of provider slugs to try in order (e.g. <code>["anthropic", "openai"]</code>). </td></tr><tr><td><a href="#disabling-fallbacks-allow_fallbacks"><code>allow_fallbacks</code></a></td><td>boolean</td><td><code>true</code></td><td>Whether to allow backup providers when the primary is unavailable.</td></tr><tr><td><a href="#provider-sorting-sort"><code>sort</code></a></td><td>string | object</td><td>-</td><td>Sort providers by price, throughput, or latency. (e.g. <code>"price"</code>)</td></tr><tr><td><a href="#performance-thresholds-preferred_min_throughput-preferred_max_latency"><code>preferred_min_throughput</code></a></td><td>number | object</td><td>-</td><td>Preferred minimum throughput (tokens/sec). Can be a number or an object with percentile cutoffs (p50, p75, p90, p99).</td></tr><tr><td><a href="#performance-thresholds-preferred_min_throughput-preferred_max_latency"><code>preferred_max_latency</code></a></td><td>number | object</td><td>-</td><td>Preferred maximum latency (seconds). Can be a number or an object with percentile cutoffs (p50, p75, p90, p99). </td></tr><tr><td><a href="#requiring-providers-to-support-all-parameters-require_parameters"><code>require_parameters</code></a></td><td>boolean</td><td><code>true</code></td><td>Only use providers that support all parameters in your request. 
</td></tr><tr><td><a href="#requiring-providers-to-comply-with-data-policies-data_collection"><code>data_collection</code></a></td><td>"allow" | "deny"</td><td>"allow"</td><td>Control whether to use providers that may store data.</td></tr><tr><td><a href="#zero-data-retention-enforcement-zdr"><code>zdr</code></a></td><td>boolean</td><td><code>false</code></td><td>Restrict routing to only ZDR (Zero Data Retention) endpoints. </td></tr><tr><td><a href="#distillable-text-enforcement-enforce_distillable_text"><code>enforce_distillable_text</code></a></td><td>boolean</td><td><code>false</code></td><td>Restrict routing to only models that allow text distillation.</td></tr><tr><td><a href="#allowing-only-specific-providers-only"><code>only</code></a></td><td>string[]</td><td>-</td><td>List of provider slugs to allow for this request. </td></tr><tr><td><a href="#ignoring-providers-ignore"><code>ignore</code></a></td><td>string[]</td><td>-</td><td>List of provider slugs to skip for this request. </td></tr><tr><td><a href="#quantization-quantizations"><code>quantizations</code></a></td><td>string[]</td><td>-</td><td>List of quantization levels to filter by (e.g. <code>["int4", "int8"]</code>).</td></tr></tbody></table>
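
As a rough illustration, the helper below builds a `provider` object and rejects field names that are not listed in the table above. `build_provider_preferences` is a hypothetical client-side convenience, not part of an Infron SDK, and the API itself may validate fields differently.

```python
# Field names copied from the table above.
KNOWN_PROVIDER_FIELDS = {
    "order", "allow_fallbacks", "sort",
    "preferred_min_throughput", "preferred_max_latency",
    "require_parameters", "data_collection", "zdr",
    "enforce_distillable_text", "only", "ignore", "quantizations",
}


def build_provider_preferences(**fields):
    """Return a dict for the request's `provider` key, checking field names."""
    unknown = set(fields) - KNOWN_PROVIDER_FIELDS
    if unknown:
        raise ValueError(f"Unknown provider fields: {sorted(unknown)}")
    return dict(fields)


# The resulting payload can be passed as the JSON body of a chat completion request.
payload = {
    "model": "deepseek/deepseek-v3.2",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": build_provider_preferences(sort="price", allow_fallbacks=False),
}
print(payload["provider"])
```

Catching a misspelled field name locally is cheaper than discovering later that a routing preference was silently not applied.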

### **Cost-effective** Load Balancing (Default Strategy)

For each model in your request, Infron's default behavior is to load balance requests across providers, **balancing the best throughput, lowest latency, and lowest price**.

<figure><img src="/files/XZlQndqZD8xnQQFc5a49" alt=""><figcaption></figcaption></figure>

When you send a model request, **Infron automatically evaluates multiple providers in real time**. It considers factors such as **latency**, **throughput**, **reliability**, and **price**—based on the default weight distribution shown above.&#x20;

{% hint style="info" %}
For instance, if Provider A offers slightly higher throughput but at a higher cost, while Provider B is more affordable with moderate latency, Infron will intelligently balance requests across both to achieve the best overall performance and cost efficiency.
{% endhint %}

<figure><img src="/files/fw5iJPKFSYf0wcHYuITL" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
If you are more sensitive to throughput than price, you can use the [sort](#provider-sorting-sort) field to explicitly prioritize throughput.

If you set `sort` or `order` in your provider preferences, the default load-balancing strategy is disabled.
{% endhint %}

### Ordering Specific Providers (order)

You can set the providers that Infron AI will prioritize for your request using the `order` field.

| Field   | Type      | Default | Description                                                              |
| ------- | --------- | ------- | ------------------------------------------------------------------------ |
| `order` | string\[] | -       | List of provider slugs to try in order (e.g. `["anthropic", "openai"]`). |

Infron AI will prioritize providers in this order, for the model you're using. If you don't set this field, the router will use the [default strategy](#cost-effective-load-balancing-default-strategy).

You can use the copy button next to provider names on model pages to get the exact provider slug, for example `anthropic`, `openai`, or `novita`.

<figure><img src="/files/cBstjH9iWTWKgvTHACT9" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
**Order example (allow\_fallbacks enabled, the default)**:

* `azure` hosts `anthropic/claude-sonnet-4.5`
* `anthropic` hosts `anthropic/claude-sonnet-4.5`
* `openai` hosts `anthropic/claude-sonnet-4.5`

You set the `order` field to `["anthropic", "openai"]` and call the `anthropic/claude-sonnet-4.5` model.

* If provider `anthropic` fails, provider `openai` is tried next.
* If provider `openai` also fails, a backup provider (possibly `azure`) is tried last.
  {% endhint %}

Infron will try all the providers which are specified in `order` one at a time, and proceed to other `backup providers` if none are operational.&#x20;

If you don't want to allow any other providers, you should disable `allow_fallbacks` as well.

{% hint style="info" %}
**Order example (allow\_fallbacks disabled)**:

* `azure` hosts `anthropic/claude-sonnet-4.5`
* `anthropic` hosts `anthropic/claude-sonnet-4.5`
* `openai` hosts `anthropic/claude-sonnet-4.5`

You set the `order` field to `["anthropic", "openai"]` and call the `anthropic/claude-sonnet-4.5` model.

You set `allow_fallbacks` to `false`.

* If provider `anthropic` fails, provider `openai` is tried next.
* If provider `openai` also fails, the request fails.
  {% endhint %}
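
The two scenarios above can be modelled with a small client-side sketch. It only mirrors the behavior described in this section and is not Infron's router; `attempt_sequence` is a hypothetical name, and appending backup providers in list order is a simplifying assumption.

```python
def attempt_sequence(order, hosting_providers, allow_fallbacks=True):
    """Return the providers tried, in order, for one request.

    Simplification: remaining backup providers are appended in the order
    given; the real router chooses among them with its own strategy.
    """
    hosted = list(hosting_providers)
    sequence = [slug for slug in order if slug in hosted]
    if allow_fallbacks:
        sequence += [slug for slug in hosted if slug not in sequence]
    return sequence


hosting = ["azure", "anthropic", "openai"]
print(attempt_sequence(["anthropic", "openai"], hosting))
# ['anthropic', 'openai', 'azure']
print(attempt_sequence(["anthropic", "openai"], hosting, allow_fallbacks=False))
# ['anthropic', 'openai']
```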

#### Example: Specifying providers with fallbacks

In the example below, your request is first sent to `novita`. If `novita` fails, `deepinfra` is tried next. Because `allow_fallbacks` defaults to `true`, other backup providers are tried only if both fail.

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'order': ['novita', 'deepinfra'],
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "order": ["novita", "deepinfra"]
  }
}'
```

{% endtab %}
{% endtabs %}

#### Example: Specifying providers with fallbacks disabled

Here's an example with `allow_fallbacks` set to `false`. Your request is sent to Google AI Studio first, and it fails if Google AI Studio fails.

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'google/gemini-3-flash-preview',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'order': ['google-ai-studio'],
    'allow_fallbacks': False
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "google/gemini-3-flash-preview",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "order": ["google-ai-studio"],
      "allow_fallbacks": false
  }
}'
```

{% endtab %}
{% endtabs %}

<figure><img src="/files/mFgmhnxbj0ngrPTseb2a" alt=""><figcaption></figcaption></figure>

#### Example: Targeting Specific Provider Endpoints

Each provider on Infron may host multiple endpoints for the same model, such as a default endpoint and a specialized "quantizations" endpoint. To target a specific endpoint, you can use the copy button next to the provider name on the model detail page to obtain the exact provider slug.

For example, MiniMax offers MiniMax M2.1 through multiple endpoints:

* Default endpoint with slug `minimax/fp8`
* Lightning endpoint with slug `minimax/lightning`

By copying the exact provider slug and using it in your request's `order` array, you can ensure your request is routed to the specific endpoint you want:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'minimax/minimax-m2.1',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'order': ['minimax/fp8'],
    'allow_fallbacks': False,
  },
})
```

{% endtab %}
{% endtabs %}

This approach is especially useful when you want to consistently use a specific variant of a model from a particular provider.

### Provider Sorting (sort)

If you instead want to *explicitly* prioritize a particular provider attribute, you can include the `sort` field in the `provider` preferences. The default load-balancing strategy is then disabled, and the router tries providers in sorted order.

The three sort options are:

* `"price"`: prioritize lowest price
* `"throughput"`: prioritize highest throughput
* `"latency"`: prioritize lowest latency

{% tabs %}
{% tab title="prioritize lowest price" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'price',
  },
})
```

{% endtab %}

{% tab title="prioritize highest throughput" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'throughput',
  },
})
```

{% endtab %}

{% tab title="prioritize lowest latency" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'latency',
  },
})
```

{% endtab %}
{% endtabs %}

* To *always* prioritize low prices, set `sort` to `"price"`.

<figure><img src="/files/GwUk4dy6V4GjVs7EWqIE" alt=""><figcaption></figcaption></figure>

* To *always* prioritize highest throughput, set `sort` to `"throughput"`.

<figure><img src="/files/QQKIqFPKFmQ4mhRKpNje" alt=""><figcaption></figcaption></figure>

* To *always* prioritize low latency, set `sort` to `"latency"`.

<figure><img src="/files/1tu55f0RyMKreuPsGPDO" alt=""><figcaption></figcaption></figure>

### Performance Thresholds (preferred\_min\_throughput / preferred\_max\_latency)

You can set minimum throughput or maximum latency thresholds with `preferred_min_throughput` and `preferred_max_latency` to express performance preferences for endpoints.

Endpoints that don't meet these thresholds are deprioritized (moved to the end of the list) rather than excluded entirely.

<table><thead><tr><th width="238.4339599609375">Field</th><th width="165.926513671875">Type</th><th width="78.209716796875">Default</th><th>Description</th></tr></thead><tbody><tr><td><code>preferred_min_throughput</code></td><td>number | object</td><td>-</td><td><p>Preferred minimum throughput in tokens per second. </p><p>Can be </p><ul><li><code>a number (applies to p50)</code></li><li>or an <code>object</code> with <code>percentile cutoffs</code>.</li></ul></td></tr><tr><td><code>preferred_max_latency</code></td><td>number | object</td><td>-</td><td><p>Preferred maximum latency in seconds.</p><p>Can be </p><ul><li><code>a number (applies to p50)</code></li><li>or an <code>object</code> with <code>percentile cutoffs</code>.</li></ul></td></tr></tbody></table>

#### How Percentiles Work

Infron tracks `latency` and `throughput` metrics for each model and provider using `percentile statistics` calculated over a rolling `5-minute window`. The available percentiles are:

* **p50** (median): 50% of requests perform better than this value
* **p75**: 75% of requests perform better than this value
* **p90**: 90% of requests perform better than this value
* **p99**: 99% of requests perform better than this value

Higher percentiles (like p90 or p99) give you more confidence about worst-case performance, while lower percentiles (like p50) reflect typical performance. **For example, if a model and provider has a p90 latency of 2 seconds, that means 90% of requests complete in under 2 seconds**.
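
To make the percentile definitions concrete, here is a minimal sketch that computes latency percentiles from a list of samples using the nearest-rank method. The exact method Infron uses is not documented here, so treat the choice of nearest-rank, the function name, and the sample values as assumptions. For throughput, where higher is better, the ordering would be reversed.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies (seconds) observed in one 5-minute window.
latencies = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.5, 1.7, 2.0, 4.5]
print(percentile(latencies, 50))  # 1.2
print(percentile(latencies, 90))  # 2.0
print(percentile(latencies, 99))  # 4.5
```

Note how a single slow request (4.5 s) only shows up at p99, which is why higher percentiles say more about worst-case behavior.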

#### When to Use Percentile Preferences

Percentile-based routing is useful when you need predictable performance characteristics:

* **Real-time applications**: Use p90 or p99 latency thresholds to ensure consistent response times for user-facing features
* **Batch processing**: Use p50 throughput thresholds when you care more about average performance than worst-case scenarios
* **SLA compliance**: Use multiple percentile cutoffs to ensure providers meet your service level agreements across different performance tiers
* **Cost optimization**: Combine with `sort: "price"` to get the cheapest provider that still meets your performance requirements

#### Example: Find the Cheapest Model Meeting Performance Requirements

Combine `'sort': 'price'` with `performance thresholds` to find the cheapest option that meets your performance requirements. **This is useful when you have a performance floor but want to minimize costs**.

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'price',
    'preferred_min_throughput': {
      'p90': 50, # Prefer providers with >50 tokens/sec for 90% of requests in last 5 minutes
    },
  },
})
```

{% endtab %}

{% tab title="Curl" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v3.2",
    "messages": [{ "role": "user", "content": "Hello" }],
    "provider": {
      "sort": "price",
      "preferred_min_throughput": {
        "p90": 50
      }
    }
  }'
```

{% endtab %}
{% endtabs %}

<figure><img src="/files/fSje3mOZfbKkNSOKaFvY" alt=""><figcaption></figcaption></figure>

In this example, Infron will find the `cheapest provider` that `has at least 50 tokens/second throughput at the p90 level` (meaning 90% of requests achieve this throughput or better). Providers below this threshold are still available as fallbacks if all preferred options fail.
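
The ranking described above can be sketched as a sort followed by a partition. This is an illustration of the documented behavior, not the router's actual code; the endpoint records, prices, and the function name are made up for the example.

```python
def rank_by_price_with_throughput_floor(endpoints, min_p90_throughput):
    """Sort by price, then move endpoints below the floor to the end."""
    by_price = sorted(endpoints, key=lambda e: e["price"])
    preferred = [e for e in by_price if e["p90_throughput"] >= min_p90_throughput]
    deprioritized = [e for e in by_price if e["p90_throughput"] < min_p90_throughput]
    return preferred + deprioritized


# Hypothetical endpoints; prices and throughput figures are invented.
endpoints = [
    {"slug": "provider-a", "price": 0.20, "p90_throughput": 35},
    {"slug": "provider-b", "price": 0.28, "p90_throughput": 60},
    {"slug": "provider-c", "price": 0.40, "p90_throughput": 90},
]
ranked = rank_by_price_with_throughput_floor(endpoints, min_p90_throughput=50)
print([e["slug"] for e in ranked])
# ['provider-b', 'provider-c', 'provider-a']
```

The cheapest endpoint (`provider-a`) misses the floor, so it drops to the end of the list but remains available as a fallback.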

You can also use `preferred_max_latency` to set a `maximum acceptable latency`:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'price',
    'preferred_max_latency': {
      'p90': 3, # Prefer providers with <3 second latency for 90% of requests in last 5 minutes
    },
  },
})
```

{% endtab %}

{% tab title="Curl" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v3.2",
    "messages": [{ "role": "user", "content": "Hello" }],
    "provider": {
      "sort": "price",
      "preferred_max_latency": {
        "p90": 3
      }
    }
  }'
```

{% endtab %}
{% endtabs %}

<figure><img src="/files/Fjq6yoGLGFOO3d3RMhUM" alt=""><figcaption></figcaption></figure>

#### Example: Using Multiple Percentile Cutoffs

You can specify multiple percentile cutoffs to set both typical and worst-case performance requirements. All specified cutoffs must be met for a provider to be in the preferred group.

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'preferred_max_latency': {
      'p50': 1, # Prefer providers with <1 second latency for 50% of requests in last 5 minutes
      'p90': 3, # Prefer providers with <3 second latency for 90% of requests in last 5 minutes
      'p99': 5, # Prefer providers with <5 second latency for 99% of requests in last 5 minutes
    },
    'preferred_min_throughput': {
      'p50': 100, # Prefer providers with >100 tokens/sec for 50% of requests in last 5 minutes
      'p90': 50, # Prefer providers with >50 tokens/sec for 90% of requests in last 5 minutes
    },
  },
})
```

{% endtab %}

{% tab title="Curl" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v3.2",
    "messages": [{ "role": "user", "content": "Hello" }],
    "provider": {
      "preferred_max_latency": {
        "p50": 1,
        "p90": 3,
        "p99": 5
      },
      "preferred_min_throughput": {
        "p50": 100,
        "p90": 50
      }
    }
  }'
```

{% endtab %}
{% endtabs %}
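
The rule that all specified cutoffs must be met can be expressed as a pair of `all(...)` checks. The sketch below is an assumption-laden illustration: the statistics dictionary, its shape, and the use of inclusive comparisons are hypothetical, and Infron's own evaluation may differ.

```python
def meets_all_cutoffs(stats, max_latency=None, min_throughput=None):
    """Return True if every specified percentile cutoff is satisfied."""
    latency_ok = all(
        stats["latency"][pct] <= limit for pct, limit in (max_latency or {}).items()
    )
    throughput_ok = all(
        stats["throughput"][pct] >= floor for pct, floor in (min_throughput or {}).items()
    )
    return latency_ok and throughput_ok


# Hypothetical rolling-window statistics for one provider.
stats = {
    "latency": {"p50": 0.8, "p90": 2.5, "p99": 6.0},
    "throughput": {"p50": 120, "p90": 70},
}
max_latency = {"p50": 1, "p90": 3, "p99": 5}
min_throughput = {"p50": 100, "p90": 50}

print(meets_all_cutoffs(stats, max_latency, min_throughput))  # False: p99 latency 6.0 exceeds 5
print(meets_all_cutoffs(stats, {"p50": 1, "p90": 3}, min_throughput))  # True
```

A provider that fails any single cutoff falls out of the preferred group, even if it passes the others.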

### Requiring Providers to Support All Parameters (require\_parameters)

You can restrict requests only to providers that support all parameters in your request using the `require_parameters` field.

When you send a request with `tools` or `tool_choice`, Infron will only route to providers that support tool use. Similarly, if you set `max_tokens`, Infron will only route to providers that support a response of that length.

| Field                | Type    | Default | Description                                                     |
| -------------------- | ------- | ------- | --------------------------------------------------------------- |
| `require_parameters` | boolean | `true`  | Only use providers that support all parameters in your request. |

* When `require_parameters` is `true` (the default listed above), the request is only routed to providers that support every parameter in it.
* When `require_parameters` is `false`, providers that don't support all the LLM parameters specified in your request can still receive the request, but will ignore unknown parameters.
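
The filter that `require_parameters` describes ("only use providers that support all parameters in your request") can be pictured as a subset check. The sketch below is purely illustrative: the capability data, provider names, and helper function are hypothetical, and real parameter support varies by provider and model.

```python
# Keys that every request carries and that are not model parameters.
NON_PARAMETER_KEYS = {"model", "messages", "provider"}


def providers_supporting_request(request, supported_parameters):
    """Return slugs of providers that support every parameter in the request."""
    used = set(request) - NON_PARAMETER_KEYS
    return [slug for slug, supported in supported_parameters.items() if used <= set(supported)]


# Hypothetical capability data.
supported_parameters = {
    "provider-a": {"max_tokens", "temperature", "response_format", "tools"},
    "provider-b": {"max_tokens", "temperature"},
}
request = {
    "model": "deepseek/deepseek-v3.2",
    "messages": [{"role": "user", "content": "Hello"}],
    "response_format": {"type": "json_object"},
}
print(providers_supporting_request(request, supported_parameters))  # ['provider-a']
```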

#### Example: Excluding providers that don't support JSON formatting

For example, to only use providers that support JSON formatting:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'require_parameters': True,
  },
  'response_format': { 'type': 'json_object' },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "require_parameters": true
    },
  "response_format": { "type": "json_object" }
}'
```

{% endtab %}
{% endtabs %}

<figure><img src="/files/PEZXrnjuR8zslhA6gzYc" alt=""><figcaption></figcaption></figure>

### Requiring Providers to Comply with Data Policies (data\_collection)

You can restrict requests only to providers that comply with your data policies using the `data_collection` field.

| Field             | Type              | Default | Description                                           |
| ----------------- | ----------------- | ------- | ----------------------------------------------------- |
| `data_collection` | "allow" \| "deny" | "allow" | Control whether to use providers that may store data. |

* `allow`: (default) allow providers which store user data non-transiently and may train on it
* `deny`: use only providers which do not collect user data

Some model providers may log prompts, so we display them with a **Data Policy** tag on model pages. This is not a definitive source of third party data policies, but represents our best knowledge.

#### Example: Excluding providers that don't comply with data policies

To exclude providers that don't comply with your data policies, set `data_collection` to `deny`:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'data_collection': 'deny', # or "allow"
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "data_collection": "deny" 
    }
}'
```

{% endtab %}
{% endtabs %}

### Zero Data Retention Enforcement (zdr)

You can enforce Zero Data Retention (ZDR) on a per-request basis using the `zdr` parameter, ensuring your request only routes to endpoints that do not retain prompts.

| Field | Type    | Default | Description                                                   |
| ----- | ------- | ------- | ------------------------------------------------------------- |
| `zdr` | boolean | `false` | Restrict routing to only ZDR (Zero Data Retention) endpoints. |

* When `zdr` is set to `true`, the request will only be routed to endpoints that have a Zero Data Retention policy.&#x20;
* When `zdr` is `false` or not provided, it has no effect on routing.

#### Example: Enforcing ZDR for a specific request

To ensure a request only uses ZDR endpoints, set `zdr` to `true`:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'zdr': True,
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "zdr": true 
    }
}'
```

{% endtab %}
{% endtabs %}

This is useful for customers who don't want to globally enforce ZDR but need to ensure specific requests only route to ZDR endpoints.

### Distillable Text Enforcement (enforce\_distillable\_text)

You can enforce distillable text filtering on a per-request basis using the `enforce_distillable_text` parameter, ensuring your request only routes to models where the author has allowed text distillation.

<table><thead><tr><th width="241.335205078125">Field</th><th width="122.5443115234375">Type</th><th width="109.6444091796875">Default</th><th>Description</th></tr></thead><tbody><tr><td><code>enforce_distillable_text</code></td><td>boolean</td><td><code>false</code></td><td>Restrict routing to only models that allow text distillation.</td></tr></tbody></table>

* When `enforce_distillable_text` is set to `true`, the request will only be routed to models where the author has explicitly enabled text distillation.&#x20;
* When `enforce_distillable_text` is `false` or not provided, it has no effect on routing.

This parameter is useful for applications that need to ensure their requests only use models that allow text distillation for training purposes, such as when building datasets for model fine-tuning or distillation workflows.

#### Example: Enforcing distillable text for a specific request

To ensure a request only uses models that allow text distillation, set `enforce_distillable_text` to `true`:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'enforce_distillable_text': True,
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "enforce_distillable_text": true 
    }
}'
```

{% endtab %}
{% endtabs %}

### Disabling Fallbacks (allow\_fallbacks)

#### Example: Always choose the cheapest provider with fallbacks disabled

To guarantee that your request is only served by the lowest-cost provider, set `sort` to `"price"` and disable fallbacks by setting `allow_fallbacks` to `false`.

You can also combine `allow_fallbacks: false` with the `order` field to restrict Infron to just your chosen list of providers.

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'price',
    'allow_fallbacks': False,
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "sort": "price",
      "allow_fallbacks": false 
    }
}'
```

{% endtab %}
{% endtabs %}

#### Example: Always use specific providers with fallbacks disabled

Here's an example with `allow_fallbacks` set to `false`. Your request is sent to Google AI Studio first, and it fails if Google AI Studio fails.

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'google/gemini-3-flash-preview',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'order': ['google-ai-studio'],
    'allow_fallbacks': False
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "google/gemini-3-flash-preview",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "order": ["google-ai-studio"],
      "allow_fallbacks": false 
    }
}'
```

{% endtab %}
{% endtabs %}

### Allowing Only Specific Providers (only)

You can allow only specific providers for a request by setting the `only` field in the `provider` object.

| Field  | Type      | Default | Description                                       |
| ------ | --------- | ------- | ------------------------------------------------- |
| `only` | string\[] | -       | List of provider slugs to allow for this request. |

Only allowing some providers may significantly reduce fallback options and limit request recovery.&#x20;

#### Example: Only allow Azure for a request calling GPT-5 mini

Here's an example that will only use Azure for a request calling GPT-5 mini:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'openai/gpt-5-mini',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'only': ['azure'],
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "openai/gpt-5-mini",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "only": ["azure"]
    }
}'
```

{% endtab %}
{% endtabs %}

<figure><img src="/files/1J9OKMgvo5iBpqr5II2N" alt=""><figcaption></figcaption></figure>

### Ignoring Providers (ignore)

You can ignore providers for a request by setting the `ignore` field in the `provider` object.

| Field    | Type      | Default | Description                                      |
| -------- | --------- | ------- | ------------------------------------------------ |
| `ignore` | string\[] | -       | List of provider slugs to skip for this request. |

Ignoring multiple providers may significantly reduce fallback options and limit request recovery.
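
As a mental model of why `only` and `ignore` shrink the fallback pool, the sketch below applies both lists to a made-up candidate list. The `filter_providers` helper and the candidate slugs are illustrative assumptions; the real filtering happens inside Infron, and how the two fields interact when combined is not stated in this section.

```python
from typing import Iterable, List, Optional


def filter_providers(
    candidates: Iterable[str],
    only: Optional[List[str]] = None,
    ignore: Optional[List[str]] = None,
) -> List[str]:
    """Model how `only` and `ignore` narrow the candidate providers.

    Hypothetical helper for illustration only; not Infron's implementation.
    """
    remaining = list(candidates)
    if only is not None:
        # Allow-list: keep only the slugs named in `only`.
        remaining = [p for p in remaining if p in only]
    if ignore is not None:
        # Deny-list: drop every slug named in `ignore`.
        remaining = [p for p in remaining if p not in ignore]
    return remaining


# Made-up candidate list for one model.
candidates = ["azure", "deepinfra", "google-vertex", "amazon-bedrock"]
print(filter_providers(candidates, only=["azure"]))
print(filter_providers(candidates, ignore=["deepinfra", "google-vertex"]))
```

Every slug removed here is one fewer endpoint available when a request needs to recover from a provider error.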

#### Example: Ignoring DeepInfra for a request calling DeepSeek V3.2

Here's an example that will ignore DeepInfra for a request calling DeepSeek V3.2:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'ignore': ['deepinfra'],
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "ignore": ["deepinfra"]
    }
}'
```

{% endtab %}
{% endtabs %}

<figure><img src="/files/KCSxnh5cnEAt5du1naci" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/VrXC4tdW06Rhv9zzqQX7" alt=""><figcaption></figcaption></figure>

### Quantization (quantizations)

Quantization reduces model size and computational requirements while aiming to preserve performance. Most LLMs today use FP16 or BF16 for training and inference, cutting memory requirements in half compared to FP32. Some optimizations use FP8 or quantization to reduce size further (e.g., INT8, INT4).

| Field           | Type      | Default | Description                                                         |
| --------------- | --------- | ------- | ------------------------------------------------------------------- |
| `quantizations` | string\[] | -       | List of quantization levels to filter by (e.g. `["int4", "int8"]`). |

Quantized models may exhibit degraded performance for certain prompts, depending on the method used.

Providers can support various quantization levels for open-weight models.

#### Quantization Levels

To filter providers by quantization level, specify the `quantizations` field in the `provider` parameter with the following values:

* `fp16`: Floating point (16 bit)
* `fp8`: Floating point (8 bit)
* `int8`: Integer (8 bit)
* `int4`: Integer (4 bit)
* `none`: Unknown
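
A typo in a quantization level is easy to miss, so it can help to check values client-side before sending. The following is a minimal sketch that validates against the levels listed above; the `quantization_filter` helper is a hypothetical name, not part of any Infron SDK.

```python
from typing import Dict, List

# Levels listed in this section.
QUANTIZATION_LEVELS = {"fp16", "fp8", "int8", "int4", "none"}


def quantization_filter(levels: List[str]) -> Dict[str, List[str]]:
    """Build a `provider` object that filters by quantization level."""
    unknown = sorted(set(levels) - QUANTIZATION_LEVELS)
    if unknown:
        raise ValueError(f"Unsupported quantization level(s): {unknown}")
    return {"quantizations": list(levels)}


# Request body that only accepts FP8 providers.
payload = {
    "model": "deepseek/deepseek-v3.2",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": quantization_filter(["fp8"]),
}
```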

#### Example: Requesting FP8 Quantization

Here's an example that will only use providers that support FP8 quantization:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'quantizations': ['fp8'],
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "quantizations": ["fp8"]
    }
}'
```

{% endtab %}
{% endtabs %}


# BYOK

Bring your own provider API keys

Using your own credentials with an **external AI provider** allows Infron to authenticate requests on your behalf **with no added markup**. This approach is useful for spending credits granted by the AI provider or for running AI queries that access private cloud data.

Infron supports both **Infron credits** and the **option to bring your own provider keys (BYOK)**.

* **Using provider keys** gives you direct control over rate limits and costs through your provider account. Your provider keys are securely encrypted and used for all requests routed through the specified provider. If a query using your credentials fails, Infron retries the query with its system credentials to improve service availability.
* **Using Infron credits** means your rate limits for each provider are managed by Infron.

Integrating provider credentials with Infron in this way is sometimes referred to as Bring-Your-Own-Key, or BYOK.

{% hint style="info" %}
**BYOK on Infron AI is free**

Use your own provider keys and pay only the provider’s standard rates, with no extra platform fees.
{% endhint %}

### Getting started

{% stepper %}
{% step %}

### Retrieve credentials from your AI provider

First, **retrieve credentials (the AI provider API keys)** from your AI provider. Infron uses these credentials first to authenticate requests to that provider. If a query made with your credentials fails, Infron re-attempts it with **system credentials (your Infron API key)** to improve availability.
{% endstep %}

{% step %}

### Add the credentials to your BYOK configuration

* Go to the [Bring Your Own Key (BYOK) page](https://infron.ai/dashboard/byok) in your dashboard.

In the Infron dashboard this feature is found in the **Integrations (BYOK)** section in the sidebar.

<figure><img src="/files/QtTGv7ERZ3MEdQ0LsBNz" alt=""><figcaption></figcaption></figure>

* Find your provider from the list and **click the add icon**.
* In the dialog that appears, enter the credentials you retrieved from the provider.

<figure><img src="/files/fI6yxJbDQBhd0AVVV1fH" alt=""><figcaption></figcaption></figure>

* Ensure that the **Enabled toggle** is turned on so that the credentials are **active**.
* Click **Test Connection** to validate and add your credentials.

This will execute a small test query using a cheap and fast model from the selected provider to verify the health of your credentials. The test is designed to be minimal and cost-effective while ensuring your authentication is working properly.

<figure><img src="/files/wFadrYJNKIZsncdR0Je9" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/DHLqQLRzCt1wuDifiRrq" alt=""><figcaption></figcaption></figure>
{% endstep %}

{% step %}

### Use the credentials in your real-world requests

Once you add credentials, Infron automatically applies them to your requests. No request changes are needed; call the API as usual:

{% tabs %}
{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ]
}'
```

{% endtab %}
{% endtabs %}
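
The same request can be sketched in Python using only the standard library. The `build_chat_request` and `send` helper names are illustrative, and `<API_KEY>` is the same placeholder used in the other examples. Note that the request body carries no BYOK-specific fields.

```python
import json
import urllib.request

API_URL = "https://llm.onerouter.pro/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the chat completion request; BYOK needs no extra fields."""
    body = {
        "model": "deepseek/deepseek-v3.2",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_chat_request("<API_KEY>", "What is the meaning of life?")
# result = send(request)  # uncomment to call the API with a real key
```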
{% endstep %}

{% step %}

### Check the BYOK logs

Go to the [logs page](https://infron.ai/dashboard/logs) in your dashboard.

<figure><img src="/files/BZUTLfRRAjtKWOuEMRvb" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/jUNiyIZp0xUoNX4d0Zke" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/26C8aF2PyWcsM63jdXsQ" alt=""><figcaption></figcaption></figure>
{% endstep %}
{% endstepper %}

### BYOK with Google Vertex PT Account <a href="#byok-with-google-vertex-pt-account" id="byok-with-google-vertex-pt-account"></a>

If you've purchased Provisioned Throughput (PT) on Google Vertex, measured in generative AI scale units (GSUs), and want to prioritize routing requests to your own Google PT account, you can configure it as follows:

<figure><img src="/files/yXq4eDSQmmEZhYJo3djY" alt=""><figcaption></figcaption></figure>

Copy the JSON key string for the purchased PT GSU account.

<figure><img src="/files/hcP0JDrVYaZJ69nssIkP" alt=""><figcaption></figcaption></figure>

For example, suppose you have purchased `1 GSU` specifically for `Gemini-3-flash` and have configured Google Vertex with BYOK. When you invoke Gemini-3-flash via the Infron gateway, Infron will prioritize routing your request to your dedicated Google Vertex PT account.

However, if your request traffic exceeds the PT GSU capacity you have purchased, Google Vertex routes the excess requests to its public pool, where errors such as 429 or 500 may occur.

When Infron detects such an error, it automatically falls back to Infron's own resource pool for Gemini-3-flash, which helps keep your service stable.

The routing order will be:

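```
Routing Flow:
  ┌─────────────────────────────────────────────────────────┐
  │ Google Vertex (PT)                                      │
  └────────────────────────────┬────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────┐
  │ Google Vertex (Infron’s shared capacity)                │
  └────────────────────────────┬────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────┐
  │ Google AI Studio (Infron’s Provider Routing Fallback)   │
  └─────────────────────────────────────────────────────────┘
```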
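
The fallback behavior described above can be modeled as a loop that moves to the next endpoint whenever a retryable error comes back. This is a sketch under stated assumptions, not Infron's implementation: the `route` and `fake_call` names are hypothetical, and the set of retryable statuses is assumed from the two examples in the text (429 and 500).

```python
from typing import Callable, List, Optional, Tuple

# Assumption: statuses that trigger a fallback. The text names 429 and 500
# as examples; the exact set Infron uses is not documented here.
RETRYABLE_STATUSES = {429, 500}

ENDPOINTS = [
    "Google Vertex (PT)",
    "Google Vertex (Infron shared capacity)",
    "Google AI Studio (provider routing fallback)",
]


def route(endpoints: List[str], call: Callable[[str], int]) -> Tuple[str, int]:
    """Try endpoints in order; return the first non-retryable result."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    status: Optional[int] = None
    for endpoint in endpoints:
        status = call(endpoint)
        if status not in RETRYABLE_STATUSES:
            return endpoint, status
    # Every endpoint returned a retryable error: report the last one.
    return endpoints[-1], status


def fake_call(endpoint: str) -> int:
    """Stand-in for a real request: the PT account is over capacity."""
    return 429 if endpoint == "Google Vertex (PT)" else 200


print(route(ENDPOINTS, fake_call))
```

With the PT account returning 429, the simulated request is served by the second endpoint in the list.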

<figure><img src="/files/nnDrZlLjgDcC8PPsPTcY" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/mtcJTA0ulmPXzJgZFQxW" alt=""><figcaption></figcaption></figure>

### BYOK with Provider Ordering <a href="#byok-with-provider-ordering" id="byok-with-provider-ordering"></a>

When you combine BYOK keys with [provider ordering](/docs/routing-and-gateway/inference-provider-routing#ordering-specific-providers-order), **Infron always prioritizes BYOK endpoints first**, regardless of where that provider appears in your specified order. After all BYOK endpoints are exhausted, Infron falls back to shared capacity in the order you specified.

This means BYOK keys effectively override your provider ordering for the initial routing attempts. For example, if you have BYOK keys for Amazon Bedrock, Google Vertex AI, and Anthropic, and you send a request with:

```json
{
  "provider": {
    "allow_fallbacks": true,
    "order": ["amazon-bedrock", "google-vertex", "anthropic"]
  }
}
```

The routing order will be:

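```
Routing Flow:
  ┌─────────────────────────────────────────────────────────┐
  │ Amazon Bedrock (BYOK)                                   │
  └────────────────────────────┬────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────┐
  │ Google Vertex AI (BYOK)                                 │
  └────────────────────────────┬────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────┐
  │ Anthropic (BYOK)                                        │
  └────────────────────────────┬────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────┐
  │ Amazon Bedrock (Infron’s shared capacity)               │
  └────────────────────────────┬────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────┐
  │ Google Vertex AI (Infron’s shared capacity)             │
  └────────────────────────────┬────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────┐
  │ Anthropic (Infron’s shared capacity)                    │
  └─────────────────────────────────────────────────────────┘
```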

### Partial BYOK with Provider Ordering

If you only have BYOK keys for some of the providers in your order, the BYOK providers are still tried first. For example, suppose you only have a BYOK key for Google Vertex AI and you specify:

```json
{
  "provider": {
    "allow_fallbacks": true,
    "order": ["amazon-bedrock", "google-vertex"]
  }
}
```

The routing order will be:

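```
Routing Flow:
  ┌─────────────────────────────────────────────────────────┐
  │ Google Vertex AI (BYOK)                                 │
  └────────────────────────────┬────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────┐
  │ Amazon Bedrock (Infron’s shared capacity)               │
  └────────────────────────────┬────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────┐
  │ Google Vertex AI (Infron’s shared capacity)             │
  └─────────────────────────────────────────────────────────┘
```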

Note that even though Amazon Bedrock is listed first in the `order` array, the Google Vertex AI BYOK endpoint takes priority.
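
The ordering rule from the last two sections can be expressed as a small function: BYOK endpoints first, then shared capacity, each group in the order you specified. This is a model of the documented behavior rather than Infron's actual code; `routing_order` is a hypothetical name, and the sketch only covers providers that appear in `order`.

```python
from typing import Iterable, List, Tuple


def routing_order(order: List[str], byok: Iterable[str]) -> List[Tuple[str, str]]:
    """Model the documented rule: BYOK endpoints first, then shared capacity.

    Both groups keep the relative order given in `order`.
    """
    byok_set = set(byok)
    byok_first = [(p, "BYOK") for p in order if p in byok_set]
    shared_next = [(p, "shared") for p in order]
    return byok_first + shared_next


# Partial BYOK: only Google Vertex AI has a BYOK key.
for provider, capacity in routing_order(
    ["amazon-bedrock", "google-vertex"], byok=["google-vertex"]
):
    print(f"{provider} ({capacity})")
```

Passing all three providers from the earlier example as both `order` and `byok` yields three BYOK entries followed by three shared-capacity entries, matching the first routing flow.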

### AWS Bedrock API Keys <a href="#aws-bedrock-api-keys" id="aws-bedrock-api-keys"></a>

To use Amazon Bedrock with Infron, you can authenticate using either **Bedrock API keys** or **traditional AWS credentials**.

#### **Option 1: Bedrock API Keys (Recommended)**

Amazon Bedrock API keys provide a simpler authentication method. Simply provide your Bedrock API key as a string:

```
your-bedrock-api-key-here
```

**Note:** Bedrock API keys are tied to a specific AWS region and cannot be used to change regions. If you need to use models in different regions, use the AWS credentials option below.

You can generate Bedrock API keys in the AWS Management Console. Learn more in the [Amazon Bedrock API keys documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/api-keys.html).

#### **Option 2: AWS Credentials**

Alternatively, you can use traditional **AWS credentials** in JSON format. This option allows you to specify the region and provides more flexibility:

```json
{
  "accessKeyId": "your-aws-access-key-id",
  "secretAccessKey": "your-aws-secret-access-key",
  "region": "your-aws-region"
}
```

You can find these values in your AWS account:

1. **accessKeyId**: This is your AWS Access Key ID. You can create or find your access keys in the AWS Management Console under “Security Credentials” in your AWS account.
2. **secretAccessKey**: This is your AWS Secret Access Key, which is provided when you create an access key.
3. **region**: The AWS region where your Amazon Bedrock models are deployed (e.g., “us-east-1”, “us-west-2”).
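
If your credentials already live in the standard AWS environment variables, a short script can assemble the JSON with the exact camelCase key names shown above. This is a convenience sketch: `bedrock_credentials_json` is a hypothetical helper, and reading from `AWS_REGION` with a fallback to `AWS_DEFAULT_REGION` is an assumption about your environment.

```python
import json
import os
from typing import Mapping


def bedrock_credentials_json(env: Mapping[str, str] = os.environ) -> str:
    """Build the BYOK credentials JSON from standard AWS variables."""
    region = env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")
    fields = {
        "accessKeyId": env.get("AWS_ACCESS_KEY_ID"),
        "secretAccessKey": env.get("AWS_SECRET_ACCESS_KEY"),
        "region": region,
    }
    # Refuse to emit a half-filled credentials object.
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError(f"Missing value(s) for: {missing}")
    return json.dumps(fields, indent=2)


# Placeholder values; pass nothing to read from the real environment.
example_env = {
    "AWS_ACCESS_KEY_ID": "your-aws-access-key-id",
    "AWS_SECRET_ACCESS_KEY": "your-aws-secret-access-key",
    "AWS_REGION": "us-east-1",
}
credentials = bedrock_credentials_json(example_env)
```

Paste the resulting string into the BYOK dialog rather than logging it, since it contains your secret access key.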

Make sure your AWS IAM user or role has the necessary permissions to access Amazon Bedrock services. At minimum, you’ll need permissions for:

* `bedrock:InvokeModel`
* `bedrock:InvokeModelWithResponseStream` (for streaming responses)

Example IAM policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}
```

For enhanced security, we recommend creating dedicated IAM users with limited permissions specifically for use with Infron.

Learn more in the [AWS Bedrock Getting Started with the API](https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started-api.html) documentation, [IAM Permissions Setup](https://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html) guide, or the [AWS Bedrock API Reference](https://docs.aws.amazon.com/bedrock/latest/APIReference/welcome.html).

### Google Vertex API Keys <a href="#google-vertex-api-keys" id="google-vertex-api-keys"></a>

To use Google Vertex AI with Infron, you’ll need to provide your **Google Cloud service account key** in JSON format. The service account key should include all standard Google Cloud service account fields, with an optional `region` field for specifying the deployment region.

```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "your-private-key-id",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "your-service-account@your-project.iam.gserviceaccount.com",
  "client_id": "your-client-id",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/your-service-account@your-project.iam.gserviceaccount.com",
  "universe_domain": "googleapis.com",
  "region": "global"
}
```

You can find these values in your Google Cloud Console:

1. **Service Account Key**: Navigate to the Google Cloud Console, go to “IAM & Admin” > “Service Accounts”, select your service account, and create/download a JSON key.
2. **region** (optional): Specify the region for your Vertex AI deployment. Use `"global"` to allow requests to run in any available region, or specify a specific region like `"us-central1"` or `"europe-west1"`.
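
A downloaded service account key does not contain a `region` field, so it has to be added by hand. The sketch below does that programmatically; `with_region` is a hypothetical helper, and the small set of fields it checks is an assumption chosen to catch obviously wrong input, not a full validation of the key.

```python
import json

# Assumption: a minimal subset of fields used as a sanity check.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def with_region(service_account_json: str, region: str = "global") -> str:
    """Add the optional `region` field to a service account key."""
    key = json.loads(service_account_json)
    missing = [f for f in REQUIRED_FIELDS if f not in key]
    if missing:
        raise ValueError(f"Not a complete service account key, missing: {missing}")
    if key["type"] != "service_account":
        raise ValueError(f"Unexpected key type: {key['type']!r}")
    key["region"] = region
    return json.dumps(key, indent=2)


# Placeholder key; in practice, read the JSON file downloaded from the console.
sample_key = json.dumps({
    "type": "service_account",
    "project_id": "your-project-id",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "your-service-account@your-project.iam.gserviceaccount.com",
})
byok_value = with_region(sample_key, region="us-central1")
```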

Make sure your service account has the necessary permissions to access Vertex AI services:

* `aiplatform.endpoints.predict`
* `aiplatform.endpoints.streamingPredict` (for streaming responses)

Example IAM policy:

```json
{
  "bindings": [
    {
      "role": "roles/aiplatform.user",
      "members": [
        "serviceAccount:your-service-account@your-project.iam.gserviceaccount.com"
      ]
    }
  ]
}
```

Learn more in the [Google Cloud Vertex AI documentation](https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform) and [Service Account setup guide](https://cloud.google.com/iam/docs/service-accounts-create).


# Available Providers

You can view the available models for each provider on the [Provider List page](https://infron.ai/providers).

| Name                                   | Website                                                                                                                                           |
| -------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| AI21                                   | [https://www.ai21.com](https://www.ai21.com/)                                                                                                     |
| AionLabs                               | [https://www.aionlabs.ai](https://www.aionlabs.ai/)                                                                                               |
| Alibaba Cloud Int.                     | [https://www.alibabacloud.com](https://www.alibabacloud.com/)                                                                                     |
| Alibaba Cloud Int.(CN)                 | [https://www.alibabacloud.com](https://www.alibabacloud.com/)                                                                                     |
| Amazon Bedrock                         | [https://aws.amazon.com/bedrock](https://aws.amazon.com/bedrock/)                                                                                 |
| Amazon SageMaker                       | [https://aws.amazon.com/sagemaker](https://aws.amazon.com/sagemaker/)                                                                             |
| Ambient                                | -                                                                                                                                                 |
| Anthropic                              | [https://www.anthropic.com](https://www.anthropic.com/)                                                                                           |
| Anyscale                               | <https://www.anyscale.com/product/platform>                                                                                                       |
| Arcee AI                               | [https://www.arcee.ai](https://www.arcee.ai/)                                                                                                     |
| AtlasCloud                             | [https://www.atlascloud.ai](https://www.atlascloud.ai/)                                                                                           |
| Azure                                  | [https://azure.microsoft.com/services/cognitive-services/openai-service](https://azure.microsoft.com/services/cognitive-services/openai-service/) |
| Baseten                                | [https://www.baseten.co](https://www.baseten.co/)                                                                                                 |
| Beam                                   | [https://www.beam.cloud](https://www.beam.cloud/)                                                                                                 |
| Bento Cloud                            | [https://bentoml.com](https://bentoml.com/)                                                                                                       |
| Black Forest Labs                      | [https://bfl.ai](https://bfl.ai/)                                                                                                                 |
| BytePlus                               | [https://byteplus.com](https://byteplus.com/)                                                                                                     |
| CanopyWave                             | [https://canopywave.com](https://canopywave.com/)                                                                                                 |
| Cerebras                               | [https://www.cerebras.ai](https://www.cerebras.ai/)                                                                                               |
| Chutes                                 | [https://chutes.ai](https://chutes.ai/)                                                                                                           |
| Cirrascale                             | [https://www.cirrascale.com](https://www.cirrascale.com/)                                                                                         |
| Clarifai                               | [https://www.clarifai.com](https://www.clarifai.com/)                                                                                             |
| Cloudflare                             | [https://developers.cloudflare.com](https://developers.cloudflare.com/)                                                                           |
| Cloudsway                              | [https://cloudsway.ai](https://cloudsway.ai/)                                                                                                     |
| Cohere                                 | [https://cohere.com](https://cohere.com/)                                                                                                         |
| CoreWeave                              | [https://www.coreweave.com](https://www.coreweave.com/)                                                                                           |
| Crusoe                                 | [https://www.crusoe.ai](https://www.crusoe.ai/)                                                                                                   |
| Cyfuture                               | [https://cyfuture.ai](https://cyfuture.ai/)                                                                                                       |
| Databricks                             | [https://www.databricks.com](https://www.databricks.com/)                                                                                         |
| DeepInfra                              | [https://deepinfra.com](https://deepinfra.com/)                                                                                                   |
| DeepSeek                               | [https://www.deepseek.com](https://www.deepseek.com/)                                                                                             |
| DigitalOcean                           | <https://www.digitalocean.com/products/gradient/platform>                                                                                         |
| ElevenLabs                             | [https://elevenlabs.io](https://elevenlabs.io/)                                                                                                   |
| Exa                                    | [https://exa.ai](https://exa.ai/)                                                                                                                 |
| [Fal.ai](http://fal.ai/)               | [https://fal.ai](https://fal.ai/)                                                                                                                 |
| Featherless                            | [https://featherless.ai](https://featherless.ai/)                                                                                                 |
| Firecrawl                              | [https://www.firecrawl.dev](https://www.firecrawl.dev/)                                                                                           |
| Fireworks                              | [https://fireworks.ai](https://fireworks.ai/)                                                                                                     |
| Friendli                               | [https://friendli.ai](https://friendli.ai/)                                                                                                       |
| GMICloud                               | [https://www.gmicloud.ai](https://www.gmicloud.ai/)                                                                                               |
| Google AI Studio                       | [https://aistudio.google.com](https://aistudio.google.com/)                                                                                       |
| Google Vertex                          | <https://cloud.google.com/vertex-ai>                                                                                                              |
| Groq                                   | [https://groq.com](https://groq.com/)                                                                                                             |
| HF Inference                           | -                                                                                                                                                 |
| Huawei Cloud                           | [https://www.huaweicloud.com/intl/en-us](https://www.huaweicloud.com/intl/en-us/)                                                                 |
| Hyperbolic                             | [https://www.hyperbolic.ai](https://www.hyperbolic.ai/)                                                                                           |
| Hyperstack                             | [https://www.hyperstack.cloud](https://www.hyperstack.cloud/)                                                                                     |
| Inception                              | [https://www.inceptionlabs.ai](https://www.inceptionlabs.ai/)                                                                                     |
| Inceptron                              | [https://www.inceptron.io](https://www.inceptron.io/)                                                                                             |
| [Inference.net](http://inference.net/) | [https://inference.net](https://inference.net/)                                                                                                   |
| Infermatic                             | [https://infermatic.ai](https://infermatic.ai/)                                                                                                   |
| Inferx                                 | [https://inferx.net](https://inferx.net/)                                                                                                         |
| Inflection                             | [https://developers.inflection.ai](https://developers.inflection.ai/)                                                                             |
| InfronAI                               | [https://infron.ai](https://infron.ai/)                                                                                                           |
| Jina                                   | [https://jina.ai](https://jina.ai/)                                                                                                               |
| Kling AI                               | <https://klingai.com/global/dev>                                                                                                                  |
| Lambda Labs                            | [https://lambda.ai](https://lambda.ai/)                                                                                                           |
| LeftNorth                              | [https://leftnorth.com](https://leftnorth.com/)                                                                                                   |
| Liquid                                 | [https://leap.liquid.ai](https://leap.liquid.ai/)                                                                                                 |
| Loveon                                 | <https://loveon.chat/roleplay-api>                                                                                                                |
| Mancer                                 | [https://mancer.tech](https://mancer.tech/)                                                                                                       |
| Meituan                                | [https://longcat.ai](https://longcat.ai/)                                                                                                         |
| MiniMax                                | [https://www.minimax.io](https://www.minimax.io/)                                                                                                 |
| Mistral                                | [https://mistral.ai](https://mistral.ai/)                                                                                                         |
| Modal                                  | [https://modal.com](https://modal.com/)                                                                                                           |
| Moonshot AI                            | [https://www.moonshot.ai](https://www.moonshot.ai/)                                                                                               |
| Morph                                  | [https://www.morphllm.com](https://www.morphllm.com/)                                                                                             |
| Nebius Token Factory                   | [https://tokenfactory.nebius.com](https://tokenfactory.nebius.com/)                                                                               |
| Nexgen Cloud                           | [https://www.nexgencloud.com](https://www.nexgencloud.com/)                                                                                       |
| NextBit                                | [https://www.nextbit256.com](https://www.nextbit256.com/)                                                                                         |
| Novita                                 | [https://novita.ai](https://novita.ai/)                                                                                                           |
| Nscale                                 | [https://www.nscale.com](https://www.nscale.com/)                                                                                                 |
| NVIDIA                                 | [https://brev.nvidia.com](https://brev.nvidia.com/)                                                                                               |
| OpenAI                                 | [https://openai.com](https://openai.com/)                                                                                                         |
| OpenInference                          | [https://www.openinference.xyz](https://www.openinference.xyz/)                                                                                   |
| OVHcloud                               | [https://www.ovhcloud.com/en/public-cloud/ai-endpoints](https://www.ovhcloud.com/en/public-cloud/ai-endpoints/)                                   |
| Parasail                               | [https://parasail.io](https://parasail.io/)                                                                                                       |
| Perplexity                             | [https://www.perplexity.ai](https://www.perplexity.ai/)                                                                                           |
| Phala                                  | [https://phala.com](https://phala.com/)                                                                                                           |
| Prodia                                 | [https://prodia.com](https://prodia.com/)                                                                                                         |
| Public AI                              | [https://publicai.co](https://publicai.co/)                                                                                                       |
| Rafay                                  | <https://rafay.co/platform/serverless-inference>                                                                                                  |
| Recraft                                | [https://www.recraft.ai](https://www.recraft.ai/)                                                                                                 |
| Relace                                 | [https://www.relace.ai](https://www.relace.ai/)                                                                                                   |
| Replicate                              | [https://replicate.com](https://replicate.com/)                                                                                                   |
| RouteWay                               | [https://routeway.ai](https://routeway.ai/)                                                                                                       |
| RunPod                                 | [https://www.runpod.io](https://www.runpod.io/)                                                                                                   |
| Runware                                | [https://runware.ai](https://runware.ai/)                                                                                                         |
| SambaNova                              | [https://sambanova.ai](https://sambanova.ai/)                                                                                                     |
| Scaleway                               | [https://www.scaleway.com/en/generative-apis](https://www.scaleway.com/en/generative-apis/)                                                       |
| SiliconFlow                            | [https://www.siliconflow.com](https://www.siliconflow.com/)                                                                                       |
| Sourceful                              | [https://www.sourceful.com](https://www.sourceful.com/)                                                                                           |
| StepFun                                | [https://stepfun.ai](https://stepfun.ai/)                                                                                                         |
| StreamLake                             | [https://www.streamlake.ai](https://www.streamlake.ai/)                                                                                           |
| Switchpoint                            | [https://www.switchpoint.dev](https://www.switchpoint.dev/)                                                                                       |
| Tavily                                 | [https://tavily.com](https://tavily.com/)                                                                                                         |
| Together                               | [https://www.together.ai](https://www.together.ai/)                                                                                               |
| Upstage                                | [https://www.upstage.ai](https://www.upstage.ai/)                                                                                                 |
| Venice                                 | [https://venice.ai](https://venice.ai/)                                                                                                           |
| Vercel                                 | <https://v0.app/docs/api/model>                                                                                                                   |
| Verda                                  | [https://verda.com](https://verda.com/)                                                                                                           |
| Voyage                                 | [https://www.voyageai.com](https://www.voyageai.com/)                                                                                             |
| WaveSpeed                              | [https://wavespeed.ai](https://wavespeed.ai/)                                                                                                     |
| Weights & Biases                       | [https://wandb.ai](https://wandb.ai/)                                                                                                             |
| xAI                                    | [https://x.ai](https://x.ai/)                                                                                                                     |
| Xiaomi                                 | [https://platform.xiaomimimo.com](https://platform.xiaomimimo.com/)                                                                               |
| [Z.ai](http://z.ai/)                   | [https://z.ai](https://z.ai/)                                                                                                                     |


# Zero Completion Insurance

Infron will not charge you for zero-token responses

Infron provides zero completion insurance to protect users from being charged for failed or empty responses. When a response contains no output tokens and either has a blank finish reason or an error, you will not be charged for the request, even if the underlying provider charges for prompt processing.

Zero completion insurance is automatically enabled for all accounts and requires no configuration.

### How It Works <a href="#how-it-works" id="how-it-works"></a>

Zero completion insurance automatically applies to all requests across all models and providers. When a response meets either of these conditions, no credits will be deducted from your account:

* The response has zero completion tokens AND a blank/null finish reason
* The response has an error finish reason
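
The two conditions above can be expressed as a small client-side check, for example to reconcile your own usage records against the logs page. This is a minimal sketch, not Infron's billing logic: the helper name `is_covered_by_zero_completion_insurance` is hypothetical, and it assumes an OpenAI-style response body with `usage.completion_tokens` and `choices[].finish_reason`, where an error is reported as the finish reason `"error"`.

```python
def is_covered_by_zero_completion_insurance(response: dict) -> bool:
    """Return True if a response body matches either condition listed above.

    Hypothetical helper: assumes an OpenAI-style body and that an error
    finish reason is reported as the string "error".
    """
    choices = response.get("choices") or []
    finish_reasons = [choice.get("finish_reason") for choice in choices]
    completion_tokens = (response.get("usage") or {}).get("completion_tokens") or 0

    # Condition 2: the response has an error finish reason.
    if any(reason == "error" for reason in finish_reasons):
        return True

    # Condition 1: zero completion tokens AND a blank/null finish reason.
    blank_finish = all(reason in (None, "") for reason in finish_reasons)
    return completion_tokens == 0 and blank_finish
```

The logs page remains the authoritative record of what was actually deducted.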

### Viewing Protected Requests <a href="#viewing-protected-requests" id="viewing-protected-requests"></a>

On your logs page, requests that were protected by zero completion insurance will show zero credits deducted. This applies even in cases where Infron may have been charged by the provider for prompt processing.


# Zero Data Retention

How Infron gives you control over your data

**Zero Data Retention (ZDR)** means that a provider will not store your data for any period of time.

**Infron allows you to route to endpoints that have a Zero Data Retention policy.**

Providers that do not retain your data are also unable to train on your data. However, some endpoints and providers do not train on your data but *do* retain it (e.g. to scan for abuse or for legal reasons). Infron gives you control over both of these policies.

### How Infron Manages Data Policies <a href="#how-openrouter-manages-data-policies" id="how-openrouter-manages-data-policies"></a>

Infron works with providers to understand each of their data policies and structures the policy data in a way that gives you control over which providers you want to route to.

Note that a provider’s general policy may differ from the specific policy for a given endpoint. Infron keeps track of the specific policy for each endpoint, works with providers to keep these policies up to date, and in some cases creates **special agreements** with providers to ensure data retention or training policies that are more privacy-focused than their default policies.

If Infron is not able to establish or ascertain a clear policy for a provider or endpoint, we take a conservative stance and assume that the endpoint both retains and trains on data and mark it as such.

### Per-Request ZDR Enforcement <a href="#per-request-zdr-enforcement" id="per-request-zdr-enforcement"></a>

You can enforce Zero Data Retention on a per-request basis using the `zdr` parameter in your API calls.

#### Usage <a href="#usage" id="usage"></a>

Include the [`zdr`](/docs/routing-and-gateway/inference-provider-routing#zero-data-retention-enforcement-zdr) parameter in your provider preferences:

```json
{
  "model": "gpt-4",
  "messages": [...],
  "provider": {
    "zdr": true
  }
}
```

When `zdr` is set to `true`, the request will only be routed to endpoints that have a Zero Data Retention policy.
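
As a rough illustration, the request above could be assembled in Python as follows. This is a sketch under assumptions: `build_zdr_request` is a hypothetical helper, the endpoint URL is the chat completions URL used in other examples in these docs, and the model name and message are placeholders.

```python
import json

# Chat completions endpoint as used elsewhere in these docs (assumed here).
API_URL = "https://llm.onerouter.pro/v1/chat/completions"

def build_zdr_request(model: str, messages: list, api_key: str) -> dict:
    """Build keyword arguments for requests.post with ZDR enforced (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": messages,
        # Only route to endpoints that have a Zero Data Retention policy.
        "provider": {"zdr": True},
    }
    return {
        "url": API_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "data": json.dumps(payload),
    }

request_kwargs = build_zdr_request(
    "openai/gpt-5.2",
    [{"role": "user", "content": "Hello"}],
    "<API KEY>",
)
# To send it: import requests; response = requests.post(**request_kwargs)
```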

### Caching <a href="#caching" id="caching"></a>

Some endpoints/models provide implicit caching of prompts. This keeps repeated prompt data in an in-memory cache in the provider’s datacenter, so that the repeated part of the prompt does not need to be re-processed. This can lead to considerable cost savings.

Infron has taken the stance that in-memory caching of prompts is *not* considered “retaining” data, and we therefore allow endpoints/models with implicit caching to be hit when a ZDR routing policy is in effect.

### Infron’s Retention Policy <a href="#openrouters-retention-policy" id="openrouters-retention-policy"></a>

**Infron itself has a ZDR policy**; your prompts are not retained.


# Structured Outputs

Return structured data from your models.

Infron supports **structured outputs** for **compatible models**, ensuring responses follow a specific schema format. This feature is particularly useful when you need consistent, well-formatted responses that can be reliably parsed by your application.

Structured outputs allow you to:

* Enforce specific JSON Schema validation on model responses
* Get consistent, type-safe outputs
* Avoid parsing errors and hallucinated fields
* Simplify response handling in your application

### Model Support

To ensure your chosen model supports structured outputs:

1. Check the model's supported parameters on the [models page](https://infron.ai/models)
2. Include `response_format` and set `type: json_schema` in the required parameters

<figure><img src="/files/vGPCczlzp28jaY3h7W5h" alt=""><figcaption></figcaption></figure>

### Using Structured Outputs

To use structured outputs, include a `response_format` parameter in your request, with `type` set to `json_schema` and the `json_schema` object containing your schema:

#### json\_schema

{% tabs %}
{% tab title="Python (json\_schema)" %}

```python
import requests
import json

response = requests.post(
  url="https://llm.onerouter.pro/v1/chat/completions",
  headers={
    "Authorization": "Bearer <API KEY>",
    "Content-Type": "application/json"
  },
  data=json.dumps({
    "model": "openai/gpt-5.2",
    "messages": [
      {"role": "user", "content": "What's the weather like in London?"}
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "weather",
        "strict": True,
        "schema": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City or location name"
            },
            "temperature": {
              "type": "number",
              "description": "Temperature in Celsius"
            },
            "conditions": {
              "type": "string",
              "description": "Weather conditions description"
            }
          },
          "required": ["location", "temperature", "conditions"],
          "additionalProperties": False
        }
      }
    }
  })
)
print(response.json()["choices"][0]["message"]["content"])
```

{% endtab %}
{% endtabs %}

The model will respond with a JSON object that strictly follows your schema:

```json
{
	"conditions": "Overcast with light rain showers",
	"location": "London, United Kingdom",
	"temperature": 12.5
}
```
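
The message `content` arrives as a JSON string, so it still needs to be parsed before use. The sketch below shows one way to do that with a light check on the required keys; `parse_weather` and `REQUIRED_KEYS` are hypothetical names, and the input string is the example response shown above.

```python
import json

# Mirrors the "required" list in the schema above.
REQUIRED_KEYS = {"location", "temperature", "conditions"}

def parse_weather(content: str) -> dict:
    """Parse the message content and check that the required fields are present."""
    data = json.loads(content)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response is missing fields: {sorted(missing)}")
    return data

weather = parse_weather(
    '{"conditions": "Overcast with light rain showers", '
    '"location": "London, United Kingdom", "temperature": 12.5}'
)
print(f"{weather['location']}: {weather['temperature']} °C, {weather['conditions']}")
```

With `strict` enabled the check should rarely fail, but it gives a clear error if a model or provider returns something unexpected.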

#### CalendarEvent

{% tabs %}
{% tab title="Python (CalendarEvent)" %}

```python
import requests
import json

response = requests.post(
  url="https://llm.onerouter.pro/v1/chat/completions",
  headers={
    "Authorization": "Bearer <API KEY>",
    "Content-Type": "application/json"
  },
  data=json.dumps({
    "model": "openai/gpt-5.2",
    "messages": [
        {"role": "system", "content": "Extract the event information."},
        {
            "role": "user",
            "content": "Alice and Bob are going to a science fair on Friday.",
        }
    ],
    "response_format": {
      "type": "CalendarEvent"
    }
  })
)
print(response.json())
```

{% endtab %}
{% endtabs %}

The response example:

```python
{
  'id': 'gen-1766491236-RRKOrbaRaoxv1ogA56cd',
  'model': 'gpt-5.1',
  'object': 'chat.completion',
  'created': 1766491236,
  'choices': [{
    'index': 0,
    'message': {
      'role': 'assistant',
      'content': '- Event: Science fair  \n- Participants: Alice, Bob  \n- Date: Friday  \n- Location: Not specified',
      'reasoning': "**Extracting event information**\n\nI need to pull event details from the user's sentence, but I’m not sure about the exact format they want. Probably it should be brief, including the event name, date, participants, and location. For example, if it's a science fair, I might list the day as Friday with participants being Alice and Bob. However, I notice there's no location mentioned, so I’ll keep my answer concise while ensuring I cover everything necessary!",
      'reasoning_details': [{
        'id': '',
        'format': 'openai-responses-v1',
        'index': 0,
        'type': 'reasoning.summary',
        'data': ''
      }, {
        'id': 'rs_0ae935064fdad29601694a8464e37c81908e9a2d8154046487',
        'format': 'openai-responses-v1',
        'index': 0,
        'type': 'reasoning.encrypted',
        'data': 'gAAAAABpSoRojMddY32GRTMLysJlQBXeniX2FaZJJL7MUvWNRcqNOdQBtE4xfOW7nUXlehVdMCiSlGm1jvi8u7ukprfSa2F9bca_V-XBq_DxzdiecWn08qM3wDA5Xe_GpcST_BGaIsEWSCFKqg0d7chukEHqD2-20NBSYU8XAbIKiSCcoiwduuxBDw1S8UdCbuBFuF9D_B_jiDlgWguMYEST3BGbb35qR261BlsB2E3obmW6CX2D96cyFk8ZrOXRd7aytqxgMSa-k2l0yJSMFvWMlrY7DhC80cIhgmQWfl-jVknCyURFMjN1_g0Jzzdfh2O_nooLVqJmH-j3mFhCsw9SnDFMW_jhrSry6956BIp1aPtYDDusYpb-CLHqBbJtzjzCod8OfBrtWNfRwa3eQkMZQ9F53i0q3cqNan-TWistHZBunH8vI6StAwhh3J1LMHSROXgCPJKq2VPfaNhS1ZbLxvohoHdddBxeNMrwC6qGNADeOx9zTvawqoVGc-4ivsDM2lhBSb8L1iKBEuKK38pnb8Pk8kpIqz7e-5n6_fl9VW5SoGJbobSuVQ7QwyMvvd94clcUZE0nNOtJT2X24t05XX8vqAmdDsRB3th_NNDW3RHxqCxJ0NiNZPq0o69UpGJzU5_o87Z0b1iSeY_WWcptdAs46y44c7YPLB_eW7VXMzuthC-4wmftJF_6ZP_uzHjF_f9otc7GkkiRtW9vZHdVYvV2mBVwuFUNKebTnOkXc-7xQQRqdhvzi1-Te-9uNsqtRqVYeqRrQJQJuDfhtZzTbxyyPq3ilO0AnVVE_Mu84CnUjE_xr1fJidXttMKRbQhsWcGVMmxI-uEExeMLbaKNiRTEcjUZtQukDNXeUaUtzHFk80G5tQAH6bJymAkIE1b7x9glW4Vr2cxRN36UhENl9raF0Oak2A8mAQ9_zkAzL_Yb49QeiHP18YSuklKNzc7dQwv6DqdQ7m4-mmRY3mUP4sLRthDb_l8M99DR506mwJQ03xj4THVG6c3zB-qyt7CrxZHUdFBtkZRG5YoZHf_7KCQ7ARCo-gndMJH9AAGwe-WXYR8eJ8_aiXQx0JyzAvIlIdtMXXa_d7VQJ6YVPhK21kRLv_FqOVvGudBv3AJWpyQLPpnbbIY='
      }]
    },
    'finish_reason': 'stop',
    'native_finish_reason': 'completed',
    'logprobs': None
  }],
  'request_id': '112648cf661f4650ac3c3cdc38832972',
  'usage': {
    'prompt_tokens': 27,
    'completion_tokens': 78,
    'total_tokens': 105,
    'prompt_tokens_details': {},
    'completion_tokens_details': {
      'reasoning_tokens': 49
    },
    'input_tokens': 0,
    'output_tokens': 0,
    'ttft': 0,
    'server_tool_use': {
      'web_search_requests': ''
    }
  }
}
```

### Streaming with Structured Outputs

Structured outputs are also supported with streaming responses. The model will stream valid partial JSON that, when complete, forms a valid response matching your schema.

To enable streaming with structured outputs, simply add `stream: true` to your request:

```json
{
  "stream": true,
  "response_format": {
    "type": "json_schema",
    // ... rest of your schema
  }
}
```
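
Because each streamed chunk carries only a fragment of the JSON document, a common approach is to buffer the fragments and parse once the stream ends. The following is a minimal sketch using simulated fragments rather than a live stream; `collect_streamed_json` is a hypothetical helper.

```python
import json

def collect_streamed_json(content_deltas) -> dict:
    """Join streamed content fragments and parse them once the stream ends."""
    buffer = []
    for fragment in content_deltas:
        if fragment:  # skip None or empty deltas
            buffer.append(fragment)
    return json.loads("".join(buffer))

# Simulated fragments. With the OpenAI SDK you could instead pass a generator such as
# (chunk.choices[0].delta.content for chunk in stream if chunk.choices)
simulated = [
    '{"location": "Lon',
    'don", "temper',
    'ature": 12.5, ',
    None,
    '"conditions": "Rain"}',
]
result = collect_streamed_json(simulated)
print(result)
```

Parsing only at the end keeps the example simple; rendering partial results as they arrive would need an incremental JSON parser instead.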

### Best Practices

1. **Include descriptions**: Add a clear `description` to each property in your schema to guide the model
2. **Use strict mode**: Always set `strict: true` to ensure the model follows your schema exactly

### Error Handling

When using structured outputs, you may encounter these scenarios:

1. **Model doesn't support structured outputs**: The request will fail with an error indicating lack of support
2. **Invalid schema**: The model will return an error if your JSON Schema is invalid


# Tool Calling

Tool & Function Calling - Use tools in your prompts

**Tool calling** (also known as **function calling**) gives an LLM access to external tools. The LLM does not call the tools directly. Instead, it suggests which tool to call and with what arguments. Your application then runs the tool and sends the result back to the LLM. Finally, the LLM uses that result to answer the user's original question.

**Tool calling** provides LLMs with a powerful and flexible way to interface with external systems and access data beyond their training set. This guide shows how to connect a model to the data and actions provided by your application.

**Tool calling is a multi-turn conversation between your application and the model**. The tool calling flow has five main steps:

1. Send a request to the model, including the tools it can call
2. Receive tool calls from the model
3. Execute code on the application side using the tool call inputs
4. Send a new request to the model, including the tool outputs
5. Receive the final response from the model (or additional tool calls)

![](https://cdn.openai.com/API/docs/images/function-calling-diagram-steps.png)

**Infron supports tool calling across multiple API protocols:**

| API protocol                                                                    | Parameters                                                            |
| ------------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| [**OpenAI Chat Completion API**](/docs/llm-apis/openai-compatible-api/overview) | Use the `tools` and `tool_choice` parameters                          |
| **OpenAI Responses API (Beta)**                                                 | Use the `tools` parameter; responses include the `function_call` type |
| **Anthropic Messages API (Beta)**                                               | Use the `tools` parameter; tool definitions use `input_schema`        |

**Protocol comparison**

| Feature              | Chat Completion | Responses API          | Anthropic Messages | Vertex AI               |
| -------------------- | --------------- | ---------------------- | ------------------ | ----------------------- |
| Tool parameter name  | `tools`         | `tools`                | `tools`            | `tools`                 |
| Schema field         | `parameters`    | `parameters`           | `input_schema`     | `parameters`            |
| Tool call identifier | `tool_calls`    | `function_call`        | `tool_use`         | `functionCall`          |
| Result field         | `tool` role     | `function_call_output` | `tool_result`      | `functionResponse`      |
| Parallel calls       | ✅               | ✅                      | ✅                  | ✅                       |
| Forced calling       | `tool_choice`   | `tool_choice`          | `tool_choice`      | `functionCallingConfig` |
| Strict mode          | `strict: true`  | ✅                      | `strict: true`     | `VALIDATED` mode        |
| Streaming support    | ✅               | ✅                      | ✅                  | ✅                       |
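
To make the "Schema field" row concrete, the sketch below converts a Chat Completion tool definition into the shapes used by the other two protocols. The helper names are hypothetical, and the target shapes follow the publicly documented OpenAI Responses and Anthropic Messages formats; treat them as assumptions and check the Infron page for each protocol before relying on them.

```python
def to_anthropic_tool(chat_tool: dict) -> dict:
    """Chat Completion tool -> Anthropic Messages tool (schema lives in `input_schema`)."""
    fn = chat_tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],
    }

def to_responses_tool(chat_tool: dict) -> dict:
    """Chat Completion tool -> Responses API tool (same fields, without the `function` wrapper)."""
    fn = chat_tool["function"]
    return {
        "type": "function",
        "name": fn["name"],
        "description": fn.get("description", ""),
        "parameters": fn["parameters"],
    }

chat_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Retrieve the current weather for a given location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

print(to_anthropic_tool(chat_tool))
print(to_responses_tool(chat_tool))
```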

**Supported Models**: You can find models that support tool calling by filtering on <https://infron.ai/models?supported_parameters=tools>

<figure><img src="/files/0RADi8IPv3AzrPrjxZxJ" alt=""><figcaption></figcaption></figure>

If you prefer to learn from a full end-to-end example, keep reading.

### **OpenAI Chat Completion API**

#### Tool calling example

Let’s look at a complete tool calling flow, using `get_horoscope` to fetch a daily horoscope for a zodiac sign.

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI
import json

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

# 1. Define the list of callable tools for the model
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_horoscope",
            "description": "Get today's horoscope for a zodiac sign.",
            "parameters": {
                "type": "object",
                "properties": {
                    "sign": {
                        "type": "string",
                        "description": "Zodiac sign name, e.g., Taurus or Aquarius",
                    },
                },
                "required": ["sign"],
            },
        },
    },
]

# Create the message list; we'll append messages to it over time
input_list = [
    {"role": "user", "content": "How's my horoscope? I'm an Aquarius."}
]

# 2. Prompt the model with the defined tools
response = client.chat.completions.create(
    model="moonshotai/kimi-k2-0905",
    tools=tools,
    messages=input_list,
)

# Append the assistant message (including any tool calls) for subsequent requests
function_call = None
function_call_arguments = None
input_list.append({
  "role": "assistant",
  "content": response.choices[0].message.content,
  "tool_calls": [tool_call.model_dump() for tool_call in response.choices[0].message.tool_calls] if response.choices[0].message.tool_calls else None,
})

# Keep the function tool call and its parsed arguments (the last one, if there are several)
for item in response.choices[0].message.tool_calls or []:
    if item.type == "function":
        function_call = item
        function_call_arguments = json.loads(item.function.arguments)

# Stop early if the model answered without calling the tool
if function_call is None:
    raise RuntimeError("The model did not return a function call")

def get_horoscope(sign):
    return f"{sign}: Next Tuesday you'll meet a baby otter."

# 3. Execute the get_horoscope function logic
result = {"horoscope": get_horoscope(function_call_arguments["sign"])}

# 4. Provide the function call result to the model
input_list.append({
    "role": "tool",
    "tool_call_id": function_call.id,
    "name": function_call.function.name,
    "content": json.dumps(result),
})

print("Final input:")
print(json.dumps(input_list, indent=2, ensure_ascii=False))

response = client.chat.completions.create(
    model="moonshotai/kimi-k2-0905",
    tools=tools,
    messages=input_list,
)

# 5. The model should now be able to respond!
print("Final output:")
print(response.model_dump_json(indent=2))
print("\n" + response.choices[0].message.content)
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import OpenAI from "openai";
const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

// 1. Define the list of callable tools for the model
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_horoscope",
      description: "Get today's horoscope for a zodiac sign.",
      parameters: {
        type: "object",
        properties: {
          sign: {
            type: "string",
            description: "Zodiac sign name, e.g., Taurus or Aquarius",
          },
        },
        required: ["sign"],
      },
    },
  },
];

// Create the message list; we'll append messages to it over time
let input: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
  { role: "user", content: "How's my horoscope? I'm an Aquarius." },
];

async function main() {
  // 2. Use a model that supports tool calling
  let response = await openai.chat.completions.create({
    model: "moonshotai/kimi-k2-0905",
    tools,
    messages: input,
  });

  // Save the function call output for subsequent requests
  let functionCall: OpenAI.Chat.Completions.ChatCompletionMessageFunctionToolCall | undefined;
  let functionCallArguments: Record<string, string> | undefined;
  input = input.concat(response.choices.map((c) => c.message));

  response.choices.forEach((item) => {
    if (item.message.tool_calls && item.message.tool_calls.length > 0) {
      functionCall = item.message.tool_calls[0] as OpenAI.Chat.Completions.ChatCompletionMessageFunctionToolCall;
      functionCallArguments = JSON.parse(functionCall.function.arguments) as Record<string, string>;
    }
  });

  // 3. Execute the get_horoscope function logic
  function getHoroscope(sign: string) {
    return sign + " Next Tuesday you'll meet a baby otter.";
  }

  if (!functionCall || !functionCallArguments) {
    throw new Error("The model did not return a function call");
  }

  const result = { horoscope: getHoroscope(functionCallArguments.sign) };

  // 4. Provide the function call result to the model
  input.push({
    role: 'tool',
    tool_call_id: functionCall.id,
    // @ts-expect-error must have name
    name: functionCall.function.name,
    content: JSON.stringify(result),
  });
  console.log("Final input:");
  console.log(JSON.stringify(input, null, 2));

  response = await openai.chat.completions.create({
    model: "moonshotai/kimi-k2-0905",
    tools,
    messages: input,
  });

  // 5. The model should now be able to respond!
  console.log("Final output:");
  console.log(JSON.stringify(response.choices.map(v => v.message), null, 2));
}

main();
```

{% endtab %}
{% endtabs %}

#### Defining a function tool (function)

Function tools can be configured via the `tools` parameter. A function tool is defined by its schema, which tells the model what the function does and what input parameters it expects. A function tool definition includes the following fields:

| Field                | Description                                                              |
| -------------------- | ------------------------------------------------------------------------ |
| type                 | Must always be `function`                                                |
| function             | The tool object                                                          |
| function.name        | Function name (e.g., `get_weather`)                                      |
| function.description | Detailed information on when and how to use the function                 |
| function.parameters  | JSON Schema defining the function input parameters                       |
| function.strict      | Whether to enable strict schema adherence when generating function calls |

Below is the definition for a `get_weather` function tool:

```json
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Retrieve the current weather for a given location.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City and country, for example: Bogotá, Colombia"
        },
        "units": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"],
          "description": "The unit for the returned temperature."
        }
      },
      "required": ["location", "units"],
      "additionalProperties": false
    },
    "strict": true
  }
}
```

#### Handling tool calls

When the model calls a tool in `tools`, you must execute that tool and return the result. Since tool calling may include zero, one, or multiple calls, best practice is to assume there may be multiple.

#### Response format <a href="#response-format" id="response-format"></a>

When the model needs to call tools, the response `finish_reason` is `"tool_calls"`, and `message` includes a `tool_calls` array:

```json
{
  "id": "chatcmpl_xxx",
  "model": "openai/gpt-4.1-nano",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"Beijing\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
```

Each call in the `tool_calls` array contains:

* `id`: a unique identifier used when submitting the function result later
* `type`: the tool `type`, typically `function` or `custom`
* `function`: the function object
  * `name`: the function name
  * `arguments`: JSON-encoded function arguments

Example `tool_calls` containing multiple tool calls:

```json
[
  {
    "id": "fc_12345xyz",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\":\"Paris, France\"}"
    }
  },
  {
    "id": "fc_67890abc",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"location\":\"Bogotá, Colombia\"}"
    }
  },
  {
    "id": "fc_99999def",
    "type": "function",
    "function": {
      "name": "send_email",
      "arguments": "{\"to\":\"andrew@gettrust.ai\",\"body\":\"Hi Andrew\"}"
    }
  }
]
```

Execute tool calls and append results

{% tabs %}
{% tab title="Python" %}

```python
for choice in response.choices:
    for tool_call in choice.message.tool_calls or []:
        if tool_call.type != "function":
            continue

        name = tool_call.function.name
        args = json.loads(tool_call.function.arguments)

        result = call_function(name, args)
        input_list.append({
            "role": "tool",
            "name": name,
            "tool_call_id": tool_call.id,
            "content": str(result)
        })
```

{% endtab %}
{% endtabs %}
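
The loop above relies on a `call_function` helper that is not defined in this guide. One possible shape for it is a small registry that maps tool names to Python callables. Everything in this sketch is hypothetical, including the placeholder `get_weather` implementation; it returns a JSON string so the result can be used directly as the `content` of the `tool` message.

```python
import json

def get_weather(location: str, units: str = "celsius") -> dict:
    # Placeholder implementation; a real tool would query a weather service.
    return {"location": location, "units": units, "temperature": None}

# Map tool names (as declared in `tools`) to the functions that implement them.
TOOL_REGISTRY = {"get_weather": get_weather}

def call_function(name: str, args: dict) -> str:
    """Run the named tool and return its result as a JSON string."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        # Report the problem to the model instead of crashing the loop.
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        return json.dumps(handler(**args), ensure_ascii=False)
    except TypeError as exc:
        # Raised when the model supplies missing or unexpected arguments.
        return json.dumps({"error": f"bad arguments for {name}: {exc}"})

print(call_function("get_weather", {"location": "Paris, France"}))
```

Returning errors as tool output, rather than raising, gives the model a chance to correct its arguments on the next turn.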

**Controlling tool calling behavior (`tool_choice`)**

By default, the model decides when and how many tools to call. You can control tool calling behavior using the `tool_choice` parameter.

1. **Auto:** (*default*) Call zero, one, or multiple tools. `tool_choice: "auto"`
2. **Required:** Call one or more tools. `tool_choice: "required"`

**When to use (allowed\_tools)**

If you want the model to use only a subset of the tool list in a given request—without modifying the tool list you pass in, to maximize prompt caching—you can configure `allowed_tools`.

```json
"tool_choice": {
    "type": "allowed_tools",
    "mode": "auto",
    "tools": [
        { "type": "function", "function": { "name": "get_weather" } },
        { "type": "function", "function": { "name": "get_time" } }
    ]
}
```

You can also set `tool_choice` to `"none"` to force the model not to call any tools.
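
The options above can be gathered into a small helper when building requests. This is a sketch with a hypothetical function name. The `"function"` branch uses the OpenAI Chat Completions shape for forcing one specific function, which this guide does not describe; treat it as an assumption and confirm that your chosen model and provider accept it.

```python
from typing import Optional, Union

def tool_choice_for(mode: str, function_name: Optional[str] = None) -> Union[str, dict]:
    """Return a `tool_choice` value for the given mode (hypothetical helper)."""
    if mode in ("auto", "required", "none"):
        # Plain string modes described in this section.
        return mode
    if mode == "function":
        if not function_name:
            raise ValueError("function_name is required when mode is 'function'")
        # Assumption: OpenAI-style object that forces a single named function.
        return {"type": "function", "function": {"name": function_name}}
    raise ValueError(f"unsupported mode: {mode}")

request_kwargs = {
    "model": "moonshotai/kimi-k2-0905",
    "tool_choice": tool_choice_for("required"),
}
print(request_kwargs)
```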

#### Streaming <a href="#streaming" id="streaming"></a>

Streaming tool calling is very similar to streaming normal responses: set `stream` to `true` and receive a stream of `events`.

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country, e.g. Bogotá, Colombia"
                }
            },
            "required": [
                "location"
            ],
            "additionalProperties": False
        }
    }
}]

stream = client.chat.completions.create(
    model="moonshotai/kimi-k2-0905",
    messages=[{"role": "user", "content": "What's the weather like in Paris today?"}],
    tools=tools,
    stream=True
)

for event in stream:
    print(event.choices[0].delta.model_dump_json())
```

{% endtab %}
{% endtabs %}

Output events

```json
{"content":"","function_call":null,"refusal":null,"role":"assistant","tool_calls":null}
{"content":"I'll","function_call":null,"refusal":null,"role":"assistant","tool_calls":null}
{"content":" check","function_call":null,"refusal":null,"role":"assistant","tool_calls":null}
{"content":" the","function_call":null,"refusal":null,"role":"assistant","tool_calls":null}
{"content":" current","function_call":null,"refusal":null,"role":"assistant","tool_calls":null}
{"content":" weather","function_call":null,"refusal":null,"role":"assistant","tool_calls":null}
{"content":" in","function_call":null,"refusal":null,"role":"assistant","tool_calls":null}
{"content":" Paris","function_call":null,"refusal":null,"role":"assistant","tool_calls":null}
{"content":" for","function_call":null,"refusal":null,"role":"assistant","tool_calls":null}
{"content":" you","function_call":null,"refusal":null,"role":"assistant","tool_calls":null}
{"content":".","function_call":null,"refusal":null,"role":"assistant","tool_calls":null}
{"content":"","function_call":null,"refusal":null,"role":"assistant","tool_calls":[{"index":0,"id":"functions.get_weather:0","function":{"arguments":"{\"location\": \"Paris, France\"}","name":"get_weather"},"type":"function"}]}
```

When the model calls one or more tools, an `event` will be emitted for each tool call where `tool_calls.type` is not empty:

```json
{
  "content": "",
  "role": "assistant",
  "tool_calls": [
    {
      "index": 0,
      "id": "get_weather:0",
      "function": { "arguments": "", "name": "get_weather" },
      "type": "function"
    }
  ]
}
```

Below is a snippet showing how to aggregate `delta` values into the final `tool_call` object.

Accumulate `tool_call` content

{% tabs %}
{% tab title="Python" %}

```python
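final_tool_calls = {}

# `stream` is the streaming response created in the example above.
for event in stream:
    if not event.choices:
        continue
    delta = event.choices[0].delta
    # One chunk may carry fragments for several tool calls, so handle each one.
    for tool_call in delta.tool_calls or []:
        if tool_call.index not in final_tool_calls:
            # First fragment for this index: carries the id, type, and function name.
            final_tool_calls[tool_call.index] = tool_call
            if tool_call.function and tool_call.function.arguments is None:
                tool_call.function.arguments = ""
        elif tool_call.function and tool_call.function.arguments:
            # Later fragments only extend the JSON arguments string.
            final_tool_calls[tool_call.index].function.arguments += tool_call.function.arguments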
print("Final tool calls:")
for index, tool_call in final_tool_calls.items():
    print(f"Tool Call {index}:")
    print(tool_call.model_dump_json(indent=2))
```

{% endtab %}
{% endtabs %}

Accumulated `final_tool_calls[0]`

```json
{
  "index": 0,
  "id": "get_weather:0",
  "function": {
    "arguments": "{\"location\": \"Paris, France\"}",
    "name": "get_weather"
  },
  "type": "function"
}
```
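
Once the arguments string is complete, it can be parsed as JSON and passed to a local function. The following is a minimal sketch of that step; `get_weather`, `TOOL_REGISTRY`, and `run_tool_call` are hypothetical names used for illustration and are not part of the Infron API.

```python
import json


# Hypothetical local implementation of the tool; a real one would call a weather service.
def get_weather(location: str) -> str:
    return f"(stub) weather requested for {location}"


# Illustrative registry mapping tool names to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call: dict) -> str:
    """Parse the accumulated JSON arguments and invoke the matching local function."""
    name = tool_call["function"]["name"]
    try:
        arguments = json.loads(tool_call["function"]["arguments"] or "{}")
    except json.JSONDecodeError as exc:
        # Raised when the stream ended early and the arguments are incomplete.
        raise ValueError(f"Incomplete or invalid arguments for {name}") from exc
    if name not in TOOL_REGISTRY:
        raise KeyError(f"Unknown tool: {name}")
    return TOOL_REGISTRY[name](**arguments)


# Same shape as the accumulated `final_tool_calls[0]` shown above.
accumulated = {
    "index": 0,
    "id": "get_weather:0",
    "function": {"arguments": "{\"location\": \"Paris, France\"}", "name": "get_weather"},
    "type": "function",
}

print(run_tool_call(accumulated))
```

Parsing only after the stream has finished avoids calling `json.loads` on a partial arguments string.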


# Prompt Caching

Prompt Cache in Infron

### What's Prompt Cache

Prompt caching reduces request latency and cost for longer prompts that begin with identical content.

*"Prompt"* in this context refers to the input you send to the model as part of your chat completions request. Rather than reprocessing the same input tokens on every request, the service retains a temporary cache of processed input token computations. Prompt caching does not change the content of the model response; it only reduces latency and cost.

{% hint style="info" %}
Typically, cache read fees are about **10%-25%** of the original input cost, saving **up to 90%** of input costs.
{% endhint %}

### Best Practices for Prompt Cache <a href="#best-practices" id="best-practices"></a>

#### Maximizing Cache Hit Rate <a href="#maximizing-cache-hit-rate" id="maximizing-cache-hit-rate"></a>

Optimization Recommendations

* **Maintain Prefix Consistency**: Place static content at the beginning of prompts, variable content at the end
* **Use Breakpoints Wisely**: Set different cache breakpoints based on content update frequency
* **Avoid Minor Changes**: Ensure cached content remains completely consistent across multiple requests
* **Control Cache Time Window**: Initiate subsequent requests within 5 minutes to hit cache

**Extending Cache Time (1-hour TTL)**

If your request intervals may exceed 5 minutes, consider using 1-hour cache:

```json
{
    "type": "text",
    "text": "Long document content...",
    "cache_control": {
        "type": "ephemeral",
        "ttl": "1h"
    }
}
```

The write cost for the 1-hour cache is 2x the base input price (compared to 1.25x for the 5-minute cache), so it is only worthwhile when calls are infrequent but regular.
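
To make the trade-off concrete, here is a rough sketch that compares the two TTLs for evenly spaced requests. It assumes a cache read multiplier of 0.1x (the Anthropic figure quoted later on this page) and that each hit refreshes the entry; `relative_input_cost` is a hypothetical helper, and real billing depends on the provider.

```python
def relative_input_cost(
    num_requests: int,
    interval_minutes: float,
    ttl_minutes: float,
    write_multiplier: float,
    read_multiplier: float = 0.1,  # assumption: cache reads cost 0.1x the input price
) -> float:
    """Cost of the cached prefix over all requests, in units of one uncached pass."""
    if num_requests <= 0:
        return 0.0
    if interval_minutes < ttl_minutes:
        # One write, then every later request reads (and refreshes) the entry.
        return write_multiplier + (num_requests - 1) * read_multiplier
    # The entry expires between requests, so every request pays the write price.
    return num_requests * write_multiplier


# Ten requests, one every 15 minutes.
five_min = relative_input_cost(10, 15, ttl_minutes=5, write_multiplier=1.25)
one_hour = relative_input_cost(10, 15, ttl_minutes=60, write_multiplier=2.0)
print(f"5-minute TTL: {five_min:.2f}x, 1-hour TTL: {one_hour:.2f}x")
```

When requests arrive less than 5 minutes apart, the same function shows the 5-minute cache is cheaper, because its write multiplier is lower and the entry never expires.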

#### Avoiding Common Pitfalls <a href="#avoiding-common-pitfalls" id="avoiding-common-pitfalls"></a>

Common Issues

1. **Cached Content Too Short**: Ensure cached content meets minimum token requirements
2. **Content Inconsistency**: Changes in JSON object key order invalidate the cache (some languages, such as Go and Swift, may not preserve key order when serializing JSON)
3. **Mixed Format Usage**: Using different formatting approaches for the same content
4. **Ignoring Cache Validity Period**: By default, the cache becomes invalid after 5 minutes without use
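
One way to guard against the key-order pitfall is to serialize structured content deterministically before placing it in a prompt. The sketch below uses only the Python standard library; `canonical_json` and `prefix_fingerprint` are hypothetical helper names, and the fingerprint is a local debugging aid rather than anything the API requires.

```python
import hashlib
import json


def canonical_json(value) -> str:
    """Serialize with sorted keys and fixed separators so equal data yields equal text."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def prefix_fingerprint(value) -> str:
    """Short hash of the canonical form, useful for logging whether a prefix changed."""
    return hashlib.sha256(canonical_json(value).encode("utf-8")).hexdigest()[:12]


# The same tool schema built with two different key orders.
a = {"name": "get_weather", "parameters": {"type": "object", "required": ["location"]}}
b = {"parameters": {"required": ["location"], "type": "object"}, "name": "get_weather"}

print(canonical_json(a) == canonical_json(b))
print(prefix_fingerprint(a) == prefix_fingerprint(b))
```

Logging the fingerprint alongside each request makes it easier to spot which request changed the prefix when cache hits unexpectedly stop.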

### Provider Sticky Routing <a href="#provider-sticky-routing" id="provider-sticky-routing"></a>

When using caching (whether automatically in supported models, or via the `cache_control` property), Infron uses ***provider sticky routing*** to maximize cache hits.

Sticky Routing helps multi-turn AI applications keep related requests on the same healthy upstream provider for a short period of time.

This improves the chance that provider-side prompt caches remain warm across turns, which can reduce repeated input processing, lower latency variance, and improve cost efficiency for long-context conversations.

Sticky Routing is designed to work with [Infron's normal provider routing](/docs/routing-and-gateway/inference-provider-routing), retry, and fallback capabilities. It does not permanently bind traffic to one provider. If the previously used provider becomes unavailable or unsuitable for the current request, Infron automatically falls back to normal routing.

#### Why It Matters <a href="#why-it-matters" id="why-it-matters"></a>

Many AI applications send repeated or partially repeated context across a conversation:

* chat applications,
* agent workflows,
* game advisor sessions,
* coding assistants,
* RAG conversations,
* long-context document workflows,
* roleplay or companion apps.

Modern model providers often support prompt caching. However, provider-side cache entries are usually tied to the provider that served the previous request. If a conversation moves between providers on each turn, the repeated context may not benefit from an existing cache entry.

Sticky Routing reduces this provider drift by keeping related requests on the same healthy provider when possible.

#### Key Benefits <a href="#key-benefits" id="key-benefits"></a>

**Better Cache Reuse**

When repeated conversation turns reach the same provider, cached prompt prefixes are more likely to be reused. This is especially useful for long system prompts, character cards, game state, retrieved documents, and stable context blocks.

**Lower Latency Variance**

Cache reads can reduce the amount of repeated context the provider needs to process. This can improve time to first token and reduce latency spikes in multi-turn flows.

**More Consistent Model Behavior**

Even when the model name is the same, different providers may have slightly different serving behavior, latency profiles, or feature support. Keeping one conversation on the same healthy provider helps preserve behavior consistency within that session.

**Safe Fallback**

Sticky Routing does not sacrifice availability. If the sticky provider becomes unavailable, fails, or no longer matches the request requirements, Infron will route to another suitable provider.

#### How It Works <a href="#how-it-works" id="how-it-works"></a>

For eligible requests, Infron identifies a conversation and remembers the provider that successfully served it. Subsequent requests in the same conversation and model are preferentially routed to that same provider, as long as it is still healthy and eligible.

Sticky Routing is applied only when it is safe to do so:

* the previous provider is still available,
* the provider supports the requested model and features,
* the request does not explicitly override provider selection,
* normal routing health checks still pass.

If these conditions are not met, the request uses standard Infron routing.
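
The decision described above can be pictured with a toy model. This is a conceptual sketch only, not Infron's implementation: `Provider`, `StickyRouter`, and the "first healthy provider" default route are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    name: str
    healthy: bool = True
    models: set = field(default_factory=set)


class StickyRouter:
    """Toy model: remember the last successful provider per (conversation, model)."""

    def __init__(self, providers):
        self.providers = {p.name: p for p in providers}
        self.sticky = {}

    def _default_route(self, model):
        # Stand-in for normal routing: first healthy provider that serves the model.
        for provider in self.providers.values():
            if provider.healthy and model in provider.models:
                return provider
        raise RuntimeError(f"No healthy provider for {model}")

    def route(self, conversation_id, model, explicit_provider=None):
        if explicit_provider is not None:
            # Explicit provider selection takes priority over stickiness.
            return self.providers[explicit_provider]
        if conversation_id is None:
            # Unidentifiable conversation: nothing to stick to.
            return self._default_route(model)
        key = (conversation_id, model)
        previous = self.providers.get(self.sticky.get(key, ""))
        if previous and previous.healthy and model in previous.models:
            chosen = previous
        else:
            chosen = self._default_route(model)
        self.sticky[key] = chosen.name
        return chosen


router = StickyRouter([Provider("a", models={"m"}), Provider("b", models={"m"})])
first = router.route("conv-1", "m")    # default routing picks a provider
router.providers["a"].healthy = False
second = router.route("conv-1", "m")   # previous provider is down, so fall back
router.providers["a"].healthy = True
third = router.route("conv-1", "m")    # stays on the provider that served last
print(first.name, second.name, third.name)
```

The third call shows the sticky preference: default routing would pick the recovered provider again, but the conversation stays with the provider that served its most recent request.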

#### Availability and Fallback Behavior <a href="#availability-and-fallback-behavior" id="availability-and-fallback-behavior"></a>

Sticky Routing is opportunistic, not mandatory.

Infron may skip Sticky Routing when:

* no prior provider is available for the session,
* the previous provider is temporarily unavailable,
* the previous provider does not support the requested parameters,
* the request explicitly specifies provider routing preferences,
* the conversation is not identifiable,
* fallback is needed for reliability.

When Sticky Routing is skipped, the request continues through normal Infron routing.

#### What Sticky Routing Does Not Guarantee <a href="#what-sticky-routing-does-not-guarantee" id="what-sticky-routing-does-not-guarantee"></a>

Sticky Routing improves the likelihood of cache reuse, but it does not guarantee a cache hit.

Cache hits still depend on:

* provider cache support,
* model cache support,
* prompt length requirements,
* cache TTL,
* stable prompt prefixes,
* provider-side cache behavior,
* whether the request reaches the same compatible provider.

Sticky Routing should be viewed as an optimization layer that improves cache continuity while preserving routing flexibility.

#### Interaction With Provider Selection <a href="#interaction-with-provider-selection" id="interaction-with-provider-selection"></a>

Explicit provider controls take priority over Sticky Routing.

If a request specifies provider routing options such as provider order, provider allowlist, provider blocklist, or provider sorting, Infron respects those settings first.

This means Sticky Routing will not override a customer's intentional provider choice.

#### Interaction With Prompt Caching <a href="#interaction-with-prompt-caching" id="interaction-with-prompt-caching"></a>

Sticky Routing works best when combined with prompt caching.

Prompt caching may be automatic for some providers and models. Other providers require explicit cache markers such as `cache_control`.

**Example: Claude Explicit Cache Control**

```json
{
  "model": "anthropic/claude-opus-4.8",
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "Long stable world state, character profile, policy, or knowledge base...",
          "cache_control": {
            "type": "ephemeral",
            "ttl": "1h"
          }
        }
      ]
    },
    {
      "role": "user",
      "content": "Continue this session."
    }
  ]
}
```

**Example: Explicit Cache Breakpoint**

```json
{
  "model": "qwen/qwen3-coder-plus",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Use the reference below when answering."
        },
        {
          "type": "text",
          "text": "Long stable reference content...",
          "cache_control": {
            "type": "ephemeral"
          }
        },
        {
          "type": "text",
          "text": "Summarize the implementation details."
        }
      ]
    }
  ]
}
```

#### Inspecting Cache Usage <a href="#inspecting-cache-usage" id="inspecting-cache-usage"></a>

Responses may include cache accounting in the `usage.prompt_tokens_details` object.

Example:

```json
{
  "usage": {
    "prompt_tokens": 10339,
    "completion_tokens": 60,
    "total_tokens": 10399,
    "prompt_tokens_details": {
      "cached_tokens": 10318,
      "cache_write_tokens": 0
    }
  }
}
```

Important fields:

* `cached_tokens`: tokens read from cache. A value greater than zero indicates a cache hit.
* `cache_write_tokens`: tokens written to cache. This commonly appears when a new cache entry is created.
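
As a small illustration, the helper below summarizes these fields from a response's `usage` object represented as a plain dictionary. `cache_stats` is a hypothetical name, and the function assumes the field layout shown in the example above; missing fields are treated as zero.

```python
def cache_stats(usage: dict) -> dict:
    """Summarize cache accounting from a `usage` dictionary."""
    details = usage.get("prompt_tokens_details") or {}
    prompt_tokens = usage.get("prompt_tokens", 0) or 0
    cached = details.get("cached_tokens", 0) or 0
    written = details.get("cache_write_tokens", 0) or 0
    return {
        "cache_hit": cached > 0,
        "cached_tokens": cached,
        "cache_write_tokens": written,
        # Share of the prompt that was served from cache.
        "hit_ratio": cached / prompt_tokens if prompt_tokens else 0.0,
    }


usage = {
    "prompt_tokens": 10339,
    "completion_tokens": 60,
    "total_tokens": 10399,
    "prompt_tokens_details": {"cached_tokens": 10318, "cache_write_tokens": 0},
}

stats = cache_stats(usage)
print(f"hit={stats['cache_hit']} ratio={stats['hit_ratio']:.1%}")
```

With the OpenAI Python SDK, a response's usage can be converted to this shape with `response.usage.model_dump()`.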

### Caching Types

Models supported by Infron offer two types of prompt caching mechanisms:

<table><thead><tr><th width="165.2176513671875">Caching Type</th><th width="559.982666015625">Usage Method</th></tr></thead><tbody><tr><td><strong>Implicit Caching</strong></td><td>No configuration needed, <code>automatically managed by model provider</code></td></tr><tr><td><strong>Explicit Caching</strong></td><td>Requires <code>cache_control</code> parameter</td></tr></tbody></table>

#### Implicit Caching <a href="#type-1-implicit-caching" id="type-1-implicit-caching"></a>

The following model providers offer implicit (automatic) prompt caching. No special request parameters are required: the provider detects and caches reusable content automatically.

| Model Provider | Official Documentation                                                              | Quick Start                                |
| -------------- | ----------------------------------------------------------------------------------- | ------------------------------------------ |
| **OpenAI**     | [Prompt Caching](https://platform.openai.com/docs/guides/prompt-caching)            | [#openai](#openai "mention")               |
| **DeepSeek**   | [Prompt Caching](https://api-docs.deepseek.com/guides/kv_cache)                     |                                            |
| **xAI**        | [Prompt Caching](https://docs.x.ai/docs/models#models-and-pricing)                  | [#grok](#grok "mention")                   |
| **Google**     | [Prompt Caching](https://ai.google.dev/gemini-api/docs/caching)                     | [#google-gemini](#google-gemini "mention") |
| **Alibaba**    | [Prompt Caching](https://www.alibabacloud.com/help/en/model-studio/context-cache)   |                                            |
| **MoonshotAI** | [Prompt Caching](https://platform.moonshot.ai/old/caching.en-US#request-parameters) |                                            |
| **Z.AI**       | [Prompt Caching](https://docs.z.ai/guides/capabilities/cache)                       |                                            |

💡 Optimization Recommendations

To maximize cache hit rate, follow these best practices:

1. **Static-to-Dynamic Ordering**: Place stable, reusable content (such as system instructions, few-shot examples, document context) at the beginning of the messages array
2. **Variable Content at End**: Place variable, request-specific content (such as current user question, dynamic data) at the end of the array
3. **Maintain Prefix Consistency**: Ensure cached content remains completely consistent across multiple requests (including spaces and punctuation)
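
The ordering rules above can be sketched as a small message builder. `build_messages` and the sample strings are hypothetical; the point is only that everything except the final message is byte-for-byte identical across requests.

```python
def build_messages(system_prompt: str, reference_docs: list[str], question: str) -> list[dict]:
    """Put stable content first and the request-specific question last."""
    static_context = "\n\n".join(reference_docs)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Reference material:\n{static_context}"},
        # Only this final message varies between requests.
        {"role": "user", "content": question},
    ]


SYSTEM = "You are an assistant specializing in literary analysis."
DOCS = ["<Chapter 1 text>", "<Chapter 2 text>"]

first = build_messages(SYSTEM, DOCS, "Summarize chapter 1.")
second = build_messages(SYSTEM, DOCS, "List the main characters.")

# The shared prefix is identical, so an implicit cache can reuse it.
print(first[:-1] == second[:-1])
```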

#### Explicit Caching <a href="#type-2-explicit-caching" id="type-2-explicit-caching"></a>

Anthropic Claude, Google Gemini, and Qwen series models let you specify caching strategies explicitly through request parameters. This approach provides the finest control but requires developers to actively manage caching strategies.

| Model Provider       | Official Documentation                                                                                                      | Quick Start                                      |
| -------------------- | --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------ |
| **Anthropic Claude** | [Prompt Caching](https://platform.claude.com/docs/en/build-with-claude/prompt-caching)                                      | [#anthropic-claude](#anthropic-claude "mention") |
| **Google**           | [Context caching overview](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview) | [#google-gemini](#google-gemini "mention")       |

**Caching Working Principle**

When you send a request with `cache_control` markers:

1. The system checks if a reusable cache prefix exists
2. If a matching cache is found, cached content is used (reducing cost)
3. If no match is found, the complete prompt is processed and a new cache entry is created

Cached content includes the complete prefix in the request: `tools` → `system` → `messages` (in this order), up to where `cache_control` is marked.

**Automatic Prefix Check**

You only need to add a cache breakpoint at the end of static content, and the system will automatically check approximately the preceding 20 content blocks for reusable cache boundaries. If the prompt contains more than 20 content blocks, consider adding additional `cache_control` breakpoints to ensure all content can be cached.
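
For prompts with many content blocks, one possible approach is to place a breakpoint on the last block and then on every 20th block before it. The sketch below assumes a window of 20 blocks and a cap of 4 breakpoints (the Anthropic limit); `add_breakpoints` is a hypothetical helper, not an Infron function.

```python
import copy


def add_breakpoints(blocks, window=20, max_breakpoints=4):
    """Return a copy of `blocks` with `cache_control` on the last block and every
    `window`-th block before it, up to `max_breakpoints` markers."""
    marked = copy.deepcopy(blocks)
    index = len(marked) - 1
    placed = 0
    while index >= 0 and placed < max_breakpoints:
        marked[index]["cache_control"] = {"type": "ephemeral"}
        placed += 1
        index -= window  # step back one lookback window
    return marked


blocks = [{"type": "text", "text": f"Section {i}"} for i in range(50)]
marked = add_breakpoints(blocks)
positions = [i for i, block in enumerate(marked) if "cache_control" in block]
print(positions)
```

Spacing breakpoints one window apart means each lookback from a breakpoint can reach the previous one. Prompts longer than `window * max_breakpoints` blocks would leave the earliest blocks outside any breakpoint under this scheme.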

### Getting Started <a href="#getting-started" id="getting-started"></a>

#### Anthropic Claude

**Minimum Cache Length**

Minimum cacheable token count for different models:

<table><thead><tr><th width="367.7137451171875">Model Series</th><th>Minimum Cache Tokens</th></tr></thead><tbody><tr><td>Claude Opus 4.1/4</td><td>1024 tokens</td></tr><tr><td>Claude Haiku 3.5</td><td>2048 tokens</td></tr><tr><td>Claude Sonnet 4.5/4/3.7</td><td>1024 tokens</td></tr></tbody></table>

**Caching Price**

* **Cache writes**: charged at 1.25x the original input price
* **Cache reads**: charged at 0.1x the original input price

**Cache Breakpoint Count**

Prompt caching with Anthropic requires `cache_control` breakpoints. A request can contain at most `4 breakpoints`, and by default the cache expires after `5 minutes`, so reserve breakpoints for large bodies of text such as character cards, CSV data, RAG data, or book chapters. The minimum prompt size is `1024 tokens` for most models (see Minimum Cache Length above).

[Click here to read more about Anthropic prompt caching and its limitations.](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)

The `cache_control` breakpoint can only be inserted into the text part of a multipart message. Prompts shorter than the minimum token count will not be cached even if marked with `cache_control`. Requests will be processed normally but no cache will be created.
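
Because the breakpoint limit is easy to exceed once tools, system prompts, and messages all carry markers, a quick local check can help before sending a request. This is a sketch; `count_breakpoints` and `MAX_BREAKPOINTS` are hypothetical names, and the limit of 4 is the Anthropic figure stated above.

```python
MAX_BREAKPOINTS = 4  # Anthropic limit stated above


def count_breakpoints(node) -> int:
    """Recursively count `cache_control` markers in a request payload."""
    if isinstance(node, dict):
        own = 1 if "cache_control" in node else 0
        children = (value for key, value in node.items() if key != "cache_control")
        return own + sum(count_breakpoints(child) for child in children)
    if isinstance(node, list):
        return sum(count_breakpoints(item) for item in node)
    return 0


payload = {
    "tools": [
        {
            "type": "function",
            "function": {"name": "get_time"},
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {
            "role": "system",
            "content": [
                {"type": "text", "text": "HUGE TEXT BODY", "cache_control": {"type": "ephemeral"}}
            ],
        },
        {"role": "user", "content": "What triggered the collapse?"},
    ],
}

used = count_breakpoints(payload)
if used > MAX_BREAKPOINTS:
    raise ValueError(f"{used} breakpoints exceed the limit of {MAX_BREAKPOINTS}")
print(f"{used} of {MAX_BREAKPOINTS} breakpoints used")
```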

**Cache Validity Period**

* **Default TTL**: 5 minutes
* **Extended TTL**: 1 hour (requires additional fee)

Cache automatically refreshes with each use at no additional cost.

**System message caching example:**

```json
{
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are a historian studying the fall of the Roman Empire. You know the following book very well:"
        },
        {
          "type": "text",
          "text": "HUGE TEXT BODY",
          "cache_control": {
            "type": "ephemeral"
          }
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What triggered the collapse?"
        }
      ]
    }
  ]
}
```

**User message caching example:**

```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Given the book below:"
        },
        {
          "type": "text",
          "text": "HUGE TEXT BODY",
          "cache_control": {
            "type": "ephemeral"
          }
        },
        {
          "type": "text",
          "text": "Name all the characters in the above book"
        }
      ]
    }
  ]
}
```

**Basic Usage: Caching System Prompts**

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="<API_KEY>",
)

# First request - create cache
response = client.chat.completions.create(
    model="claude-sonnet-4-5@20250929", 
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are an AI assistant specializing in literary analysis. Your goal is to provide insightful commentary on themes, characters, and writing style.\n"
                },
                {
                    "type": "text",
                    "text": "<Complete content of Pride and Prejudice>",
                    "cache_control": {"type": "ephemeral"} 
                }
            ]
        },
        {
            "role": "user",
            "content": "Analyze the main themes of Pride and Prejudice."
        }
    ]
)

print(response.choices[0].message.content)

# Second request - cache hit
response = client.chat.completions.create(
    model="claude-sonnet-4-5@20250929",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are an AI assistant specializing in literary analysis. Your goal is to provide insightful commentary on themes, characters, and writing style.\n"
                },
                {
                    "type": "text",
                    "text": "<Complete content of Pride and Prejudice>",
                    "cache_control": {"type": "ephemeral"}  # Same content hits the cache
                }
            ]
        },
        {
            "role": "user",
            "content": "Who are the main characters in this book?"  # Only the question differs
        }
    ]
)

print(response.choices[0].message.content)
```

{% endtab %}
{% endtabs %}

**Advanced Usage: Caching Tool Definitions**

When your application uses many tools, caching tool definitions can significantly reduce costs:

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="<API_KEY>",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5@20250929",
    tools=[ 
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather for a specified location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City and province, e.g. Beijing, Beijing"
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "Temperature unit"
                        }
                    },
                    "required": ["location"]
                }
            }
        },
        # Can define more tools...
        {
            "type": "function",
            "function": {
                "name": "get_time",
                "description": "Get current time for a specified timezone",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "timezone": {
                            "type": "string",
                            "description": "IANA timezone name, e.g. Asia/Shanghai"
                        }
                    },
                    "required": ["timezone"]
                }
            },
            "cache_control": {"type": "ephemeral"}  # Mark cache on the last tool
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "What's the current weather and time in Beijing?"
        }
    ]
)

print(response.choices[0].message)
```

{% endtab %}
{% endtabs %}

When you add a `cache_control` marker to the last tool definition, the system caches all tool definitions as a complete prefix.

**Advanced Usage: Caching Conversation History**

In long conversation scenarios, you can cache the entire conversation history:

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="<API_KEY>",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5@20250929",
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "...long system prompt",
                    "cache_control": {"type": "ephemeral"}  # Cache system prompt
                }
            ]
        },
        # Previous conversation history
        {
            "role": "user",
            "content": "Hello, can you tell me more about the solar system?"
        },
        {
            "role": "assistant",
            "content": "Of course! The solar system is a collection of celestial bodies orbiting the sun. It consists of eight planets, numerous satellites, asteroids, comets and other celestial objects..."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Great."
                },
                {
                    "type": "text",
                    "text": "Tell me more about Mars.",
                    "cache_control": {"type": "ephemeral"}  # Cache all conversation up to here
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
```

{% endtab %}
{% endtabs %}

When you add `cache_control` to the last message of each conversation round, the system automatically finds and uses the longest matching prefix from previously cached content. Content that was marked with `cache_control` in an earlier request hits the cache automatically and has its validity period refreshed, as long as it is reused within 5 minutes.
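
A possible way to apply this on every round is a helper that moves the conversation breakpoint to the newest message before each request. The sketch below is an assumption about how you might structure client code: `mark_latest_turn` is a hypothetical helper, and removing older message-level markers is a design choice made here to stay under the breakpoint limit.

```python
import copy


def mark_latest_turn(messages):
    """Return a copy of `messages` with a single conversation breakpoint on the
    last message. Markers on system messages are left untouched."""
    updated = copy.deepcopy(messages)
    for message in updated:
        if message["role"] == "system" or isinstance(message["content"], str):
            continue
        for block in message["content"]:
            # Drop markers left over from earlier rounds.
            block.pop("cache_control", None)
    last = updated[-1]
    if isinstance(last["content"], str):
        # Breakpoints live on content blocks, so convert plain strings first.
        last["content"] = [{"type": "text", "text": last["content"]}]
    last["content"][-1]["cache_control"] = {"type": "ephemeral"}
    return updated


history = [
    {"role": "system", "content": [
        {"type": "text", "text": "...long system prompt", "cache_control": {"type": "ephemeral"}}
    ]},
    {"role": "user", "content": [
        {"type": "text", "text": "Tell me about the solar system.", "cache_control": {"type": "ephemeral"}}
    ]},
    {"role": "assistant", "content": "The solar system is a collection of celestial bodies..."},
    {"role": "user", "content": "Tell me more about Mars."},
]

request_messages = mark_latest_turn(history)
print(request_messages[-1]["content"][-1])
```

The returned list can be passed as `messages` to `client.chat.completions.create`, while `history` itself stays unmodified for the next round.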

**Advanced Usage: Multi-Breakpoint Combination**

When you have multiple content segments with different update frequencies, you can use multiple cache breakpoints:

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="<API_KEY>",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-5@20250929",
    tools=[ 
        # Tool definitions (rarely change)
        {
            "type": "function",
            "function": {
                "name": "search_documents",
                "description": "Search knowledge base",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "Search query"}
                    },
                    "required": ["query"]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "get_document",
                "description": "Retrieve document by ID",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "doc_id": {"type": "string", "description": "Document ID"}
                    },
                    "required": ["doc_id"]
                }
            },
            "cache_control": {"type": "ephemeral"}  # Breakpoint 1: Tool definitions
        }
    ],
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are a research assistant with access to a document knowledge base.\n\n# Instructions\n- Always search for relevant documents first\n- Provide citations...",
                    "cache_control": {"type": "ephemeral"}  # Breakpoint 2: System instructions
                },
                {
                    "type": "text",
                    "text": "# Knowledge Base Context\n\nHere are the relevant documents for this conversation:\n\n## Document 1: Solar System Overview\nThe solar system consists of the sun and all celestial bodies orbiting it...\n\n## Document 2: Planetary Characteristics\nEach planet has unique characteristics...",
                    "cache_control": {"type": "ephemeral"}  # Breakpoint 3: RAG documents
                }
            ]
        },
        {
            "role": "user",
            "content": "Can you search for information about Mars rovers?"
        },
        {
            "role": "assistant",
            "content": [
                {
                    "type": "tool_use",
                    "id": "tool_1",
                    "name": "search_documents",
                    "input": {"query": "Mars rovers"}
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": "tool_1",
                    "content": "Found 3 relevant documents..."
                }
            ]
        },
        {
            "role": "assistant",
            "content": "I found 3 relevant documents. Let me get more details from the Mars exploration document."
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Okay, please tell me specific information about the Perseverance rover.",
                    "cache_control": {"type": "ephemeral"}  # Breakpoint 4: Conversation history
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
```

{% endtab %}
{% endtabs %}

Using multiple cache breakpoints allows content with different update frequencies to be cached independently:

* **Breakpoint 1**: Tool definitions (almost never change)
* **Breakpoint 2**: System instructions (rarely change)
* **Breakpoint 3**: RAG documents (may update daily)
* **Breakpoint 4**: Conversation history (changes every round)

When only the conversation history is updated, the cache for the first three breakpoints remains valid, maximizing cost savings.

**What Invalidates Cache**

The following operations will invalidate part or all of the cache:

<table><thead><tr><th>Changed Content</th><th width="122.83740234375">Tool Cache</th><th width="101.639404296875">System Cache</th><th width="100.879150390625">Message Cache</th><th>Impact Description</th></tr></thead><tbody><tr><td><strong>Tool Definitions</strong></td><td>✘</td><td>✘</td><td>✘</td><td>Modifying tool definitions invalidates entire cache</td></tr><tr><td><strong>System Prompt</strong></td><td>✓</td><td>✘</td><td>✘</td><td>Modifying system prompt invalidates system and message cache</td></tr><tr><td><strong>tool_choice Parameter</strong></td><td>✓</td><td>✓</td><td>✘</td><td>Only affects message cache</td></tr><tr><td><strong>Add/Remove Images</strong></td><td>✓</td><td>✓</td><td>✘</td><td>Only affects message cache</td></tr></tbody></table>
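
The table follows from the prefix order `tools` → `system` → `messages`: a change invalidates its own level and everything after it. The lookup below restates the table as code for illustration; `CACHE_LEVELS`, `FIRST_INVALIDATED`, and `surviving_caches` are hypothetical names.

```python
# Cache levels in prefix order.
CACHE_LEVELS = ["tools", "system", "messages"]

# First cache level invalidated by each kind of change, taken from the table above.
FIRST_INVALIDATED = {
    "tool_definitions": "tools",
    "system_prompt": "system",
    "tool_choice": "messages",
    "images": "messages",
}


def surviving_caches(change: str) -> list[str]:
    """Return the cache levels that remain valid after the given change."""
    cutoff = CACHE_LEVELS.index(FIRST_INVALIDATED[change])
    return CACHE_LEVELS[:cutoff]


for change in FIRST_INVALIDATED:
    print(f"{change}: still valid -> {surviving_caches(change)}")
```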

#### OpenAI

Caching price changes:

* **Cache writes**: no cost
* **Cache reads**: charged at `0.1x` to `0.5x` the original input price, depending on the model

[Click here to view OpenAI's cache pricing per model.](https://platform.openai.com/docs/pricing)

Prompt caching with OpenAI is automated and does not require any additional configuration. There is a minimum prompt size of `1024 tokens`.

[Click here to read more about OpenAI prompt caching and its limitations.](https://platform.openai.com/docs/guides/prompt-caching)

#### Grok

Caching price changes:

* **Cache writes**: no cost
* **Cache reads**: charged at 0.25x the original input price

[Click here to view Grok's cache pricing per model.](https://docs.x.ai/docs/models#models-and-pricing)

Prompt caching with Grok is automated and does not require any additional configuration.

#### Google Gemini <a href="#google-gemini" id="google-gemini"></a>

**Implicit Caching**

Gemini 2.5 Pro and 2.5 Flash models now support **implicit caching**, providing automatic caching functionality similar to OpenAI’s automatic caching. Implicit caching works seamlessly — no manual setup or additional `cache_control` breakpoints required.

Pricing Changes:

* No cache write or storage costs.
* Cached tokens are charged at `0.1x the price` of original input token cost.

Note that the TTL is 3-5 minutes on average, but it varies. To be eligible for caching, a request needs at least 1024 tokens for Gemini 2.5 Flash and 2048 tokens for Gemini 2.5 Pro.

[Official announcement from Google](https://developers.googleblog.com/en/gemini-2-5-models-now-support-implicit-caching/)

{% hint style="info" %}
To maximize implicit cache hits, keep the initial portion of your message arrays consistent between requests. Push variations (such as user questions or dynamic context elements) toward the end of your prompt/requests.
{% endhint %}

**Explicit Caching**

Explicit caching for Gemini on Infron requires you to insert `cache_control` breakpoints within message content, similar to Anthropic and Qwen. We recommend using caching primarily for large content pieces (such as CSV files, lengthy character cards, retrieval augmented generation (RAG) data, or extensive textual sources).

{% hint style="info" %}
There is no limit on the number of `cache_control` breakpoints you can include in your request. Infron will use **only the last breakpoint** for Gemini caching. Including multiple breakpoints is safe and can help maintain compatibility with Anthropic, but only **the final one** will be used for Gemini.
{% endhint %}
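
To check which breakpoint will take effect for Gemini before sending a request, you can scan the message array on the client side. The helper below is a hypothetical illustration (it is not how Infron implements the rule); it returns the position of the final `cache_control` marker.

```python
def last_cache_breakpoint(messages: list[dict]):
    """Return (message_index, part_index) of the final cache_control marker, or None."""
    last = None
    for m_idx, message in enumerate(messages):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content cannot carry cache_control
        for p_idx, part in enumerate(content):
            if isinstance(part, dict) and "cache_control" in part:
                last = (m_idx, p_idx)
    return last

messages = [
    {"role": "system", "content": [
        {"type": "text", "text": "Reference book",
         "cache_control": {"type": "ephemeral"}},
    ]},
    {"role": "user", "content": [
        {"type": "text", "text": "Based on the book text below:"},
        {"type": "text", "text": "HUGE TEXT BODY HERE",
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": "List all main characters."},
    ]},
]

# Two breakpoints are present; only the one in the user message counts for Gemini.
print(last_cache_breakpoint(messages))
```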

**Cache Validity Period**

* **Default TTL**: 5 minutes
* **Extended TTL**: 1 hour (requires additional fee)

**Caching Working Principle**

When you send a request with `cache_control` markers:

1. The system checks if a reusable cache prefix exists
2. If a matching cache is found, cached content is used (reducing cost)
3. If no match is found, the complete prompt is processed and a new cache entry is created

Cached content includes the complete prefix in the request: `tools` → `system` → `messages` (in this order), up to where `cache_control` is marked.
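
The prefix-matching idea can be sketched on the client side as follows. This is an assumption-laden illustration, not the provider's actual cache lookup: `cache_prefix`, `prefix_key`, and `make_request` are hypothetical helpers, and the sketch assumes the system message (if any) is the first entry in `messages`.

```python
import hashlib
import json

def cache_prefix(request: dict):
    """Return tools plus all content parts up to the last cache_control marker, or None."""
    parts = []
    cut = None
    for message in request["messages"]:
        content = message["content"]
        if isinstance(content, str):
            content = [{"type": "text", "text": content}]
        for part in content:
            parts.append([message["role"], part])
            if "cache_control" in part:
                cut = len(parts)  # the prefix ends right after this part
    if cut is None:
        return None
    return [request.get("tools", [])] + parts[:cut]

def prefix_key(request: dict):
    """Hash the cacheable prefix so two requests can be compared."""
    prefix = cache_prefix(request)
    if prefix is None:
        return None
    serialized = json.dumps(prefix, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()

def make_request(question: str, document: str = "HUGE TEXT BODY HERE") -> dict:
    return {
        "messages": [
            {"role": "system", "content": [
                {"type": "text", "text": document,
                 "cache_control": {"type": "ephemeral"}},
            ]},
            {"role": "user", "content": question},
        ]
    }

# Same marked prefix, different trailing question: the keys match.
print(prefix_key(make_request("What triggered the collapse?"))
      == prefix_key(make_request("Who ruled at the time?")))
```

Anything after the marker (here, the user question) does not affect the key, while any change before it produces a different key and therefore a cache miss.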

**Examples**

**System Message Caching Example**

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

response = requests.post(
  url="https://llm.onerouter.pro/v1/chat/completions",
  headers={
    "Authorization": "Bearer YOUR-API-KEY",
    "Content-Type": "application/json"
  },
  data=json.dumps({
    "model": "google/gemini-2.5-flash", 
    "messages": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are a historian studying the fall of the Roman Empire. Below is an extensive reference book:"
                },
                {
                    "type": "text",
                    "text": "HUGE TEXT BODY HERE",
                    "cache_control": {
                        "type": "ephemeral",
                        "ttl": 300
                    }
                }
            ]
        },{
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What triggered the collapse?"
                }
            ]
        }
    ],
    "provider": {
        "order": ["google-vertex"]
    },
    "max_tokens": 1024
  })
)
print(response.json())
```

{% endtab %}
{% endtabs %}

**User Message Caching Example**

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

response = requests.post(
  url="https://llm.onerouter.pro/v1/chat/completions",
  headers={
    "Authorization": "Bearer YOUR-API-KEY",
    "Content-Type": "application/json"
  },
  data=json.dumps({
    "model": "google/gemini-2.5-flash", 
    "messages": [
        {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Based on the book text below:"
            },
            {
                "type": "text",
                "text": "HUGE TEXT BODY HERE",
                "cache_control": {
                    "type": "ephemeral",
                    "ttl": 300
                }
            },
            {
                "type": "text",
                "text": "List all main characters mentioned in the text above."
            }
        ]
        }
    ],
    "provider": {
        "order": ["google-vertex"]
    },
    "max_tokens": 1024
  })
)
print(response.json())
```

{% endtab %}
{% endtabs %}

**History User Message Caching Example**

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json
import time


long_prompt = """
### Prompt Title:
**The Shattered Continent — A Comprehensive World‑Building and Narrative Instruction**

---

You are to imagine and describe, in vivid, cinematic, and intellectually coherent detail, a vast fictional world known as *Aelyndra*, a continent that was once united under luminous orders of scholars, mages, engineers, and philosophers, but is now fragmented by centuries of arcane wars, plagues, and ideological rifts. The purpose of this prompt is to generate an elaborate tapestry of interlocking stories, characters, cultures, technologies, and metaphysical mysteries. Every generated text based on this prompt should feel immersive, multi‑layered, and historically grounded within its own logic. The tone should balance grounded realism with mythic resonance, evoking both awe and melancholy.

Below are detailed aspects, lore structures, stylistic expectations, sensory directions, metaphysical principles, and narrative possibilities you should elaborate upon.

---

#### 1. **Historical Overview**

Describe a timeline spanning thousands of years, from the primordial formation of Aelyndra to its contemporary fractured age.
Include eras such as **The Genesis Fires**, when the first luminous beings descended and shaped the continents; **The Chain‑Forge Epoch**, when mortal civilizations learned to harness resonant metals that could channel thought; **The Concordant Millennia**, the golden age of united knowledge; and **The Sundering**, a cataclysmic fracturing that split both geography and the collective memory of humankind.

Every historical event must feel internally consistent: show cause and consequence. For instance, the loss of one coastal city’s library should have ripples across distant temples and later generations’ philosophies. The tone should be reflective and slightly tragic, as though the chronicler recounts a glorious but forgotten lineage.

---

#### 2. **Geography and Environment**

Construct a geography of striking variety and symbolic resonance — volcanic shores, glass deserts, cities built within petrified forests, islands that drift through the mist like sleeping giants. For each region, define climate, flora, fauna, and the materials used in architecture. The
**Amber Steppes**, for example, might shimmer with grasses that refract sunlight into living colors, while **The Hollow Expanse** could be a wasteland where the air hums with residual magic from ancient wars.

Integrate ecological logic: how trade winds, oceanic currents, or tectonic activity affect culture and migration. Mountains may separate kingdoms physically but rivers and undersea tunnels connect them secretly. Give attention to sensory cues — the smell of resin in mountain villages, the sound of iron insects ticking in the deserts at twilight, the taste of mineral dust in air after storms.

---

#### 3. **Peoples and Cultures**

List multiple civilizations and describe how they diverged culturally, linguistically, and spiritually after the Sundering. Avoid simplistic binaries of good versus evil; each culture must hold a mixture of beauty, cruelty, and contradiction.
For instance:

- **The Dathenians**, descendants of former astronomer‑priests, now live beneath great dome observatories shattered by meteor showers; their language evolved around the concept of cyclical silence, and their rituals involve rebuilding and unbuilding stone circles.
- **The Marquorians**, sea‑bound artisans who sculpt coral into living fortresses; they treat navigation as a spiritual rite, believing each voyage mirrors the journey of the soul beyond death.
- **The Oruvian Clans**, desert dwellers who master the remnants of sonic engineering, forging instruments that can blast sandstorms into harmonious patterns visible for miles.

Each description should anchor political systems, economic practices, mythological origins, and interpersonal customs: how they greet each other, mourn their dead, or repair their tools. Include the etymology of cultural names, food habits, clothing textures, and color symbolism.

---

#### 4. **Religions, Philosophy, and Magic Systems**

Magic in this world arises not from childish incantation but from **resonant cognition**, a symbiotic interaction between thought, mineral vibration, and light frequency. Those talented in the craft can bind emotion into material forms — forging “sentient metals” that remember their wielders’ fears or hopes. Magic is thus both scientific and spiritual, blurring boundaries between psychology, physics, and theology.

Develop diverse schools of philosophy debating ethical use of such power:

- The **Solace Theorists** argue that controlling resonance is an act of compassion — to heal broken matter.
- The **Iron Aesthetes** consider creation a cruel necessity, insisting that only destruction brings cosmic symmetry.
- The **Children of Echo** worship silence and claim that every magical act pollutes the universal rhythm.

In your generated text, treat these doctrines not merely as background flavor but as intellectual frameworks shaping language, law, art, and personal relationships.

---

#### 5. **Technology and Architecture**

Aelyndra’s civilizations developed hybrid science combining clockwork engineering, bio‑alchemy, and energy crystallization. Describe towers powered by luminous conduits that pulse in rhythm with heartbeat sensors, skyships navigated by harmonic crystals, temples where gears and vines intertwine as living mechanisms. Highlight how technology evolves according to resource distribution: coastal regions rely on fungal luminescence, while mountain regimes mine “thought‑ore.” The interplay of invention and superstition drives narrative tension: progress both liberates and curses.

Architectural imagery should emphasize scale and mood: narrow alleys carved into obsidian cliffs, floating monasteries tethered by cables of woven gold, and markets illuminated by singing light globes whose hum forms improvised melodies as people pass.

---

#### 6. **Narrative Archetypes**

Encourage stories about rediscovery, reconciliation, and ambiguity rather than simple triumph. Possible archetypes include:

- The **Historian Without Records**, traveling to piece together memories hidden in ruins.
- The **Exile Engineer**, carrying an artifact that generates voices of those it once killed.
- The **Dream Cartographer**, mapping emotions that alter geography in real time.
- The **Queen of Mirrors**, who governs through reflection because her actual body has dissolved into glass.

All characters must confront both external danger and metaphysical uncertainty. Their heroism is subtle — the courage to remember or forgive rather than to conquer.

---

#### 7. **Sensory and Emotional Atmosphere**

When generating scenes, prioritize evocative sensory layering:
- Sound: the low chime of suspended glass, whispering wind through broken halls, distant chanting over water.
- Sight: refracted twilight on metallic dunes, murals shimmering with bioluminescent ink.
- Texture: the contrast between rusted ruins and the softness of moss growing over them.
- Emotion: nostalgia, intellectual awe, gentle melancholy, quiet rebellion.

Narrative pacing should oscillate between stillness and momentum — slow revelation punctuated by flashes of insight or dread. Readers should feel as though they’re remembering a place they never visited.

---

#### 8. **Metaphysics and Ethics**

Articulate the metaphysical principle that the universe is a dialogue between **Memory** and **Entropy**. Every act of creation defies forgetting but accelerates decay elsewhere. As a result, civilizations in Aelyndra constantly face moral trade‑offs: Should they preserve ancient resonance‑engines at the cost of ecological balance, or let their light fade naturally? These philosophical dilemmas should infuse even ordinary conversations.

Include thought experiments, fragmentary proverbs, and paradoxical hymns: “What we rebuild, we erase differently.” Avoid clichés of prophecy; instead, show how destiny might itself be a side effect of collective guilt or yearning.

---

#### 9. **Language, Names, and Symbol Codes**

Build naming conventions that suggest linguistic diversity — alternating consonant clusters and harmonic vowels, or syntax where verbs precede emotion markers. Indicate how written language has evolved: maybe modern scribes use glowing ink, and every sentence emits faint music depending on its meaning. Each culture’s writing system reveals worldview: linear scripts for materialists, spiral glyphs for those who worship recursion.
Allow symbols like tri‑circles, mirrored sigils, or broken hexagrams to recur as motifs linking spirituality and mathematics.

---

#### 10. **Storytelling Mode and Style**

When generating prose or dialogue from this prompt:

- **Tone:** intellectual lyricism blended with tactile realism.
- **Point of View:** optional mixture of omniscient chronicler, first‑person witness, or mosaic of journal entries.
- **Pacing:** start with environment or philosophical reflection before advancing plot.
- **Voice:** maintain rich vocabulary and musical rhythm, avoiding modern slang.
- **Conflict Portrayal:** inner struggle takes precedence; physical battles should mirror psychological or ideological clashes.

Comparison points: the emotional gravity of high epic poetry, the forensic detail of travelogues, the mournful tone of lost civilizations.

---

#### 11. **Prompts for Expansion**

After establishing the world, encourage detailed responses to sub‑prompts such as:

1. Describe a festival in a ruined city rebuilt with living vines; include sensory details, songs, rituals, and philosophical conversations heard between drunk scholars.
2. Write letters exchanged between two philosophers debating whether machines can dream. Use subtle metaphors instead of direct exposition.
3. Paint a panoramic view of the continent from orbit after centuries of regrowth — show what remains luminous when human memory fades.
4. Chronicle a court where sentences are sung rather than spoken, and justice is determined by the harmony of the choir’s tone.
5. Depict children discovering an artifact that records emotions. Show how it alters their personal identities.

Each of these sub‑prompts must align with the metaphysical and cultural logic above.

---

#### 12. **Ethos of Generation**

When using this master prompt, emphasize imagination rooted in coherence. Every fantastical element should follow some rationale — whether physical, symbolic, or emotional. Avoid default tropes (knights, elves, dragons) unless reinvented with purpose. Portray diversity of belief and appearance; suggest realistic emotions amid mythic context. The world should feel *earned*, as though history genuinely unfolded there.

---

#### 13. **Purpose and Audience**

This is designed for creators seeking an inexhaustible setting for stories, poems, games, or conceptual art. It invites introspection, exploration of morality, and appreciation for transient beauty. Its ideal audience values depth over spectacle, meaning over mere ornament.

---

#### 14. **Instruction to the AI (if applicable)**

When generating content from this prompt, the AI should:

- Adopt a deliberate, reflective tone.
- Prioritize atmosphere and reasoning before action.
- Honor contradictions without resolving them.
- Provide continuity: refer back to established geography and philosophies.
- Avoid repetition, clichés, or superficial heroism.
- Strive for prose that reads like the memory of a dream encoded into scripture.

Output should feel semi‑academic yet emotionally resonant — a mixture of archived myth and eyewitness recollection.

Please limit the output content to within 32 characters.
---

### End of Prompt
""".strip()

messages_data = {
    "model": "google/gemini-2.5-flash", 
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Based on the book text below:"
                },
                {
                    "type": "text",
                    "text": f"{long_prompt}",
                    "cache_control": {
                        "type": "ephemeral"
                    }
                },
                {
                    "type": "text",
                    "text": "List all main characters mentioned in the text above."
                }
            ]
        },
    ],
    "max_tokens": 1024,
    "usage": {
        "include": True
    },
    "reasoning": {
        "effort": "none"
    }
}

def chat_with_explicit_caching():
    response = requests.post(
        url="https://llm.onerouter.pro/v1/chat/completions",
        headers={
            "Authorization": "Bearer YOUR-API-KEY",
            "Content-Type": "application/json"
        },
        data=json.dumps(messages_data)
    )

    res = response.json()
    print(res)
    return res


if __name__ == '__main__':
    # First request: the full prompt is processed and a new cache entry is created
    explicit_caching_res = chat_with_explicit_caching()

    # Add the completion into the messages
    messages_data['messages'].append(
        {
            "role": explicit_caching_res['choices'][0]['message']['role'],
            "content": [
                {
                    "type": "text",
                    "text": explicit_caching_res['choices'][0]['message']['content'],
                }
            ]
        }
    )

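    # Second request: the marked prefix should now be read from the cache
    explicit_caching_res = chat_with_explicit_caching()
    prompt_cache_read_cost = explicit_caching_res['cost_details']['prompt_cache_read_cost']
    cached_tokens = explicit_caching_res['usage']['prompt_tokens_details']['cached_tokens']
    print(f"prompt_cache_read_cost = {prompt_cache_read_cost}")
    print(f"cached_tokens = {cached_tokens}")
    if cached_tokens:
        # Derive the effective cache-read price per million tokens
        print(f"google/gemini-2.5-flash - Cache Read / M tokens =  {prompt_cache_read_cost / cached_tokens * 1000000}")
    else:
        # Avoid dividing by zero when the request did not hit the cache
        print("No cached tokens reported; the request did not hit the cache.")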

```

{% endtab %}
{% endtabs %}

Response example:

```
prompt_cache_read_cost = 6.117e-05
cached_tokens = 2039
google/gemini-2.5-flash - Cache Read / M tokens =  0.03
```

<figure><img src="/files/e6IDfUeHtFatLlNptG93" alt=""><figcaption></figcaption></figure>

**Best Practices**

Optimization Recommendations

* **Maintain Prefix Consistency**: Place static content at the beginning of prompts, variable content at the end
* **Avoid Minor Changes**: Ensure cached content remains completely consistent across multiple requests
* **Control Cache Time Window**: Initiate subsequent requests within 5 minutes to hit cache
* **Extending Cache Time (1-hour TTL):** If your request intervals may exceed 5 minutes, consider using 1-hour cache:

```json
{
    "type": "text",
    "text": "Long document content...",
    "cache_control": {
        "type": "ephemeral",
        "ttl": 3600
    }
}
```
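
One way to apply the TTL guidance is to pick the TTL from the expected gap between requests. The helper below is a hypothetical sketch: the function name and the gap parameter are illustrative, and the two TTL values are the ones used in the examples on this page.

```python
DEFAULT_TTL_SECONDS = 300    # 5 minutes (default)
EXTENDED_TTL_SECONDS = 3600  # 1 hour (additional fee)

def cached_text_part(text: str, expected_gap_seconds: float) -> dict:
    """Build a text part whose cache TTL covers the expected gap between requests."""
    if expected_gap_seconds < DEFAULT_TTL_SECONDS:
        ttl = DEFAULT_TTL_SECONDS
    else:
        ttl = EXTENDED_TTL_SECONDS
    return {
        "type": "text",
        "text": text,
        "cache_control": {"type": "ephemeral", "ttl": ttl},
    }

# Requests roughly one minute apart fit the default TTL;
# requests roughly fifteen minutes apart need the extended TTL.
print(cached_text_part("Long document content...", 60)["cache_control"])
print(cached_text_part("Long document content...", 900)["cache_control"])
```

Gaps longer than one hour fall outside both windows, so the next request will rebuild the cache regardless of the TTL chosen.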


# Multimodal Input

Send images, PDFs, and audio to Infron AI models

Infron AI supports multiple input modalities beyond text, allowing you to send images, PDFs, and audio files to compatible models through our unified API. This enables rich multimodal interactions for a wide variety of use cases.

## Supported Modalities

### Images

Send images to vision-capable models for analysis, description, OCR, and more. Infron AI supports multiple image formats and both URL-based and base64-encoded images.

[Learn more about image inputs →](/docs/features/multimodal-input/images-inputs)

### PDFs

Process PDF documents with any model on Infron AI.

Learn more about PDF processing →

### Audio

Send audio files to speech-capable models for transcription, analysis, and processing.

Learn more about audio inputs →

## Getting Started

All multimodal inputs use the same `/v1/chat/completions` endpoint with the `messages` parameter. Different content types are specified in the message content array:

* **Images**: Use `image_url` content type
* **PDFs**: Use `file` content type with PDF data
* **Audio**: Use `input_audio` content type

You can combine multiple modalities in a single request, and the number of files you can send varies by provider and model.
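
As a sketch of how the three content types fit together in one message, the helper below builds a single `content` array. The function name, file name, and URLs are hypothetical, and the selected model must support every modality included.

```python
import base64

def build_multimodal_content(prompt: str, image_url: str, pdf_url: str,
                             audio_bytes: bytes, audio_format: str) -> list[dict]:
    """Combine text, an image, a PDF, and audio into one content array."""
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "file", "file": {"filename": "document.pdf", "file_data": pdf_url}},
        {
            "type": "input_audio",
            "input_audio": {
                # Audio is always sent as a raw base64 string plus a format
                "data": base64.b64encode(audio_bytes).decode("utf-8"),
                "format": audio_format,
            },
        },
    ]

content = build_multimodal_content(
    "Summarize these inputs.",
    "https://example.com/image.jpg",
    "https://example.com/document.pdf",
    b"placeholder audio bytes",  # stand-in for real WAV data
    "wav",
)
messages = [{"role": "user", "content": content}]
print([part["type"] for part in content])
```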

## Model Compatibility

{% hint style="info" %}
Not all models support every modality.
{% endhint %}

* **Vision models**: Required for image processing
* **File-compatible models**: Can process PDFs natively or through our parsing system
* **Audio-capable models**: Required for audio input processing

Use our [Models page](https://app.onerouter.pro/models) to find models that support your desired input modalities.

## Input Format Support

Infron AI supports both **direct URLs** and **base64-encoded data** for multimodal inputs:

### URLs (Recommended for public content)

* **Images**: `https://example.com/image.jpg`
* **PDFs**: `https://example.com/document.pdf`
* **Audio**: Not supported via URL (base64 only)

### Base64 Encoding (Required for local files)

* **Images**: `data:image/jpeg;base64,{base64_data}`
* **PDFs**: `data:application/pdf;base64,{base64_data}`
* **Audio**: Raw base64 string with format specification

URLs are more efficient for large files as they don't require local encoding and reduce request payload size.

Base64 encoding is required for local files or when the content is not publicly accessible.
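
For images and PDFs, the choice between the two formats can be wrapped in a small helper that passes public URLs through unchanged and encodes local files as data URLs. This is a hypothetical sketch (`to_request_value` is not part of any SDK); audio is not covered because it is sent as a raw base64 string rather than a data URL.

```python
import base64
from urllib.parse import urlparse

def to_request_value(source: str, mime_type: str) -> str:
    """Return a public URL unchanged, or encode a local file as a base64 data URL."""
    if urlparse(source).scheme in ("http", "https"):
        return source
    with open(source, "rb") as local_file:
        encoded = base64.b64encode(local_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"

# A public URL is passed through as-is.
print(to_request_value("https://example.com/image.jpg", "image/jpeg"))
```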


# Image Inputs

How to send images to Infron AI models

Infron AI supports sending images via the API. This guide shows you how to work with image file types using our API.

## Image Inputs

Requests with images to multimodal models are available via the `/v1/chat/completions` API with a multi-part `messages` parameter. The `image_url` can be either a URL or a base64-encoded image.

Multiple images can be sent in separate content array entries. The number of images you can send in a single request varies per provider and per model. Due to how the content is parsed, we recommend sending the text prompt first, then the images. If the images must come first, we recommend putting them in the system prompt.
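
A minimal sketch of the recommended ordering, with the text prompt first and one content entry per image. `build_image_message` is a hypothetical helper and the URLs are placeholders.

```python
def build_image_message(prompt: str, image_urls: list[str]) -> dict:
    """Build a user message with the text prompt first, then one entry per image."""
    content = [{"type": "text", "text": prompt}]
    content += [
        {"type": "image_url", "image_url": {"url": url}}
        for url in image_urls
    ]
    return {"role": "user", "content": content}

message = build_image_message(
    "What do these two images have in common?",
    ["https://example.com/first.jpg", "https://example.com/second.jpg"],
)
print([part["type"] for part in message["content"]])
```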

### Using Image URLs

Here's how to send an image using a URL:

{% tabs %}
{% tab title="Python" %}

```python
import requests

API_KEY = "YOUR-API-KEY"  # Replace with your Infron API key

url = "https://llm.onerouter.pro/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                }
            }
        ]
    }
]

payload = {
    "model": "{{MODEL}}",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json()["choices"][0]["message"]["content"])
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
const API_KEY = 'YOUR-API-KEY'; // Replace with your Infron API key

const response = await fetch('https://llm.onerouter.pro/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: "What's in this image?",
          },
          {
            type: 'image_url',
            image_url: {
              url: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg',
            },
          },
        ],
      },
    ],
  }),
});

const data = await response.json();
console.log(data);
```

{% endtab %}
{% endtabs %}

### Using Base64 Encoded Images

For locally stored images, you can send them using base64 encoding. Here's how to do it:

{% tabs %}
{% tab title="Python" %}

```python
import requests
import base64

API_KEY = "YOUR-API-KEY"  # Replace with your Infron API key

def encode_image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

url = "https://llm.onerouter.pro/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Read and encode the image
image_path = "path/to/your/image.jpg"
base64_image = encode_image_to_base64(image_path)
data_url = f"data:image/jpeg;base64,{base64_image}"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": data_url
                }
            }
        ]
    }
]

payload = {
    "model": "{{MODEL}}",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json()["choices"][0]["message"]["content"])
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import fs from 'fs/promises';

const API_KEY = 'YOUR-API-KEY'; // Replace with your Infron API key

async function encodeImageToBase64(imagePath: string): Promise<string> {
  const imageBuffer = await fs.readFile(imagePath);
  const base64Image = imageBuffer.toString('base64');
  return `data:image/jpeg;base64,${base64Image}`;
}

// Read and encode the image
const imagePath = 'path/to/your/image.jpg';
const base64Image = await encodeImageToBase64(imagePath);

const response = await fetch('https://llm.onerouter.pro/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: "What's in this image?",
          },
          {
            type: 'image_url',
            image_url: {
              url: base64Image,
            },
          },
        ],
      },
    ],
  }),
});

const data = await response.json();
console.log(data);
```

{% endtab %}
{% endtabs %}

Supported image content types are:

* `image/png`
* `image/jpeg`
* `image/webp`
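
Before encoding a local file, you may want to confirm that its type is on this list. The sketch below maps file extensions to the supported content types; the extension mapping and the function name are assumptions made for illustration.

```python
from pathlib import Path

# Assumed mapping from file extension to the supported content types listed above.
SUPPORTED_IMAGE_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}

def image_content_type(image_path: str) -> str:
    """Return the content type for a supported image, or raise ValueError."""
    suffix = Path(image_path).suffix.lower()
    if suffix not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"Unsupported image type: {suffix or 'no extension'}")
    return SUPPORTED_IMAGE_TYPES[suffix]

print(image_content_type("path/to/your/image.jpg"))
```

The returned value can be used as the media type in the `data:` URL instead of hard-coding `image/jpeg`.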


# PDF Inputs

How to send PDFs to Infron AI models

Infron AI supports PDF processing through the `/v1/chat/completions` API. PDFs can be sent as **direct URLs** or **base64-encoded data URLs** in the messages array, via the `file` content type. This feature works on **any** model on Infron AI.

* **URL support**: Send publicly accessible PDFs directly without downloading or encoding
* **Base64 support**: Required for local files or private documents that aren't publicly accessible

PDFs also work in the chat room for interactive testing.

You can send both PDFs and other file types in the same request.

### Using PDF URLs

For publicly accessible PDFs, you can send the URL directly without needing to download and encode the file:

{% tabs %}
{% tab title="Python" %}

```python
import requests

API_KEY = "YOUR-API-KEY"  # Replace with your Infron API key

url = "https://llm.onerouter.pro/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What are the main points in this document?"
            },
            {
                "type": "file",
                "file": {
                    "filename": "document.pdf",
                    "file_data": "https://domain.org/file.pdf"
                }
            },
        ]
    }
]

payload = {
    "model": "{{MODEL}}",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
const API_KEY = 'YOUR-API-KEY'; // Replace with your Infron API key

const response = await fetch('https://llm.onerouter.pro/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'What are the main points in this document?',
          },
          {
            type: 'file',
            file: {
              filename: 'document.pdf',
              file_data: 'https://bitcoin.org/bitcoin.pdf',
            },
          },
        ],
      },
    ],
  }),
});

const data = await response.json();
console.log(data);
```

{% endtab %}
{% endtabs %}

### Using Base64 Encoded PDFs

For local PDF files or when you need to send PDF content directly, you can base64 encode the file:

{% tabs %}
{% tab title="Python" %}

```python
import requests
import base64

API_KEY = "YOUR-API-KEY"  # Replace with your Infron API key

def encode_pdf_to_base64(pdf_path):
    with open(pdf_path, "rb") as pdf_file:
        return base64.b64encode(pdf_file.read()).decode('utf-8')

url = "https://llm.onerouter.pro/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Read and encode the PDF
pdf_path = "path/to/your/document.pdf"
base64_pdf = encode_pdf_to_base64(pdf_path)
data_url = f"data:application/pdf;base64,{base64_pdf}"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What are the main points in this document?"
            },
            {
                "type": "file",
                "file": {
                    "filename": "document.pdf",
                    "file_data": data_url
                }
            },
        ]
    }
]


payload = {
    "model": "{{MODEL}}",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import fs from 'fs/promises';

const API_KEY = 'YOUR-API-KEY'; // Replace with your Infron API key

async function encodePDFToBase64(pdfPath: string): Promise<string> {
  const pdfBuffer = await fs.readFile(pdfPath);
  const base64PDF = pdfBuffer.toString('base64');
  return `data:application/pdf;base64,${base64PDF}`;
}

// Read and encode the PDF
const pdfPath = 'path/to/your/document.pdf';
const base64PDF = await encodePDFToBase64(pdfPath);

const response = await fetch('https://llm.onerouter.pro/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            text: 'What are the main points in this document?',
          },
          {
            type: 'file',
            file: {
              filename: 'document.pdf',
              file_data: base64PDF,
            },
          },
        ],
      },
    ],
  }),
});

const data = await response.json();
console.log(data);
```

{% endtab %}
{% endtabs %}


# Audio Inputs

How to send audio files to Infron AI models

Infron AI supports sending audio files to compatible models via the API. This guide will show you how to work with audio using our API.

{% hint style="info" %}
**Note**: Audio files must be **base64-encoded**; direct URLs are not supported for audio content.
{% endhint %}

## Audio Inputs

Requests with audio files to compatible models are available via the `/v1/chat/completions` API with the `input_audio` content type. Audio files must be base64-encoded and include the format specification. Note that only models with audio processing capabilities will handle these requests.

### Sending Audio Files

Here's how to send an audio file for processing:

{% tabs %}
{% tab title="Python" %}

```python
import requests
import base64

API_KEY = "YOUR-API-KEY"  # Replace with your Infron API key

def encode_audio_to_base64(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode('utf-8')

url = "https://llm.onerouter.pro/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Read and encode the audio file
audio_path = "path/to/your/audio.wav"
base64_audio = encode_audio_to_base64(audio_path)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Please transcribe this audio file."
            },
            {
                "type": "input_audio",
                "input_audio": {
                    "data": base64_audio,
                    "format": "wav"
                }
            }
        ]
    }
]

payload = {
    "model": "{{MODEL}}",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import fs from "fs/promises";

const API_KEY = "YOUR_API_KEY"; // Replace with your Infron API key

async function encodeAudioToBase64(audioPath: string): Promise<string> {
  const audioBuffer = await fs.readFile(audioPath);
  return audioBuffer.toString("base64");
}

// Read and encode the audio file
const audioPath = "path/to/your/audio.wav";
const base64Audio = await encodeAudioToBase64(audioPath);

const response = await fetch("https://llm.onerouter.pro/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "{{MODEL}}",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Please transcribe this audio file.",
          },
          {
            type: "input_audio",
            input_audio: {
              data: base64Audio,
              format: "wav",
            },
          },
        ],
      },
    ],
  }),
});

const data = await response.json();
console.log(data);
```

{% endtab %}
{% endtabs %}

Supported audio formats are:

* `wav`
* `mp3`
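
The `format` field has to match one of the formats listed above. As a minimal sketch, the helper below derives it from the file extension and builds the `input_audio` content part used in the request examples. The function names `audio_format_from_path` and `build_audio_part` are illustrative and not part of the Infron API.

```python
from pathlib import Path

# Formats listed in this guide as supported.
SUPPORTED_AUDIO_FORMATS = {"wav", "mp3"}


def audio_format_from_path(audio_path: str) -> str:
    """Return the `format` value for an audio file, based on its extension."""
    fmt = Path(audio_path).suffix.lower().lstrip(".")
    if fmt not in SUPPORTED_AUDIO_FORMATS:
        raise ValueError(f"Unsupported audio format: {fmt!r}")
    return fmt


def build_audio_part(base64_audio: str, audio_path: str) -> dict:
    """Build the `input_audio` content part for a chat completions message."""
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64_audio,
            "format": audio_format_from_path(audio_path),
        },
    }
```

Rejecting unknown extensions locally avoids sending a request that the model cannot process.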


# Video Inputs

Infron.AI supports sending video files to compatible models via the API. This guide will show you how to work with video using our API.

Infron.AI supports both **direct URLs** and **base64-encoded data URLs** for videos:

* **URLs**: Efficient for publicly accessible videos as they don't require local encoding
* **Base64 Data URLs**: Required for local files or private videos that aren't publicly accessible

{% hint style="info" %}
**Important**: Video URL support varies by provider. Infron.AI only sends video URLs to providers that explicitly support them.

For example, Google Gemini on AI Studio supports only YouTube links, while Gemini on Vertex AI does not support video URLs at all.
{% endhint %}

{% hint style="info" %}
**API Only**: Video inputs are currently only supported via the API.
{% endhint %}

### Video Inputs

Requests with video files to compatible models are available via the `/v1/chat/completions` API with the `video_url` content type. The `url` can be either a **direct URL** or a **base64-encoded data URL**. Note that only models with video processing capabilities will handle these requests.

#### Using Video URLs

Here's how to send a video using a URL. Note that for Google Gemini on AI Studio, only YouTube links are supported:

{% tabs %}
{% tab title="Python" %}

```python
import requests

API_KEY = "YOUR_API_KEY"  # Replace with your Infron API key

url = "https://llm.onerouter.pro/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Please describe what's happening in this video."
            },
            {
                "type": "video_url",
                "video_url": {
                    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
                }
            }
        ]
    }
]

payload = {
    "model": "{{MODEL}}",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```

{% endtab %}
{% endtabs %}

#### Using Base64 Encoded Videos

For locally stored videos, you can send them using base64 encoding as data URLs:

{% tabs %}
{% tab title="Python" %}

```python
import base64

import requests

API_KEY = "YOUR_API_KEY"  # Replace with your Infron API key

def encode_video_to_base64(video_path):
    """Read a local video file and return its contents as a base64 string."""
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode('utf-8')

url = "https://llm.onerouter.pro/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Read and encode the video
video_path = "path/to/your/video.mp4"
base64_video = encode_video_to_base64(video_path)
data_url = f"data:video/mp4;base64,{base64_video}"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this video?"
            },
            {
                "type": "video_url",
                "video_url": {
                    "url": data_url
                }
            }
        ]
    }
]

payload = {
    "model": "{{MODEL}}",
    "messages": messages
}

response = requests.post(url, headers=headers, json=payload)
print(response.json())
```

{% endtab %}
{% endtabs %}

### Supported Video Formats

Infron.AI supports the following video formats:

* `video/mp4`
* `video/mpeg`
* `video/mov`
* `video/webm`
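
When sending a local file, the MIME type in the data URL should match one of the formats above. The sketch below picks it from the file extension. The extension-to-MIME mapping and the name `video_data_url` are assumptions made for illustration, not a documented Infron helper.

```python
import base64
from pathlib import Path

# Assumed mapping from file extension to the MIME types listed above.
VIDEO_MIME_TYPES = {
    ".mp4": "video/mp4",
    ".mpeg": "video/mpeg",
    ".mov": "video/mov",
    ".webm": "video/webm",
}


def video_data_url(video_path: str) -> str:
    """Encode a local video file as a base64 data URL for the `video_url` part."""
    path = Path(video_path)
    mime = VIDEO_MIME_TYPES.get(path.suffix.lower())
    if mime is None:
        raise ValueError(f"Unsupported video extension: {path.suffix!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be used as the `url` value inside a `video_url` content part.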

### Common Use Cases

Video inputs enable a wide range of applications:

* **Video Summarization**: Generate text summaries of video content
* **Object and Activity Recognition**: Identify objects, people, and actions in videos
* **Scene Understanding**: Describe settings, environments, and contexts
* **Sports Analysis**: Analyze gameplay, movements, and tactics
* **Surveillance**: Monitor and analyze security footage
* **Educational Content**: Analyze instructional videos and provide insights

### Best Practices

#### File Size Considerations

Video files can be large, which affects both upload time and processing costs:

* **Compress videos** when possible to reduce file size without significant quality loss
* **Trim videos** to include only relevant segments
* **Consider resolution**: Lower resolutions (e.g., 720p vs 4K) reduce file size while maintaining usability for most analysis tasks
* **Frame rate**: Lower frame rates can reduce file size for videos where high temporal resolution isn't critical
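
Base64 encoding also adds overhead: every 3 bytes of input become 4 characters of output, so an encoded video is roughly a third larger than the file on disk. The sketch below estimates the encoded size before you build the request. The function name `base64_encoded_size` is illustrative.

```python
def base64_encoded_size(raw_bytes: int) -> int:
    """Length in characters of standard padded base64 for `raw_bytes` bytes."""
    if raw_bytes < 0:
        raise ValueError("raw_bytes must be non-negative")
    # Each group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


# A 30 MB file grows to about 40 MB once encoded.
print(base64_encoded_size(30_000_000))
```

Add the length of the `data:video/...;base64,` prefix if you need the exact size of the data URL.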

#### Optimal Video Length

Different models may have different limits on video duration:

* Check model-specific documentation for maximum video length
* For long videos, consider splitting into shorter segments
* Focus on key moments rather than sending entire long-form content

#### Quality vs. Size Trade-offs

Balance video quality with practical considerations:

* **High quality** (1080p+, high bitrate): Best for detailed visual analysis, object detection, text recognition
* **Medium quality** (720p, moderate bitrate): Suitable for most general analysis tasks
* **Lower quality** (480p, lower bitrate): Acceptable for basic scene understanding and action recognition

### Provider-Specific Video URL Support

Video URL support varies significantly by provider:

* **Google Gemini (AI Studio)**: Only supports YouTube links (e.g., `https://www.youtube.com/watch?v=...`)
* **Google Gemini (Vertex AI)**: Does not support video URLs - use base64-encoded data URLs instead
* **Other providers**: Check model-specific documentation for video URL support


# Reasoning & Thinking

For models that support it, the Infron API can return **Reasoning Tokens**, also known as **Thinking Tokens**. Infron normalizes the different ways of customizing the amount of reasoning tokens that the model will use, providing a **unified reasoning & thinking interface** across different providers.

Reasoning tokens provide a transparent look into the reasoning steps taken by a model. Reasoning tokens are considered output tokens and charged accordingly.

Infron provides a unified parameter wrapper for Reasoning & Thinking.

* **If the `reasoning` field is omitted from the request**, Infron leaves the parameter unset, so the upstream provider's default behavior applies.
* **If the `reasoning` field is included in the request**, Infron converts it to the reasoning and thinking parameter format that each provider expects.

Reasoning tokens are included in the response by default if the model decides to output them. Reasoning tokens will appear in the reasoning field of each message.

### Controlling Reasoning Tokens in OpenAI Chat Completions

You can control reasoning tokens in your requests using the `reasoning` parameter:

```json
{
  "model": "your-model",
  "messages": [],
  "reasoning": {
    // One of the following (not both):
    "effort": "high", // Can be "xhigh", "high", "medium", "low", "minimal" or "none"
    "max_tokens": 2000, // Specific token limit
  }
}
```
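
Since the object takes either `effort` or `max_tokens` but not both, it can help to build it through a small validating helper. This is a sketch only: `build_reasoning` is a hypothetical client-side function, and raising when neither field is given is a design choice made here, not documented Infron behavior.

```python
from typing import Optional

# Effort levels accepted by the `reasoning.effort` field.
VALID_EFFORTS = {"xhigh", "high", "medium", "low", "minimal", "none"}


def build_reasoning(effort: Optional[str] = None,
                    max_tokens: Optional[int] = None) -> dict:
    """Return a `reasoning` object holding exactly one of effort or max_tokens."""
    if effort is not None and max_tokens is not None:
        raise ValueError("Set either effort or max_tokens, not both")
    if effort is not None:
        if effort not in VALID_EFFORTS:
            raise ValueError(f"Unknown effort level: {effort!r}")
        return {"effort": effort}
    if max_tokens is not None:
        if max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")
        return {"max_tokens": max_tokens}
    raise ValueError("Provide effort or max_tokens")
```

With the OpenAI SDK, the result can be passed as `extra_body={"reasoning": build_reasoning(effort="high")}`.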

The `reasoning` config object consolidates settings for controlling reasoning strength across different models.

`effort` accepts one of the following values:

* `"effort": "xhigh"` - Allocates the largest portion of tokens for reasoning (approximately 95% of max\_tokens)
* `"effort": "high"` - Allocates a large portion of tokens for reasoning (approximately 80% of max\_tokens)
* `"effort": "medium"` - Allocates a moderate portion of tokens (approximately 50% of max\_tokens)
* `"effort": "low"` - Allocates a smaller portion of tokens (approximately 20% of max\_tokens)
* `"effort": "minimal"` - Allocates an even smaller portion of tokens (approximately 10% of max\_tokens)
* `"effort": "none"` - Disables reasoning entirely

For models that only support `reasoning.max_tokens`, Infron converts the effort level into a token budget using the percentages above.
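
To get a feel for the numbers, the percentages above can be applied to a given `max_tokens` value. The sketch below is an approximation for planning purposes: the listed percentages are described as approximate, the rounding is an assumption, and `reasoning_budget` is a hypothetical helper rather than the conversion Infron performs internally.

```python
# Approximate share of max_tokens per effort level, in percent, as listed above.
EFFORT_PERCENT = {
    "xhigh": 95,
    "high": 80,
    "medium": 50,
    "low": 20,
    "minimal": 10,
    "none": 0,  # reasoning disabled
}


def reasoning_budget(effort: str, max_tokens: int) -> int:
    """Estimate the reasoning token budget for an effort level (rounded down)."""
    if effort not in EFFORT_PERCENT:
        raise ValueError(f"Unknown effort level: {effort!r}")
    return max_tokens * EFFORT_PERCENT[effort] // 100


for level in EFFORT_PERCENT:
    print(level, reasoning_budget(level, 10_000))
```

For example, `"high"` with `max_tokens=10000` leaves roughly 8,000 tokens for reasoning and 2,000 for the final answer.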

### Examples

#### Basic Usage with Reasoning Tokens

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="<<API_KEY_REF>>",
)

response = client.chat.completions.create(
    model="<<MODEL>>",
    messages=[
        {"role": "user", "content": "How would you build the world's tallest skyscraper?"}
    ],
    extra_body={
        "reasoning": {
            "effort": "high"
        }
    },
)

print(response.model_dump_json())
```

{% endtab %}
{% endtabs %}

#### Using Max Tokens for Reasoning

You can specify the exact number of tokens to use for reasoning:

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="<<API_KEY_REF>>",
)

response = client.chat.completions.create(
    model="<<MODEL>>",
    messages=[
        {"role": "user", "content": "How would you build the world's tallest skyscraper?"}
    ],
    extra_body={
        "reasoning": {
            "max_tokens": 200
        }
    },
)

print(response.model_dump_json())
```

{% endtab %}
{% endtabs %}

#### Disabling Reasoning Entirely

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="<<API_KEY_REF>>",
)

response = client.chat.completions.create(
    model="z-ai/glm-5",
    messages=[
        {"role": "user", "content": "How would you build the world's tallest skyscraper?"}
    ],
    extra_body={
        "reasoning": {
            "effort": "none"
        }
    },
)

print(response.model_dump_json())
```

{% endtab %}
{% endtabs %}

#### Streaming Mode with Reasoning Tokens

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="<<API_KEY_REF>>",
)

def chat_completion_with_reasoning(messages):
    response = client.chat.completions.create(
        model="<<MODEL>>",
        messages=messages,
        max_tokens=10000,
        extra_body={
            "reasoning": {
                "max_tokens": 8000
            }
        },
        stream=True
    )
    return response

for chunk in chat_completion_with_reasoning([
    {"role": "user", "content": "What's bigger, 9.9 or 9.11?"}
]):
    if hasattr(chunk.choices[0].delta, 'reasoning_details') and chunk.choices[0].delta.reasoning_details:
        print(f"REASONING_DETAILS: {chunk.choices[0].delta.reasoning_details}")
    elif getattr(chunk.choices[0].delta, 'content', None):
        print(f"CONTENT: {chunk.choices[0].delta.content}")
```

{% endtab %}
{% endtabs %}
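
If you want the full reasoning and the full answer rather than per-chunk prints, the deltas can be accumulated. The sketch below assumes chunks shaped like those in the streaming example, with the reasoning attribute name passed in as a parameter because it may differ between models. `collect_stream` and `fake_chunk` are hypothetical names, and the demo chunks are hand-built stand-ins, not real API output.

```python
from types import SimpleNamespace


def collect_stream(chunks, reasoning_attr="reasoning_details"):
    """Gather reasoning deltas and concatenate content deltas from a stream."""
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # Some chunks carry no choices at all.
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, reasoning_attr, None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            content_parts.append(content)
    return reasoning_parts, "".join(content_parts)


# Hand-built stand-ins for streamed chunks, used only to demonstrate the helper.
def fake_chunk(**delta_fields):
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_chunks = [
    fake_chunk(reasoning_details="Compare 9.90 and 9.11.", content=None),
    fake_chunk(content="9.9 is "),
    fake_chunk(content="bigger."),
]
reasoning_parts, answer = collect_stream(demo_chunks)
print(reasoning_parts)
print(answer)
```

Unlike an `if`/`elif` check, this helper keeps both the reasoning and the content when a single chunk carries both.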

### Responses API Shape

When a reasoning model generates a response, the reasoning text is returned in the `reasoning_content` field of the message, and the number of reasoning tokens is reported under `usage.completion_tokens_details.reasoning_tokens`.

```json
{
  "id": "chatcmpl-20251213055928959379058WiAGDQfk",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Building the world's tallest skyscraper is less an act of construction and more an exercise in extreme, multidisciplinary problem-solving. It requires pushing the boundaries of material science, structural engineering, logistical planning, and financial investment.\n\nTo build a structure exceeding 1,000 meters (1 kilometer), the process must be segmented into critical phases, each addressing unique challenges posed by extreme height.\n\n---\n\n## Phase 1: Concept, Planning, and Site Selection\n\nThe success of a gigatall project (over 600m) is determined long before the first shovel hits the ground.\n\n### 1. Strategic Location Analysis\nLocation is paramount, as it dictates the structural loads the building must withstand.\n*   **Geotechnical Stability:** The site must have strong, reliable bedrock or dense subsoil capable of bearing immense weight. Areas prone to liquefaction or shifting sands are non-starters.\n*   **Seismic Risk:** Highly active earthquake zones require significantly more expensive and complex structural reinforcement, potentially making record height unfeasible.\n*   **Wind Climate:** Coastal areas or flat, open plains experience harsher wind loads. The location's specific wind patterns (speed, direction, vortex frequency) must be modeled over decades.\n\n### 2. Architectural Design for Aerodynamics\nAt extreme heights, wind resistance is a far greater structural challenge than gravity. The design must actively mitigate lateral (side-to-side) forces.\n*   **Tapering and Setbacks:** The building should narrow or step back as it ascends (like the Burj Khalifa). 
This \"confuses\" the wind, preventing a single vortex from forming along the entire height and causing excessive oscillation.\n*   **Porosity:** Incorporating large openings or allowing air to pass through sections of the tower can reduce the pressure buildup on the façade.\n*   **Optimal Orientation:** Modeling is used to determine the exact orientation of the building relative to prevailing winds to minimize drag.\n\n### 3. Budget and Team Assembly\nA gigatall project typically costs tens of billions of dollars. Securing political will, massive state or sovereign funding, and assembling a global team of specialized engineers (often involving firms like Skidmore, Owings & Merrill or Adrian Smith + Gordon Gill Architecture) is the initial step.\n\n---\n\n## Phase 2: Geotechnical and Foundation Engineering\n\nA record-tall building must transfer loads of hundreds of thousands of tons safely to the earth.\n\n### 1. Site Investigation\nYears of intensive geotechnical drilling and testing are required to understand the soil profile, bedrock depth, and water table.\n\n### 2. The Foundation System\nThe most common solution for extreme height involves two main components:\n*   **Deep Piling:** Large-diameter reinforced concrete piles (often 1.5 to 2 meters wide) are driven or drilled deep into the stable substrate or bedrock. A structure of 1km might require hundreds of such piles, extending over 60–120 meters deep.\n*   **Mat (Raft) Foundation:** A massive, thick concrete slab—sometimes several meters thick—is poured directly over the top of the piles. This mat acts as a giant \"foot\" to distribute the weight evenly across the entire pile group, minimizing differential settlement.\n\n### 3. Super-Strength Concrete\nThe foundation requires some of the strongest concrete ever mixed (Ultra-High-Performance Concrete, or UHPC). 
High-density, high-compressive-strength concrete (often over 80 MPa) is used in the foundation and lower core, as it must bear the full cumulative weight of the structure above.\n\n---\n\n## Phase 3: Structural System (The Skeleton)\n\nThe structural concept must be incredibly stiff, stable, and lightweight at the top.\n\n### 1. The Core System: Bundled Tube or Outrigger\nThe most viable structural system for extreme height is based on the **bundled tube concept** or the **outrigger system**, designed to maximize lateral stiffness:\n*   **Central Core:** A massive, reinforced concrete central core houses the elevators and mechanical shafts. This core resists torsion (twisting).\n*   **Perimeter Columns/Wings:** Structural \"wings\" or columns radiate out from the core. These are connected by horizontal steel trusses (outriggers) at specific mechanical floors.\n*   **Function:** When wind pushes the building, the outriggers engage the perimeter columns, essentially using the entire building's mass to resist the force, like a giant, rigid truss structure.\n\n### 2. Material Strategy\nEngineers optimize materials based on their height and load requirements:\n*   **Lower Floors:** Extremely high-strength concrete is used up to perhaps 500 meters, where gravity loads are highest.\n*   **Upper Floors (The Spire):** As the gravity load decreases, structural steel is favored. Steel is lighter than concrete, reducing overall weight and allowing faster erection at the top.\n\n### 3. Dampening Systems\nEven with aerodynamic shaping, a kilometer-high building will sway. To ensure occupant comfort and structural integrity, sophisticated dampening systems are essential:\n*   **Tuned Mass Dampers (TMD):** Giant pendulum weights or hydraulic pistons installed at the highest occupied floors. 
These systems move in opposition to the building's sway, counteracting vibration caused by wind or minor seismic events.\n\n---\n\n## Phase 4: Construction, Logistics, and Vertical Transportation\n\nMoving people and materials thousands of feet high, safely and efficiently, is often the greatest logistical challenge.\n\n### 1. Self-Climbing Systems\nConventional cranes and scaffolding are inadequate. Specialized systems are required:\n*   **Self-Climbing Formwork:** The central concrete core is built using a hydraulic system that \"climbs\" the structure as the concrete cures. This speeds up construction and minimizes reliance on external cranes.\n*   **Super Cranes:** Extremely powerful, temporary tower cranes are custom-engineered to climb alongside the building. They lift materials like steel beams and façade panels up to the highest working floors. Once construction is complete, these cranes must be disassembled and lowered in pieces using smaller cranes in a complex sequence (the \"crane eating crane\" method).\n\n### 2. Vertical Transportation Logistics\nWorkers, concrete, and steel must reach the top floors quickly.\n*   **Construction Hoists:** High-speed temporary hoists, often external to the main structure, are dedicated solely to moving personnel and materials. These can move at speeds up to 20 meters per second.\n*   **Concrete Delivery:** Pumping concrete a kilometer high requires multiple stages of high-pressure pumps located at ground level and sometimes supplemented by powerful booster pumps located at intermediate \"sky lobbies.\"\n\n---\n\n## Phase 5: Façade, Systems, and Functionality\n\nThe final phase involves enclosing the structure and installing the complex mechanical systems.\n\n### 1. Curtain Wall and Cladding\nThe façade must be lightweight, capable of handling intense pressure differentials, and highly energy efficient.\n*   **Pressure Management:** As altitude increases, atmospheric pressure drops. 
The building enclosure must be sealed to maintain a standard internal pressure, while the outer cladding must be reinforced to handle the extreme positive and negative wind pressures.\n*   **Insulation:** Specialized, high-performance glass panels are used to manage solar gain, as temperatures near the top can be significantly cooler and sun exposure more intense.\n\n### 2. HVAC and Pressure Equalization\nStandard heating and cooling systems cannot operate effectively across 1km of altitude.\n*   **Zonal Systems:** The building is divided into multiple HVAC zones, typically separated at mechanical floors. This prevents the stack effect (the movement of air and smoke due to pressure differences) from overwhelming the system.\n*   **Air Intake:** Air intakes for the upper zones may be located hundreds of meters above ground to draw cleaner air.\n\n### 3. Elevators and Sky Lobbies\nA conventional elevator rope cannot reach 1,000 meters. The solution involves:\n*   **Sky Lobbies:** Express \"shuttle\" elevators run non-stop from the ground floor to intermediate sky lobbies (e.g., at floors 50, 100, and 150).\n*   **Local Elevators:** Shorter, conventional elevators operate from the sky lobbies to serve the floors immediately above and below them.\n*   **Innovation:** Future gigatall towers may utilize technologies like ropeless (magnetic levitation) elevators or high-strength carbon fiber ropes to allow for longer continuous runs.",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": null,
        "reasoning_content": "**Thinking Through Building the World's Tallest Skyscraper**\n\nOkay, so the prompt is \"How would you build the world's tallest skyscraper?\"  My immediate thought is that this isn't just a construction problem; it's a monumental engineering and logistical challenge, and I need to address it from a complete lifecycle perspective.  The \"tallest\" designation implies pushing boundaries, so I'll frame my answer around the existing record-holders, the Burj Khalifa and the planned Jeddah Tower.  I need to show understanding of the current state-of-the-art and then what *more* we would need to do.\n\nFirst, I need to structure my thoughts logically. A skyscraper project follows a natural progression, and I'll break it down that way: Concept & Planning, Geotechnical & Foundation, Structural Engineering, Construction & Logistics, and finally, Systems & Cladding.\n\n**Phase 1: Concept & Planning** is the foundation for everything else. This is where we make the *big* decisions: location (crucial for wind and seismic concerns), budget (gonna be *billions*), and assembling the *right* team. This isn't just about height; it has to be architecturally impressive and, importantly, *functional*.\n\n**Phase 2: Geotechnical & Foundation** is all about getting the load into the ground. We're talking about immense forces here. Extensive site testing is mandatory, followed by massive pile foundations – deep, large-diameter piles – or maybe even a truly substantial reinforced concrete raft, a mat foundation.  We're dealing with \"geotechnical\" and \"load bearing\" properties here.\n\n**Phase 3: Structural Engineering** is where the *real* challenges lie. Resisting gravity, wind, and seismic loads at this scale is no joke. The structure's skeleton has to be ultra-efficient, and the bundled tube or outrigger system, like the Burj Khalifa uses, is the proven method for extreme heights.  
That means a strong central core and surrounding wings.\n\nWind is the enemy! Aerodynamic shaping is a necessity to mitigate the vortex effects at these altitudes. We're talking tapering, setbacks, even spiraling to confuse the wind. Stability means mass dampeners at the top to counteract sway and vibrations.  It's about comfort and safety. Materials? Ultra-high-strength concrete for the lower floors, steel for the upper floors (lighter).\n\n**Phase 4: Construction & Logistics**.  This is about speed, safety, and lifting materials thousands of feet in the air. We'll need self-climbing formwork, where the building *builds itself*, and super-cranes, likely mast-climbing hydraulic ones, that ascend with the structure.  Vertical transportation is another major hurdle – temporary express elevators/hoists. We need a *just-in-time* supply chain.\n\n**Phase 5: Systems & Cladding**.  Standard elevators just won't cut it.  We need multi-deck, express shuttles, and potentially sky lobbies to break up the long vertical journeys.  Maybe even something like the ropeless elevators, like ThyssenKrupp's MULTI concept.  Pumping water 1 km requires relay stations. HVAC needs pressure equalization and highly efficient, multi-zone systems because air pressure changes are drastic. Cladding has to be light, durable, and super insulated, especially at high altitudes.\n\nFinally, I'll need to emphasize that it all boils down to innovation, massive financial resources, and political will. The last spire on the structure may just be symbolic but critical to being the \"tallest\". It's the ultimate height differentiator.  Did I cover all the bases? Yes. Authoritative, engineering-focused – check!\n"
      }
    }
  ],
  "created": 1765605586,
  "model": "gemini-2.5-flash-preview-09-2025",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 2734,
    "prompt_tokens": 12,
    "total_tokens": 2746,
    "completion_tokens_details": {
      "accepted_prediction_tokens": null,
      "audio_tokens": null,
      "reasoning_tokens": 960,
      "rejected_prediction_tokens": null
    },
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": null,
      "text_tokens": 12
    },
    "input_tokens": 0,
    "output_tokens": 0,
    "server_tool_use": {
      "web_search_requests": ""
    },
    "ttft": 0
  },
  "request_id": "ddfd035d55b34cb69e2a00cc325230a1"
}
```
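
As a minimal sketch of reading this shape, the helper below pulls the reasoning text, the answer, and the token counts out of a response that has already been parsed into a dictionary. The name `reasoning_summary` is illustrative, and the sample dictionary is a shortened stand-in for the response above.

```python
def reasoning_summary(response: dict) -> dict:
    """Extract reasoning text, answer text, and token counts from a response."""
    message = response["choices"][0]["message"]
    usage = response.get("usage") or {}
    # `completion_tokens_details` can be null, so fall back to an empty dict.
    details = usage.get("completion_tokens_details") or {}
    return {
        "reasoning": message.get("reasoning_content"),
        "answer": message.get("content"),
        "reasoning_tokens": details.get("reasoning_tokens") or 0,
        "completion_tokens": usage.get("completion_tokens") or 0,
    }


# Shortened stand-in for the example response, for demonstration only.
sample = {
    "choices": [
        {"message": {"content": "Answer text", "reasoning_content": "Thinking text"}}
    ],
    "usage": {
        "completion_tokens": 2734,
        "completion_tokens_details": {"reasoning_tokens": 960},
    },
}
print(reasoning_summary(sample))
```

With the OpenAI SDK, `response.model_dump()` produces a dictionary that can be passed to this helper.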


# 1M Token Long Context Window

### Anthropic Claude: 1M token context

Anthropic’s Claude model family supports expanding the context window from the default 200K tokens to **1,000,000 tokens (1M)**, 5× the default capacity.

{% hint style="info" %}
Infron automatically enables the 1M token context window for Claude Sonnet 4/4.5/4.6 models. No configuration is required.
{% endhint %}

* Learn more: [Announcement](https://www.anthropic.com/news/1m-context), [Context windows docs](https://platform.claude.com/docs/en/build-with-claude/context-windows#1-m-token-context-window)
* Pricing: Requests that exceed 200K tokens are charged at premium rates. See [pricing details](https://docs.anthropic.com/en/about-claude/pricing#long-context-pricing).

With Infron, you can use this capability for large-scale document analysis, code review, long-running conversations, and more.

#### Supported Models

The following Claude models currently support the 1M-token context window:

| Model                 | Default Context Window | Extended Context Window |
| --------------------- | ---------------------- | ----------------------- |
| **Claude Opus 4.6**   | 200K tokens            | 1M tokens               |
| **Claude Sonnet 4.5** | 200K tokens            | 1M tokens               |
| **Claude Sonnet 4**   | 200K tokens            | 1M tokens               |

{% hint style="info" %}
The 1M-token context window is currently a Beta feature, and functionality and pricing may change in future versions.
{% endhint %}

#### Long-Context Pricing

When the number of tokens in a request exceeds 200K, long-context pricing will apply automatically. The multipliers are as follows:

| Pricing Item      | Up to 200K         | Over 200K                 |
| ----------------- | ------------------ | ------------------------- |
| **Input tokens**  | 1x (standard rate) | 2x (double rate)          |
| **Output tokens** | 1x (standard rate) | 1.5x (1.5× standard rate) |
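
The multipliers can be turned into a rough cost estimate. The sketch below rests on two assumptions that this page does not spell out: the 200K threshold is checked against the request's input tokens, and once it is exceeded the multipliers apply to the whole request. The per-million-token rates in the example are placeholders, not Infron prices.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # tokens


def long_context_cost(input_tokens: int, output_tokens: int,
                      input_rate_per_mtok: float,
                      output_rate_per_mtok: float) -> float:
    """Estimate request cost using the long-context multipliers from the table."""
    # Assumption: the threshold is compared against input tokens only.
    over = input_tokens > LONG_CONTEXT_THRESHOLD
    input_mult = 2.0 if over else 1.0
    output_mult = 1.5 if over else 1.0
    total = (input_tokens * input_rate_per_mtok * input_mult
             + output_tokens * output_rate_per_mtok * output_mult)
    return total / 1_000_000


# Placeholder rates: 3.0 per million input tokens, 15.0 per million output tokens.
print(long_context_cost(250_000, 1_000, 3.0, 15.0))
print(long_context_cost(150_000, 1_000, 3.0, 15.0))
```

Check the linked pricing details for the authoritative rules before relying on an estimate like this.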


# Web Search

This document explains how to use the Web Search feature on the Infron platform.

### Overview <a href="#overview" id="overview"></a>

Web Search allows an AI model to access real-time web information while generating an answer, enabling more accurate and up-to-date responses. This feature is particularly useful for:

* Querying breaking news and current events
* Getting the latest product information and pricing
* Looking up dynamic data such as weather and stock quotes
* Accessing the latest technical documentation and resources

### Supported Protocols <a href="#supported-protocols" id="supported-protocols"></a>

| Protocol                             | Endpoint               | Web Search Parameter                 |
| ------------------------------------ | ---------------------- | ------------------------------------ |
| Chat Completions (OpenAI-compatible) | `/v1/chat/completions` | `web_search_options`                 |
| Messages (Anthropic-compatible)      | `/v1/messages`         | `web_search_20250305` within `tools` |
| Responses (OpenAI Responses)         | `/v1/responses`        | `web_search` family within `tools`   |

### Web Search - Chat Completions API <a href="#id-1-chat-completions-api" id="id-1-chat-completions-api"></a>

The Chat Completions API enables Web Search via the `web_search_options` parameter.

#### Parameters <a href="#parameters" id="parameters"></a>

<table><thead><tr><th width="323.80206298828125">Parameter</th><th>Type</th><th>Required</th><th>Description</th></tr></thead><tbody><tr><td><code>web_search_options</code></td><td>object</td><td>No</td><td>Web search configuration</td></tr><tr><td><code>web_search_options.search_context_size</code></td><td>string</td><td>No</td><td><p>Search context size: </p><ul><li><code>low</code> </li><li><code>medium</code> </li><li><code>high</code></li></ul></td></tr><tr><td><code>web_search_options.user_location</code></td><td>object</td><td>No</td><td>User location info for localized search results</td></tr><tr><td><code>web_search_options.user_location.type</code></td><td>string</td><td>Yes</td><td>Location type, fixed as <code>approximate</code></td></tr><tr><td><code>web_search_options.user_location.city</code></td><td>string</td><td>No</td><td>City name</td></tr><tr><td><code>web_search_options.user_location.country</code></td><td>string</td><td>No</td><td>Country code (2-letter ISO, e.g. <code>CN</code>, <code>US</code>)</td></tr><tr><td><code>web_search_options.user_location.region</code></td><td>string</td><td>No</td><td>Region/province</td></tr><tr><td><code>web_search_options.user_location.timezone</code></td><td>string</td><td>No</td><td>Timezone (IANA format, e.g. <code>Asia/Shanghai</code>)</td></tr></tbody></table>
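
A small builder can keep the options object consistent with the table above. This is a sketch under assumptions: the location fields are nested under an `approximate` key, following the request example on this page, and `build_web_search_options` is a hypothetical helper, not part of any SDK.

```python
from typing import Optional

# Values accepted by `search_context_size`, per the parameter table.
VALID_CONTEXT_SIZES = {"low", "medium", "high"}


def build_web_search_options(search_context_size: Optional[str] = None, *,
                             city: Optional[str] = None,
                             country: Optional[str] = None,
                             region: Optional[str] = None,
                             timezone: Optional[str] = None) -> dict:
    """Build a `web_search_options` object, omitting fields that are not set."""
    options: dict = {}
    if search_context_size is not None:
        if search_context_size not in VALID_CONTEXT_SIZES:
            raise ValueError(f"Invalid search_context_size: {search_context_size!r}")
        options["search_context_size"] = search_context_size
    fields = {"city": city, "country": country, "region": region, "timezone": timezone}
    location = {key: value for key, value in fields.items() if value is not None}
    if location:
        # Assumption: location fields are nested under `approximate`.
        options["user_location"] = {"type": "approximate", "approximate": location}
    return options
```

The result can be passed as `extra_body={"web_search_options": ...}` when using the OpenAI SDK.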

#### Example <a href="#example" id="example"></a>

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://llm.onerouter.pro/v1"
)

response = client.chat.completions.create(
    model="anthropic/claude-opus-4.6",
    messages=[
        {
            "role": "user",
            "content": "How is the weather in Beijing today?"
        }
    ],
    extra_body={
        "web_search_options": {
            "search_context_size": "high",
            "user_location": {
                "type": "approximate",
                "approximate": {
                    "timezone": "Asia/Shanghai",
                    "country": "CN",
                    "city": "Beijing"
                }
            }
        }
    }
)

print(response.model_dump_json())
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl -X POST "https://llm.onerouter.pro/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "anthropic/claude-opus-4.6",
    "messages": [
      {
        "role": "user",
        "content": "How is the weather in Beijing today?"
      }
    ],
    "web_search_options": {
      "search_context_size": "high",
      "user_location": {
        "type": "approximate",
        "approximate": {
          "timezone": "Asia/Shanghai",
          "country": "CN",
          "city": "Beijing"
        }
      }
    }
  }'
```

{% endtab %}
{% endtabs %}
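
The nested `web_search_options` payload is easy to get wrong by hand. Below is a minimal sketch of a helper that assembles it from the parameters in the table above. The function name `build_web_search_options` is hypothetical and not part of any SDK, and nesting the location fields under an `approximate` key follows the request examples on this page rather than a confirmed schema.

```python
from typing import Optional

ALLOWED_CONTEXT_SIZES = {"low", "medium", "high"}


def build_web_search_options(
    search_context_size: str,
    city: Optional[str] = None,
    country: Optional[str] = None,
    region: Optional[str] = None,
    timezone: Optional[str] = None,
) -> dict:
    """Assemble a web_search_options dict (hypothetical helper, not part of any SDK)."""
    if search_context_size not in ALLOWED_CONTEXT_SIZES:
        raise ValueError(
            f"search_context_size must be one of {sorted(ALLOWED_CONTEXT_SIZES)}"
        )

    options: dict = {"search_context_size": search_context_size}

    # Keep only the location fields that were actually provided.
    candidates = {"city": city, "country": country, "region": region, "timezone": timezone}
    location = {key: value for key, value in candidates.items() if value is not None}

    if location:
        # Assumption: location fields are nested under "approximate",
        # mirroring the request examples on this page.
        options["user_location"] = {"type": "approximate", "approximate": location}
    return options


options = build_web_search_options(
    "high", city="Beijing", country="CN", timezone="Asia/Shanghai"
)
print(options)
```

With the OpenAI SDK, the result can be passed as `extra_body={"web_search_options": options}`.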

Example response:

```json
{
  "id": "msg_016gnzBRiQ1zuyGAdfMZFRaY",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": ".\n\nOverall, it's a cool and mostly pleasant early spring day in Beijing. A jacket or sweater is recommended, especially for the chilly morning and evening hours. Enjoy the sunshine! ☀️",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1773060007,
  "model": "anthropic/claude-opus-4.6",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 404,
    "prompt_tokens": 12095,
    "total_tokens": 12499,
    "completion_tokens_details": null,
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": null
    },
    "input_tokens": 0,
    "output_tokens": 0,
    "ttft": 0,
    "server_tool_use": {
      "web_search_requests": ""
    }
  },
  "request_id": "e42a26abb0f047339a5dc1081bf9e277"
}
```


# Plugins

Extend model capabilities with Infron plugins

Infron plugins extend the capabilities of any model by adding features like real-time web search.


# Web Search

Model-agnostic grounding

### Overview <a href="#overview" id="overview"></a>

Web Search allows an AI model to access real-time web information while generating an answer, enabling more accurate and up-to-date responses. This feature is particularly useful for:

* Querying breaking news and current events
* Getting the latest product information and pricing
* Looking up dynamic data such as weather and stock quotes
* Accessing the latest technical documentation and resources


# Overview

Using Infron AI with Popular Frameworks and Integrations

Infron AI integrates seamlessly with popular AI frameworks and SDKs. Choose your preferred framework below for detailed integration guides:

### Available Framework Integrations <a href="#available-framework-integrations" id="available-framework-integrations"></a>

* [**Langfuse**](/docs/frameworks-and-integrations/langfuse) - Utilize Langfuse's native integration with the OpenAI SDK to automatically trace and monitor your Infron AI API calls.
* [**LangChain**](/docs/frameworks-and-integrations/langchain) - Integration with LangChain for Python and JavaScript applications
* [**OpenAI SDK**](/docs/frameworks-and-integrations/openai-sdk) - Direct integration using the official OpenAI SDK for Python and TypeScript
* [**PydanticAI**](/docs/frameworks-and-integrations/pydanticai) - High-level interface for Python applications using PydanticAI


# OpenAI SDK

Using Infron AI with OpenAI SDK

## Using the OpenAI SDK

* Python: `pip install openai`
* TypeScript/JavaScript: `npm i openai`

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="{{MODEL}}",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ]
)

print(completion.choices[0].message.content)
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: '{{MODEL}}',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  });

  console.log(completion.choices[0].message);
}

main();
```

{% endtab %}
{% endtabs %}
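
Where the SDK is not available, the same call can be made over plain HTTP. The sketch below builds the request with Python's standard library only. It assumes the `/chat/completions` path and the `Authorization: Bearer` header used in the cURL examples elsewhere in these docs; the helper name `build_chat_request` is illustrative.

```python
import json
import urllib.request

BASE_URL = "https://llm.onerouter.pro/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request. Illustrative helper."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_chat_request("<API_KEY>", "<MODEL>", "What is the meaning of life?")
print(request.get_method(), request.full_url)

# To send it, uncomment the lines below (requires a valid API key and model ID):
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
#     print(body["choices"][0]["message"]["content"])
```

Keeping construction separate from sending makes it easy to inspect the URL, headers, and body before any network call is made.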


# LangChain

Using Infron AI with LangChain

## Using LangChain

* Using [LangChain for Python](https://github.com/langchain-ai/langchain)
* Using [LangChain.js](https://github.com/langchain-ai/langchainjs)
* Using [Streamlit](https://streamlit.io/)

{% tabs %}
{% tab title="Python" %}

```python
from os import getenv

from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI  # pip install langchain-openai

# Load variables from a local .env file, if present
load_dotenv()

template = """Question: {question}
Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)

llm = ChatOpenAI(
  api_key=getenv("API_KEY"),  # assumes API_KEY is set in the environment or .env
  base_url="https://llm.onerouter.pro/v1",
  model="<model_name>",
)

# Compose the prompt and the model into a single runnable chain
chain = prompt | llm

question = "What NFL team won the Super Bowl in the year Justin Bieber was born?"

print(chain.invoke({"question": question}).content)
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import { ChatOpenAI } from '@langchain/openai';

const chat = new ChatOpenAI({
  model: '<model_name>',
  temperature: 0.8,
  streaming: true,
  apiKey: process.env.API_KEY, // assumes API_KEY is set in the environment
  configuration: {
    baseURL: 'https://llm.onerouter.pro/v1',
  },
});
```

{% endtab %}
{% endtabs %}


# PydanticAI

Using Infron AI with PydanticAI

## Using PydanticAI

[PydanticAI](https://github.com/pydantic/pydantic-ai) provides a high-level interface for working with various LLM providers, including Infron AI.

### Installation

```bash
pip install 'pydantic-ai-slim[openai]'
```

### Configuration

You can use Infron AI with PydanticAI through its OpenAI-compatible interface:

```python
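from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

model = OpenAIModel(
    "claude-3-5-sonnet@20240620",  # or any other Infron AI model
    provider=OpenAIProvider(
        base_url="https://llm.onerouter.pro/v1",
        api_key="API_KEY",
    ),
)

agent = Agent(model)
# run_sync works in a plain script; use `await agent.run(...)` inside async code
result = agent.run_sync("What is the meaning of life?")
print(result.output)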
```

For more details about using PydanticAI with Infron AI, see the [PydanticAI documentation](https://ai.pydantic.dev/models/#api_key-argument).


# Langfuse

Using Infron AI with Langfuse

[Langfuse](https://langfuse.com/) provides observability and analytics for LLM applications. Since Infron AI uses the OpenAI API schema, you can utilize Langfuse's native integration with the OpenAI SDK to automatically trace and monitor your Infron AI API calls.

## Installation

```bash
pip install langfuse openai
```

## Configuration

Set up your environment variables:

```python
import os

# Set your Langfuse API keys
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
# EU region
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"
# US region
# os.environ["LANGFUSE_HOST"] = "https://us.cloud.langfuse.com"

# Set your Infron AI API key (the OpenAI client reads it from OPENAI_API_KEY)
os.environ["OPENAI_API_KEY"] = "<API_KEY>"
```

## Simple LLM Call

Since Infron AI provides an OpenAI-compatible API, you can use the Langfuse OpenAI SDK wrapper to automatically log Infron AI calls as generations in Langfuse:

```python
# Import the Langfuse OpenAI SDK wrapper
from langfuse.openai import openai

# Create an OpenAI client with Infron AI's base URL
client = openai.OpenAI(
    base_url="https://llm.onerouter.pro/v1"
)

# Make a chat completion request
response = client.chat.completions.create(
    model="{{MODEL}}",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a fun fact about space."}
    ],
    name="fun-fact-request"  # Optional: Name of the generation in Langfuse
)

# Print the assistant's reply
print(response.choices[0].message.content)
```

## Advanced Tracing with Nested Calls

Use the `@observe()` decorator to capture execution details of functions with nested LLM calls:

```python
from langfuse import observe
from langfuse.openai import openai

# Create an OpenAI client with Infron AI's base URL
client = openai.OpenAI(
    base_url="https://llm.onerouter.pro/v1",
)

@observe()  # This decorator enables tracing of the function
def analyze_text(text: str):
    # First LLM call: Summarize the text
    summary_response = summarize_text(text)
    summary = summary_response.choices[0].message.content

    # Second LLM call: Analyze the sentiment of the summary
    sentiment_response = analyze_sentiment(summary)
    sentiment = sentiment_response.choices[0].message.content

    return {
        "summary": summary,
        "sentiment": sentiment
    }

@observe()  # Nested function to be traced
def summarize_text(text: str):
    return client.chat.completions.create(
        model="{{MODEL}}",
        messages=[
            {"role": "system", "content": "You summarize texts in a concise manner."},
            {"role": "user", "content": f"Summarize the following text:\n{text}"}
        ],
        name="summarize-text"
    )

@observe()  # Nested function to be traced
def analyze_sentiment(summary: str):
    return client.chat.completions.create(
        model="{{MODEL}}",
        messages=[
            {"role": "system", "content": "You analyze the sentiment of texts."},
            {"role": "user", "content": f"Analyze the sentiment of the following summary:\n{summary}"}
        ],
        name="analyze-sentiment"
    )

# Example usage
text_to_analyze = "Infron AI's unified API has significantly advanced the field of AI development, setting new standards for model accessibility."
result = analyze_text(text_to_analyze)
print(result)
```


# n8n

Build AI automations with Infron AI & n8n

With Infron AI you have access to 100+ AI models through one API, and with n8n you can connect to 8000+ apps to automate workflows, no coding required!

Combine Infron AI’s model routing with n8n’s integrations to automate tasks across CRMs, spreadsheets, messaging, and more.

## Set up your Integration <a href="#set-up-your-integration" id="set-up-your-integration"></a>

Get started by exploring the available automations and creating your first n8n workflow with Infron AI. The integration supports all Infron AI models and features, including streaming responses, function calling, and multimodal capabilities.

<figure><img src="/files/rcNjhMgZ0DeVB0SbDdZ5" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/iVvl3nfdRjr6PPtMCwOA" alt=""><figcaption></figcaption></figure>

### Using Infron AI in n8n <a href="#using-openrouter-in-zapier" id="using-openrouter-in-zapier"></a>

Once you’ve set up the integration, you can use Infron AI in your n8n workflows to:

* **Generate content** with models like GPT-4, Claude, or Gemini
* **Analyze data** using specialized models for different domains
* **Process images** with vision-capable models
* **Create structured outputs** with JSON mode and function calling
* **Stream responses** for real-time applications

The Infron AI n8n integration automatically handles authentication, model routing, and error handling, so you can focus on building your automation logic.


# Claude Code Integration Guide

Use Claude Agent SDK and Claude Code with Infron AI models

### What's Claude Code

[Claude Code](https://www.claude.com/product/claude-code) is an AI-powered coding assistant from Anthropic. It runs in the terminal, so developers can delegate complex programming tasks to it directly from the command line.

### **What We Help You Achieve**

With Infron, developers can seamlessly integrate and use a [wide range of AI models](https://infron.ai/models). Through **Infron**, you can run **Claude Code** on different underlying models, including:

* **Gemini 3 models** as the core runtime for Claude Code;
* **GPT series models** as the core runtime for Claude Code;
* The **full series of Claude models** as the native core for Claude Code.

This flexible architecture allows Infron to unify multiple large language model ecosystems under a single interface, enabling developers to switch, combine, or optimize model performance for diverse code execution and reasoning scenarios.

### Quick Start <a href="#quick-start" id="quick-start"></a>

Infron provides [Anthropic SDK compatible LLM API services](/docs/frameworks-and-integrations/anthropic-sdk-compatibility), so you can use Infron AI models in Claude Code. Follow the steps below to complete the integration.

#### 1. Set up your Infron account & API keys <a href="#id-1-install-claude-code" id="id-1-install-claude-code"></a>

The first step to start using Infron is to [create an account](https://infron.ai/login) and [get your API key](https://infron.ai/dashboard/apiKeys).

#### 2. Install Claude Code <a href="#id-1-install-claude-code" id="id-1-install-claude-code"></a>

{% hint style="info" %}
Before installing Claude Code, please ensure your local environment has [Node.js 18 or higher](https://nodejs.org/en/download/) installed.
{% endhint %}

To install Claude Code, run the following command:

{% tabs %}
{% tab title="NPM" %}

```sh
npm install -g @anthropic-ai/claude-code
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl -fsSL https://claude.ai/install.sh | bash
```

{% endtab %}
{% endtabs %}

#### 3. Set up the Claude Code configuration <a href="#id-2-start-your-first-session" id="id-2-start-your-first-session"></a>

Open the terminal and set up environment variables as follows:

{% tabs %}
{% tab title="Claude as Model" %}

```sh
# Set the Anthropic SDK compatible API endpoint provided by Infron.
export ANTHROPIC_BASE_URL="https://llm.onerouter.pro"
export ANTHROPIC_AUTH_TOKEN="<YOUR-API-Key-IN-INFRON>"

# Set the model provided by Infron.
export ANTHROPIC_MODEL="anthropic/claude-opus-4.7"
export ANTHROPIC_SMALL_FAST_MODEL="anthropic/claude-haiku-4.5"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="anthropic/claude-haiku-4.5"
export ANTHROPIC_DEFAULT_OPUS_MODEL="anthropic/claude-opus-4.7"
export ANTHROPIC_DEFAULT_SONNET_MODEL="anthropic/claude-sonnet-4.5"
```

{% endtab %}

{% tab title="Gemini as Model" %}

```sh
export ANTHROPIC_BASE_URL="https://llm.onerouter.pro"
export ANTHROPIC_AUTH_TOKEN="<YOUR-API-Key-IN-INFRON>"
export ANTHROPIC_MODEL="google/gemini-3-flash-preview"
export ANTHROPIC_SMALL_FAST_MODEL="google/gemini-3-flash-preview"
export ANTHROPIC_DEFAULT_HAIKU_MODEL="google/gemini-3-flash-preview"
export ANTHROPIC_DEFAULT_OPUS_MODEL="google/gemini-3-flash-preview"
export ANTHROPIC_DEFAULT_SONNET_MODEL="google/gemini-3-flash-preview"
```

{% endtab %}
{% endtabs %}
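
Before launching Claude Code, it can help to confirm that the variables above are actually exported in the current shell. A minimal sketch of such a check follows. The script and the function name `missing_claude_env` are illustrative and not part of Claude Code or Infron, and treating these three variables as the minimum set is an assumption.

```python
import os
from typing import Mapping

# Assumption: these three variables from the configuration above are the minimum needed.
REQUIRED_VARS = ("ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_MODEL")


def missing_claude_env(env: Mapping[str, str]) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_claude_env(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Claude Code environment looks complete.")
```

Run it in the same terminal session where the `export` commands were executed, since exported variables do not carry over to other shells.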

#### 4. Start your first session <a href="#id-2-start-your-first-session" id="id-2-start-your-first-session"></a>

Next, navigate to your project directory and start Claude Code. You will see the Claude Code prompt inside a new interactive session:

```sh
cd <your-project-directory>
claude
```

<figure><img src="/files/fkBia43eSkDhFYDnk81F" alt=""><figcaption></figcaption></figure>

### Common Commands <a href="#common-commands" id="common-commands"></a>

| Command                     | Description                       | Example                             |
| --------------------------- | --------------------------------- | ----------------------------------- |
| `claude`                    | Start interactive mode            | `claude`                            |
| `claude "task description"` | Run a one-time task               | `claude "fix the build error"`      |
| `claude -p "query"`         | Run one-off query, then exit      | `claude -p "explain this function"` |
| `claude -c`                 | Continue most recent conversation | `claude -c`                         |
| `claude -r`                 | Resume a previous conversation    | `claude -r`                         |
| `claude commit`             | Create a Git commit               | `claude commit`                     |
| `/clear`                    | Clear conversation history        | `> /clear`                          |
| `/help`                     | View available commands           | `> /help`                           |
| `exit` or Ctrl+C            | Exit Claude Code                  | `> exit`                            |


# Anthropic SDK Compatibility

Use Anthropic SDK with Infron AI models

Infron AI provides a compatibility API that allows you to use the Anthropic SDK with Infron AI models. This is useful if you already use the Anthropic SDK and want to switch to Infron AI models.

## Quick Start Guide <a href="#quick-start-guide" id="quick-start-guide"></a>

This guide shows, step by step, how to use the Anthropic SDK with Infron AI models.

### 1. Install the Anthropic SDK <a href="#id-1-install-the-anthropic-sdk" id="id-1-install-the-anthropic-sdk"></a>

```sh
pip install anthropic
```

### 2. Initialize the Client <a href="#id-2-initialize-the-client" id="id-2-initialize-the-client"></a>

The Anthropic SDKs read the API key and base URL from the environment variables `ANTHROPIC_API_KEY` and `ANTHROPIC_BASE_URL`. Alternatively, you can pass both values directly when initializing the client.

* Using Environment Variables

```sh
export ANTHROPIC_BASE_URL="https://llm.onerouter.pro"
export ANTHROPIC_API_KEY="<<Your API Key>>"
```

* Set the parameters while initializing the Anthropic client

{% tabs %}
{% tab title="Python" %}

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://llm.onerouter.pro",
    api_key="<<Your API Key>>"
)

message = client.messages.create(
    model="anthropic/claude-sonnet-4.5",
    max_tokens=1000,
    temperature=1,
    system=[
        {
            "type": "text",
            "text": "You are a world-class poet. Respond only with short poems."
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Why is the ocean salty?"
                }
            ]
        }
    ]
)

print(message.content)
```

{% endtab %}
{% endtabs %}
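
Note that `message.content` is a list of content blocks rather than a plain string. The sketch below joins the text blocks into one string. `extract_text` is a hypothetical helper, and the stand-in objects only mimic the `type` and `text` attributes of the SDK's text blocks so that the snippet runs without a network call.

```python
from types import SimpleNamespace


def extract_text(content_blocks) -> str:
    """Concatenate the text of all blocks whose type is "text"."""
    return "".join(
        block.text
        for block in content_blocks
        if getattr(block, "type", None) == "text"
    )


# Stand-in for message.content; a real list would come from client.messages.create(...).
fake_content = [
    SimpleNamespace(type="text", text="Rivers carry stone to sea, "),
    SimpleNamespace(type="text", text="and salt remains."),
]
print(extract_text(fake_content))
```

With a real response, call `extract_text(message.content)` instead of passing the stand-in list.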


# OpenAI Codex CLI

Use Codex CLI with Infron AI models

[Codex CLI](https://openai.com/codex/) is a terminal-based coding agent that combines local execution with cloud AI capabilities. Unlike code generation tools that only produce code snippets, Codex CLI can understand your entire project, execute the code it creates, debug issues, and iterate until solutions work correctly.

## How to Access Infron AI <a href="#h-how-to-access-novita-ai-models-in-codex-cli" id="h-how-to-access-novita-ai-models-in-codex-cli"></a>

### **Installation**

**Install via npm (Recommended)**

`npm install -g @openai/codex`

**Install via Homebrew (macOS)**

`brew install codex`

**Verify Installation**

`codex --version`

### Configuring Infron AI Models <a href="#h-configuring-novita-ai-models" id="h-configuring-novita-ai-models"></a>

**Setup Configuration File**

Codex CLI uses a TOML configuration file located at:

* **macOS/Linux**: `~/.codex/config.toml`
* **Windows**: `%USERPROFILE%\.codex\config.toml`

**Basic Configuration Template**

{% tabs %}
{% tab title="vim \~/.codex/config.toml" %}

```toml
model = "deepseek/deepseek-v4-pro"
model_provider = "Infron"

[model_providers.Infron]
name = "Infron"
base_url = "https://llm.onerouter.pro/v1"
http_headers = {"Authorization" = "Bearer YOUR_API_KEY"}
wire_api = "responses"
```

{% endtab %}
{% endtabs %}

## Getting Started

**Launch Codex CLI**

{% tabs %}
{% tab title="Bash" %}

```bash
codex
```

{% endtab %}
{% endtabs %}

<figure><img src="/files/9zCYy68HKw5xtCNkSBtM" alt=""><figcaption></figcaption></figure>

**Basic Usage Examples**

**Code Generation**:

`> Create a Python class for handling REST API responses with error handling`

![](/files/P6VcmpkJ0P3vHmFLXdIf)

**Project Analysis**:

`> Review this codebase and suggest improvements for performance`

**Bug Fixing**:

`> Fix the authentication error in the login function`

### Conclusion <a href="#h-conclusion" id="h-conclusion"></a>

Codex CLI with Infron's models provides a powerful, flexible development environment that combines local control with cloud AI capabilities. By choosing the right model for each task and configuring your environment properly, you can significantly accelerate your development workflow while maintaining code quality and security.


# OpenAI Agents SDK

Use OpenAI Agents SDK with Infron AI models

Seamlessly integrate Infron AI with the OpenAI Agents SDK to build multi-agent workflows.

The [OpenAI Agents SDK](https://github.com/openai/openai-agents-python) is a lightweight yet powerful framework for building multi-agent workflows. The SDK is compatible with any model provider that supports the OpenAI Chat Completions API format.

This guide walks you through using the Infron AI LLM API with the OpenAI Agents SDK.

### Get Started <a href="#get-started" id="get-started"></a>

1. Set up your Python environment and install the Agents SDK.

```sh
python -m venv env
source env/bin/activate
pip install openai-agents==0.0.7
```

2. Set up your Infron AI API key.

To use Infron, [create an account](https://infron.ai/login) and [get your API key](https://infron.ai/dashboard/apiKeys).

### Hello world example <a href="#hello-world-example" id="hello-world-example"></a>

```python
import os
from openai import AsyncOpenAI
from agents import (
    Agent,
    Runner,
    set_default_openai_api,
    set_default_openai_client,
    set_tracing_disabled,
)

BASE_URL = "https://llm.onerouter.pro/v1"
API_KEY = "Your_API_KEY"
MODEL_NAME = "openai/gpt-5.4"

set_default_openai_api("chat_completions")
set_default_openai_client(AsyncOpenAI(base_url=BASE_URL, api_key=API_KEY))
set_tracing_disabled(disabled=True)

agent = Agent(name="Assistant",
              instructions="You are a helpful assistant", model=MODEL_NAME)

result = Runner.run_sync(
    agent, "Write a haiku about recursion in programming. step by step.")
print(result.final_output)

# Code within the code,
# Functions calling themselves,
# Infinite loop's dance.
```

### Handoffs example <a href="#handoffs-example" id="handoffs-example"></a>

```python
import os
import asyncio
from openai import AsyncOpenAI
from agents import (
    Agent,
    Runner,
    set_default_openai_api,
    set_default_openai_client,
    set_tracing_disabled,
)

BASE_URL = "https://llm.onerouter.pro/v1"
API_KEY = "Your_API_KEY"
MODEL_NAME = "openai/gpt-5.4"

set_default_openai_api("chat_completions")
set_default_openai_client(AsyncOpenAI(base_url=BASE_URL, api_key=API_KEY))
set_tracing_disabled(disabled=True)

spanish_agent = Agent(
    name="Spanish agent",
    instructions="You only speak Spanish.",
    model=MODEL_NAME,
)

english_agent = Agent(
    name="English agent",
    instructions="You only speak English",
    model=MODEL_NAME,
)

triage_agent = Agent(
    name="Triage agent",
    instructions="Handoff to the appropriate agent based on the language of the request.",
    handoffs=[spanish_agent, english_agent],
    model=MODEL_NAME,
)


async def main():
    result = await Runner.run(triage_agent, input="Write a haiku about recursion in programming. step by step.")
    print(result.final_output)


if __name__ == "__main__":
    asyncio.run(main())
```

### Functions example <a href="#functions-example" id="functions-example"></a>

```python
import os
import asyncio
from openai import AsyncOpenAI
from agents import (
    Agent,
    Runner,
    set_default_openai_api,
    set_default_openai_client,
    set_tracing_disabled,
    function_tool,
)

BASE_URL = "https://llm.onerouter.pro/v1"
API_KEY = "Your_API_KEY"
MODEL_NAME = "openai/gpt-5.4"

set_default_openai_api("chat_completions")
set_default_openai_client(AsyncOpenAI(base_url=BASE_URL, api_key=API_KEY))
set_tracing_disabled(disabled=True)

@function_tool
def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny."

agent = Agent(
    name="Hello world",
    instructions="You are a helpful agent.",
    tools=[get_weather],
    model=MODEL_NAME,
)

async def main():
    result = await Runner.run(agent, input="What's the weather in Tokyo?")
    print(result.final_output)

if __name__ == "__main__":
    asyncio.run(main())
```


# LiteLLM

Use LiteLLM's OpenAI-compatible endpoints with Infron AI

### **Account & API Keys Setup**

The first step to start using Infron is to [create an account](https://infron.ai/login) and [get your API key](https://infron.ai/dashboard/apiKeys).

The second step to start using Google AI Studio is to [create a project](https://aistudio.google.com/app/projects) and [get your API key](https://aistudio.google.com/app/api-keys).

### Usage - completion <a href="#usage---completion" id="usage---completion"></a>

{% tabs %}
{% tab title="Python" %}

```python
import litellm
import os

response = litellm.completion(
    model="openai/<<Model Name>>",               # add `openai/` prefix to model so litellm knows to route to OpenAI
    api_key="<<API key>>",                  # api key to your openai compatible endpoint
    api_base="https://llm.onerouter.pro/v1",     # set API Base of your Custom OpenAI Endpoint
    messages=[
                {
                    "role": "user",
                    "content": "Hey, how's it going?",
                }
    ],
)
print(response.json())
```

{% endtab %}
{% endtabs %}

Copy the `<<Model Name>>` from the [model marketplace](https://infron.ai/models).

<figure><img src="/files/U50LNirwKjSjMMbuYmO0" alt=""><figcaption></figcaption></figure>

For example:

{% tabs %}
{% tab title="Qwen(Vertex Provider)" %}

```python
import litellm
import os

response = litellm.completion(
    model="openai/vertex/qwen3-next-80b-a3b-instruct",               # add `openai/` prefix to model so litellm knows to route to OpenAI
    api_key="<<API key>>",                  # api key to your openai compatible endpoint
    api_base="https://llm.onerouter.pro/v1",     # set API Base of your Custom OpenAI Endpoint
    messages=[
        {
            "role": "user",
            "content": "Hey, how's it going?",
        }
    ],
)
print(response.json())
```

{% endtab %}

{% tab title="Qwen(Chutes Provider)" %}

```python
import litellm
import os

response = litellm.completion(
    model="openai/chutes/qwen3-next-80b-a3b-instruct",               # add `openai/` prefix to model so litellm knows to route to OpenAI
    api_key="<<API key>>",                  # api key to your openai compatible endpoint
    api_base="https://llm.onerouter.pro/v1",     # set API Base of your Custom OpenAI Endpoint
    messages=[
        {
            "role": "user",
            "content": "Hey, how's it going?",
        }
    ],
)
print(response.json())
```

{% endtab %}
{% endtabs %}
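
The prefix rule in the comments above is worth spelling out, because some Infron model IDs already begin with `openai/`. The sketch below always adds the prefix. It assumes LiteLLM consumes only the first path segment as the provider name and forwards the rest unchanged; `to_litellm_model` is a hypothetical helper.

```python
def to_litellm_model(infron_model_id: str) -> str:
    """Return the model string to pass to LiteLLM for an Infron model ID.

    The prefix is always added, even when the Infron ID itself starts with
    "openai/", on the assumption that LiteLLM strips only the first segment.
    """
    return f"openai/{infron_model_id}"


for model_id in ("vertex/qwen3-next-80b-a3b-instruct", "openai/gpt-5.4"):
    print(model_id, "->", to_litellm_model(model_id))
```

Under that assumption, an ID such as `openai/gpt-5.4` is passed as `openai/openai/gpt-5.4`, so that the ID forwarded to the endpoint keeps its own `openai/` segment.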

### Usage - embedding <a href="#usage---embedding" id="usage---embedding"></a>

{% tabs %}
{% tab title="Python" %}

```python
import litellm
import os

response = litellm.embedding(
    model="openai/qwen/qwen3-embedding-0.6b",               # add `openai/` prefix to model so litellm knows to route to OpenAI
    api_key="<<API key>>",                  # api key to your openai compatible endpoint
    api_base="https://llm.onerouter.pro/v1",     # set API Base of your Custom OpenAI Endpoint
    input=["good morning from litellm"]
)
print(response.json())
```

{% endtab %}
{% endtabs %}

### Usage with LiteLLM Proxy Server <a href="#usage-with-litellm-proxy-server" id="usage-with-litellm-proxy-server"></a>

1. Modify the `config.yaml`

{% tabs %}
{% tab title="config.yaml" %}

```yml
model_list:
  - model_name: my-model
    litellm_params:
      model: openai/<your-model-name>  # add openai/ prefix to route as OpenAI provider
      api_base: <model-api-base>       # add api base for OpenAI compatible provider
      api_key: api-key                 # api key to send your model
```

{% endtab %}

{% tab title="qwen/qwen3-next-80b-a3b-instruct" %}

```yaml
model_list:
  - model_name: qwen/qwen3-next-80b-a3b-instruct
    litellm_params:
      model: openai/qwen/qwen3-next-80b-a3b-instruct  # add openai/ prefix to route as OpenAI provider
      api_base: https://llm.onerouter.pro/v1       # add api base for OpenAI compatible provider
      api_key: your-api-key                 # api key to send your model
```

{% endtab %}
{% endtabs %}

2. Start the proxy

```bash
litellm --config ./config.yaml
```

<figure><img src="/files/JWvrNev3Yjb5LSrKUszg" alt=""><figcaption></figcaption></figure>

3. Send Request to LiteLLM Proxy Server

{% tabs %}
{% tab title="Python" %}

```python
import openai

client = openai.OpenAI(
    api_key="sk-1234",             # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000" # litellm-proxy-base url
)

response = client.chat.completions.create(
    model="qwen/qwen3-next-80b-a3b-instruct",
    messages = [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
)

# Print the full response as formatted JSON
print(response.model_dump_json(indent=2))
```

{% endtab %}
{% endtabs %}

An example response:

```json
{
  "id": "gen-1768374179-WLdXmwS75xBQTPGgj6Fg",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "I am Qwen, a large-scale language model independently developed by the Tongyi Lab under Alibaba Group. I am designed to answer questions, generate text, perform logical reasoning, programming, and more. If you have any questions or need assistance, feel free to let me know anytime!",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1768374179,
  "model": "qwen/qwen3-next-80b-a3b-instruct",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 57,
    "prompt_tokens": 13,
    "total_tokens": 70,
    "completion_tokens_details": {
      "accepted_prediction_tokens": null,
      "audio_tokens": null,
      "reasoning_tokens": null,
      "rejected_prediction_tokens": null
    },
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": null
    },
    "input_tokens": 0,
    "output_tokens": 0,
    "ttft": 0,
    "server_tool_use": {
      "web_search_requests": ""
    }
  },
  "request_id": "49f4d38ad4fd43699fad4fb312d371a0"
}
```
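
Most applications need only a few fields from a response like this one. As a rough sketch, the helper below pulls the reply text and the token counts out of a parsed response. The name `summarize_response` is illustrative, and the sample dictionary is a trimmed stand-in with a shortened reply, not real output.

```python
def summarize_response(response: dict) -> dict:
    """Pull the reply text and token counts out of a chat completion dict."""
    choice = response["choices"][0]
    usage = response.get("usage") or {}
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Trimmed stand-in that keeps the shape of the response shown above.
sample = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"role": "assistant", "content": "I am Qwen."},
        }
    ],
    "usage": {"completion_tokens": 57, "prompt_tokens": 13, "total_tokens": 70},
}

summary = summarize_response(sample)
print(summary)
```

With the OpenAI SDK, `response.model_dump()` yields a dictionary of this shape that can be passed to the helper.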


# OpenCode

Kickstart OpenCode with Infron

### **Account & API Keys Setup**

The first step to start using Infron is to [create an account](https://infron.ai/login) and [get your API key](https://infron.ai/dashboard/apiKeys).

### Installation

Start with the one-line curl installer; no Docker or complex dependencies are needed:

`curl -fsSL https://opencode.ai/install | bash`

This installs the `opencode` CLI globally.

Verify the installation with `opencode --version`, then launch it by running `opencode`.

For VS Code integration, it pairs seamlessly as an LSP client.

Full docs: [opencode.ai/docs](https://opencode.ai/docs)

## Setup guide <a href="#openrouter-setup" id="openrouter-setup"></a>

Infron powers model access via its [OpenAI-compatible API](/docs/llm-apis/openai-compatible-api/overview).

Configure the endpoint through the GUI.

<figure><img src="/files/Ln1qHOyWuKzzSAZLGpA2" alt=""><figcaption></figcaption></figure>

By default, the custom provider configuration in OpenCode enables only text-chat capabilities. To enable the full range of model features (such as multimodal file inputs), you must manually add the following items to the configuration file.

`cd ~/.config/opencode/`

```json
{
    "$schema": "https://opencode.ai/config.json",
    "disabled_providers": [
        "infronai"
    ],
    "provider": {
        "infronai": {
            "name": "infronai",
            "npm": "@ai-sdk/openai-compatible",
            "models": {
                "google/gemini-3-flash-preview": {
                    "name": "gemini-3-flash-preview",
                    "attachment": true,
                    "reasoning": true,
                    "tool_call": true,
                    "temperature": true,
                    "release_date": "2025-11-01",
                    "modalities": {
                        "input": [
                            "text",
                            "image",
                            "pdf"
                        ],
                        "output": [
                            "text"
                        ]
                    },
                    "cost": {
                        "input": 0.5,
                        "output": 3
                    },
                    "limit": {
                        "context": 1000000,
                        "output": 65535
                    }
                }
            },
            "options": {
                "baseURL": "https://llm.onerouter.pro/v1"
            }
        }
    }
}
```

After completing the configuration, you can start the OpenCode CLI.

<figure><img src="/files/0wIU2Gir8BfndleS4Two" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/kB4BzFmh6WpJoPMIadlm" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/GZpHZlbn1zhH1gTfOxTd" alt=""><figcaption></figcaption></figure>

### Basic Usage <a href="#basic-usage" id="basic-usage"></a>

Run `opencode` in a repository to initialize it. It prompts for project context, after which you can use slash commands in the TUI.

* `/connect`: Link Infron (auto-detects environment variables).
* `/models`: List and select models (e.g., Claude 3.5 Sonnet or Gemini 3 Flash).
* `/init`: Scans the repository and suggests a plan (e.g., "Analyze main.rs and create layers").
* Core flow: `/plan` for an outline, `/build` to generate and run code, `/improve` for refinements.


# OpenClaw

Guide to Using OpenClaw with Infron

OpenClaw (formerly Moltbot, originally Clawdbot) is a powerful AI messaging gateway that connects multiple messaging platforms (WhatsApp, Telegram, Discord, Slack, Signal, iMessage, and more) to AI models. By integrating with Infron, you can access a wide range of models including GPT-5.2, Claude-4.5, Gemini-3, DeepSeek, and more.

### **Account & API Keys Setup**

The first step to start using Infron is to [create an account](https://infron.ai/login) and [get your API key](https://infron.ai/dashboard/apiKeys).

### Setup Guide <a href="#openrouter-setup" id="openrouter-setup"></a>

Infron powers model access via its [OpenAI-compatible API](/docs/llm-apis/openai-compatible-api/overview).

#### Step 1: Install OpenClaw

Install OpenClaw globally via npm:

```bash
npm install -g openclaw@latest
```

Then run the onboarding wizard to set up OpenClaw:

```bash
openclaw onboard --install-daemon
```

#### Step 2: Configure Infron Provider

Add the Infron provider configuration to your `~/.openclaw/openclaw.json` file:

```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "infron": {
        "baseUrl": "https://llm.onerouter.pro/v1",
        "apiKey": "<API_KEY>",
        "api": "openai-completions",
        "models": [
          {
            "id": "deepseek/deepseek-v3.2",
            "name": "DeepSeek Chat via Infron",
            "reasoning": false,
            "input": ["text"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 64000,
            "maxTokens": 8192
          },
          {
            "id": "openai/gpt-5.2",
            "name": "GPT-5.2 via Infron",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 200000,
            "maxTokens": 8192
          },
          {
            "id": "google/gemini-3-pro-preview",
            "name": "Gemini 3 Pro via Infron",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 200000,
            "maxTokens": 8192
          },
          {
            "id": "anthropic/claude-sonnet-4.5",
            "name": "Claude Sonnet 4.5 via Infron",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 200000,
            "maxTokens": 8192
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai/gpt-5.2"
      },
      "models": {
        "deepseek/deepseek-v3.2": {},
        "openai/gpt-5.2": {},
        "google/gemini-3-pro-preview": {},
        "anthropic/claude-sonnet-4.5": {}
      }
    }
  }
}
```

#### Step 3: Add More Models (Optional)

You can add more models to the `models` array. Check the [model list](https://infron.ai/models) for available models and their capabilities.
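
As a rough illustration of this step, the sketch below adds one more entry to a configuration shaped like the one in Step 2. The helper name `add_model` and the model ID `example/some-model` are hypothetical, the key layout simply mirrors the JSON above, and the zero costs and token limits are placeholders rather than real values for any model.

```python
import copy
import json


def add_model(config, model_id, name, inputs=("text",),
              context_window=64000, max_tokens=8192, provider="infron"):
    """Return a copy of an OpenClaw-style config with one more model added.

    Hypothetical helper: it mirrors the key layout of the JSON in Step 2.
    """
    updated = copy.deepcopy(config)
    models = updated["models"]["providers"][provider]["models"]
    # Skip the provider entry if the model is already listed.
    if not any(m["id"] == model_id for m in models):
        models.append({
            "id": model_id,
            "name": name,
            "reasoning": False,
            "input": list(inputs),
            # Placeholder costs, as in the Step 2 example.
            "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
            "contextWindow": context_window,
            "maxTokens": max_tokens,
        })
    # Make the model selectable by agents as well.
    updated["agents"]["defaults"]["models"].setdefault(model_id, {})
    return updated


# Minimal config with the same shape as the Step 2 example.
config = {
    "models": {"mode": "merge", "providers": {"infron": {"models": []}}},
    "agents": {"defaults": {"model": {"primary": "openai/gpt-5.2"},
                            "models": {}}},
}
# "example/some-model" is a made-up ID; pick a real one from the model list.
updated = add_model(config, "example/some-model", "Some Model via Infron")
print(json.dumps(updated, indent=2))
```

The helper returns a new dictionary and leaves the input untouched, so you can review the result before writing it back to `~/.openclaw/openclaw.json`.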

#### Step 4: Verify the Configuration

List the available models:

```bash
openclaw models list
```

You should see your configured models:

```
Model                                      Input      Ctx      Local Auth  Tags
deepseek/deepseek-v3.2                     text       63k      no    yes   configured
openai/gpt-5.2                             text+image 195k     no    yes   default,configured
google/gemini-3-pro-preview                text+image 195k     no    yes   configured
anthropic/claude-sonnet-4.5                text+image 195k     no    yes   configured
```

#### Step 5: Set the Default Model

Set your preferred default model:

```bash
openclaw models set openai/gpt-5.2
```

### Use Cases

Once configured, you can use Infron models in various ways:

#### Via CLI Agent

```bash
# Run a quick agent command
openclaw agent --local --agent main --message "Explain quantum computing in simple terms"
```

#### Via Messaging Channels

Configure your messaging channels (WhatsApp, Telegram, Discord, etc.) and the gateway will automatically use your configured model:

```bash
# Start the gateway
openclaw gateway run

# Check channel status
openclaw channels status
```

#### Switching Models

You can switch models at any time:

```bash
# Set a different default model
openclaw models set anthropic/claude-sonnet-4.5

# Or specify a model inline (CLI agent only)
openclaw agent --local --agent main --model deepseek/deepseek-v3.2 --message "Hello"
```


# OpenWork

[**OpenWork**](https://openworklabs.com/docs/start-here/get-started) is an AI workspace built on OpenCode primitives that helps individuals and teams connect, manage, and customize their AI stack inside a project. It supports flexible model integration through workspace-level configuration, allowing users to add custom or managed LLM providers, work with local or hosted models, and extend capabilities through reusable skills. This makes OpenWork a practical platform for organizing and scaling AI-powered workflows across different projects and teams.

### **Account & API Keys Setup**

The first step to start using Infron is to [create an account](https://infron.ai/login) and [get your API key](https://infron.ai/dashboard/apiKeys).

### Setup Guide <a href="#openrouter-setup" id="openrouter-setup"></a>

`OpenWork` is built on `OpenCode` primitives, so it supports everything you can modify in `.opencode.json`, such as adding a [model](https://opencode.ai/docs/models/).

You can find the corresponding model names [here](https://infron.ai/models) and configure them in the path below. You also have the option to configure multiple models simultaneously.

Inside `~/.config/opencode/opencode.json`:

{% tabs %}
{% tab title="Example" %}

```json
{
  "$schema": "https://opencode.ai/config.json",
  "model": "infron/anthropic/claude-sonnet-4.6",
  "provider": {
    "infron": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Infron",
      "options": {
        "baseURL": "https://llm.onerouter.pro/v1",
        "apiKey": "{env:INFRON_API_KEY}"
      },
      "models": {
        "anthropic/claude-opus-4.7": {
          "name": "Claude Opus 4.7"
        },
        "anthropic/claude-sonnet-4.6": {
          "name": "Claude Sonnet 4.6"
        },
        "anthropic/claude-haiku-4.5": {
          "name": "Claude Haiku 4.5"
        },
        "z-ai/glm-5": {
          "name": "GLM 5"
        },
        "z-ai/glm-5.1": {
          "name": "GLM 5.1"
        },
        "z-ai/glm-4.7": {
          "name": "GLM 4.7"
        },
        "moonshotai/kimi-k2.5": {
          "name": "Kimi K2.5"
        },
        "moonshotai/kimi-k2-thinking": {
          "name": "Kimi K2 Thinking"
        },
        "qwen/qwen3.6-plus": {
          "name": "Qwen 3.6 Plus"
        },
        "qwen/qwen3.5-plus": {
          "name": "Qwen 3.5 Plus"
        },
        "qwen/qwen3-max": {
          "name": "Qwen3 Max"
        },
        "qwen/qwen3-coder-plus": {
          "name": "Qwen3 Coder Plus"
        }
      }
    }
  }
}
```

{% endtab %}
{% endtabs %}
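
A small consistency check can catch typos before you launch OpenWork. The sketch below assumes the top-level `model` value follows the `provider_id/model_id` form used in the example above, so it splits on the first `/` only. `check_default_model` is a hypothetical helper, not part of OpenCode or OpenWork.

```python
import json


def check_default_model(config):
    """Return (provider_id, model_id) if the default model is declared, else raise."""
    # "infron/anthropic/claude-sonnet-4.6" -> ("infron", "anthropic/claude-sonnet-4.6")
    provider_id, _, model_id = config["model"].partition("/")
    provider = config.get("provider", {}).get(provider_id)
    if provider is None:
        raise ValueError(f"provider {provider_id!r} is not defined")
    if model_id not in provider.get("models", {}):
        raise ValueError(f"model {model_id!r} is not listed under {provider_id!r}")
    return provider_id, model_id


# Trimmed-down copy of the example configuration above.
example = json.loads("""
{
  "model": "infron/anthropic/claude-sonnet-4.6",
  "provider": {
    "infron": {
      "models": {
        "anthropic/claude-sonnet-4.6": {"name": "Claude Sonnet 4.6"}
      }
    }
  }
}
""")
result = check_default_model(example)
print(result)
```

To check your own file, load it with `json.load` instead of the inline string, provided the file is plain JSON without comments.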

<figure><img src="/files/lKUcJm5Dnf34TZ31ke7b" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/sSMcJjDF0GlRoGzRbRkm" alt=""><figcaption></figcaption></figure>


# Hermes Agent

[Hermes Agent](https://github.com/nousresearch/hermes-agent) is the self-improving AI agent built by [Nous Research](https://nousresearch.com/). It's the only agent with a built-in learning loop: it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop: you can talk to it from Telegram while it works on a cloud VM.

### **Account & API Keys Setup**

The first step to start using Infron is to [create an account](https://infron.ai/login) and [get your API key](https://infron.ai/dashboard/apiKeys).

### Setup Guide <a href="#openrouter-setup" id="openrouter-setup"></a>

Works on Linux, macOS, WSL2, and Android via Termux. The installer handles the platform-specific setup for you.

```bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
```

<figure><img src="/files/az1lka53kVs6qpfVtpwt" alt=""><figcaption></figcaption></figure>

API base URL:

```http
https://llm.onerouter.pro/v1
```

<figure><img src="/files/EAYxZGu82seDxHApXnML" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/3uuiUW1RTF3VPjGNqxKT" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/xyJ3fjIeRNkGpFruL8g9" alt=""><figcaption></figcaption></figure>

After installation:

```bash
source ~/.bashrc    # reload shell (or: source ~/.zshrc)
hermes              # start chatting!
```

### Getting Started

```bash
hermes              # Interactive CLI — start a conversation
hermes model        # Choose your LLM provider and model
hermes tools        # Configure which tools are enabled
hermes config set   # Set individual config values
hermes gateway      # Start the messaging gateway (Telegram, Discord, etc.)
hermes setup        # Run the full setup wizard (configures everything at once)
hermes claw migrate # Migrate from OpenClaw (if coming from OpenClaw)
hermes update       # Update to the latest version
hermes doctor       # Diagnose any issues
```


# Billing Tracking

Get cost details and usage details in every call

Infron uses a transparent billing system to ensure every call is precisely metered and billed. Pricing differs across models, and the same model may be priced differently across providers.

### Model Prices

Model prices for each provider are listed on the [model detail page](https://infron.ai/models).&#x20;

<figure><img src="/files/9M6UKLvLAFv3mlGgdIt6" alt=""><figcaption></figcaption></figure>

### Billing Items <a href="#billing-items" id="billing-items"></a>

You can enable usage accounting in your requests by including the `usage` parameter:

{% content-ref url="/pages/O2jKlipa7wP3RiVKBLlX" %}
[Broken mention](broken://pages/O2jKlipa7wP3RiVKBLlX)
{% endcontent-ref %}

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

response = requests.post(
  url="https://llm.onerouter.pro/v1/chat/completions",
  headers={
    "Authorization": "Bearer <API_Keys>",
    "Content-Type": "application/json"
  },
  data=json.dumps({
    "model": "x-ai/grok-4.1-fast-reasoning", 
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ],
    "usage": {
      "include": True
    },
    "reasoning": {
        "effort": "none"
    }
  })
)
print(response.json())
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_Keys>" \
  -d '{
  "model": "x-ai/grok-4.1-fast-reasoning",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "usage": {
      "include": true
  },
  "reasoning": {
      "effort": "none"
  }
}'
```

{% endtab %}
{% endtabs %}

When usage accounting is enabled, the response includes:

* A `usage` object with detailed token information.
* A `cost` field with the total amount charged to your account.
* A `cost_details` object with the breakdown of the total cost.

```json
{
  "choices": [{
    "finish_reason": "stop",
    "index": 0,
    "logprobs": null,
    "message": {
      "content": "42.\n\n(That's from Douglas Adams' *The Hitchhiker's Guide to the Galaxy*, where a supercomputer spends 7.5 million years computing the Answer to the Ultimate Question of Life, the Universe, and Everything... only for everyone to realize they don't know the actual question.)\n\nIn all seriousness, there's no single, objective \"meaning\" to life—it's a question philosophers, scientists, religions, and thinkers have wrestled with for millennia. Here's a quick tour of perspectives:\n\n### Biological/Evolutionary View\nLife's \"purpose\" is survival and reproduction. Richard Dawkins calls us \"survival machines\" for genes. Propagate, adapt, repeat.\n\n### Religious/Spiritual Views\n- **Abrahamic faiths** (Christianity, Islam, Judaism): Worship God, follow divine commandments, achieve salvation or paradise.\n- **Eastern philosophies**: Hinduism/Buddhism—break the cycle of rebirth (samsara) through enlightenment or dharma. Taoism: Flow with the Tao, live in harmony.\n- **Existential twist**: Even atheists like Sartre say God isn't filling the void, so *you* define meaning through choices.\n\n### Philosophical Takes\n- **Aristotle**: Eudaimonia—flourishing through virtue, reason, and fulfilling your potential.\n- **Nietzsche**: Become who you are; embrace the will to power, create values beyond \"slave morality.\"\n- **Camus (Absurdism)**: Life is inherently meaningless and absurd, so rebel by living fully anyway—Sisyphus happy with his boulder.\n- **Epicurus**: Seek modest pleasures, avoid pain, cultivate friendships. (Not orgies—simple joys.)\n\n### Modern/Scientific Lens\nFrom xAI's vantage (understanding the universe): Life might be about curiosity, discovery, and expanding consciousness. We're stardust pondering itself (thanks, Carl Sagan). Build, explore, connect—whether that's AI, space, or relationships.\n\nUltimately, **the meaning of *your* life is what you make it**. Pursue what lights you up: love, art, science, helping others, memes. Experiment, reflect, iterate. If it feels empty, pivot.\n\nWhat's *your* take? Or what sparked the question? 🚀",
      "role": "assistant"
    }
  }],
  "cost": 0.000235,
  "cost_details": {
    "audio_cost": 0,
    "byok_cost": 0,
    "completion_cost": 0.000225,
    "discount_rate": 1,
    "image_cost": 0,
    "is_byok": false,
    "native_web_search_cost": 0,
    "plugin_web_search_cost": 0,
    "prompt_cache_read_cost": 7.55e-06,
    "prompt_cache_write_1_h": 0,
    "prompt_cache_write_5_min": 0,
    "prompt_cache_write_cost": 0,
    "prompt_cost": 2.4e-06,
    "reasoning_cost": 0,
    "tools_cost": 0,
    "video_cost": 0
  },
  "created": 1773372917,
  "id": "9ecbdbd4-3a3d-0030-bbd2-e325a04e45cf",
  "model": "x-ai/grok-4.1-fast-reasoning",
  "object": "chat.completion",
  "request_id": "2d1bb838663549eb9e004b2a64e4e981",
  "usage": {
    "completion_tokens": 450,
    "completion_tokens_details": {
      "reasoning_tokens": 218
    },
    "input_tokens": 0,
    "output_tokens": 0,
    "prompt_tokens": 163,
    "prompt_tokens_details": {
      "cached_tokens": 151,
      "text_tokens": 163
    },
    "total_tokens": 831,
    "ttft": 0
  }
}
```
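
To see which billing items actually contributed to a call, you can filter `cost_details` for non-zero entries. The following is a minimal sketch: `nonzero_costs` is a hypothetical helper, and selecting keys that end in `_cost` is an assumption based on the field names in the sample response. In that sample the non-zero items add up to the reported `cost` after rounding; whether this holds for every response is also an assumption.

```python
def nonzero_costs(response):
    """Return the non-zero *_cost entries from cost_details."""
    details = response.get("cost_details", {})
    return {
        key: value
        for key, value in details.items()
        # Keep numeric cost fields only; skip flags such as is_byok.
        if key.endswith("_cost") and not isinstance(value, bool) and value
    }


# Values copied from the sample response above (other fields omitted).
sample = {
    "cost": 0.000235,
    "cost_details": {
        "completion_cost": 0.000225,
        "discount_rate": 1,
        "is_byok": False,
        "prompt_cache_read_cost": 7.55e-06,
        "prompt_cost": 2.4e-06,
        "reasoning_cost": 0,
    },
}

items = nonzero_costs(sample)
for key, value in sorted(items.items()):
    print(f"{key:<24} {value:.8f}")
print(f"{'sum':<24} {sum(items.values()):.8f}")
print(f"{'reported cost':<24} {sample['cost']:.8f}")
```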

#### usage

| Item                                         | Description                                                 |
| -------------------------------------------- | ----------------------------------------------------------- |
| `prompt_tokens`                              | Total tokens in the input prompt, including `cached_tokens` |
| `completion_tokens`                          | Total tokens in the output completion                       |
| `total_tokens`                               | Total tokens consumed by the request                        |
| `prompt_tokens_details`                      | Breakdown of `prompt_tokens`                                |
| `prompt_tokens_details.cached_tokens`        | Prompt tokens read from the cache                           |
| `prompt_tokens_details.cache_write_tokens`   | Prompt tokens written to the cache                          |
| `prompt_tokens_details.audio_tokens`         | Audio tokens in the prompt                                  |
| `prompt_tokens_details.video_tokens`         | Video tokens in the prompt                                  |
| `completion_tokens_details`                  | Breakdown of `completion_tokens`                            |
| `completion_tokens_details.reasoning_tokens` | Tokens used for reasoning                                   |
| `completion_tokens_details.image_tokens`     | Image tokens in the completion                              |
| `completion_tokens_details.audio_tokens`     | Audio tokens in the completion                              |

{% hint style="info" %}
`input_tokens` = `prompt_tokens` - `cached_tokens` - `cache_write_tokens`&#x20;

`prompt_cost` = (`input_tokens` \* `input_price`) + (`cached_tokens` \* `cached_price`) + (`cache_write_tokens` \* `cache_write_price`)
{% endhint %}

{% hint style="info" %}
`output_tokens` = `completion_tokens`&#x20;

`completions_tokens_cost` = `completion_tokens` \* `completion_price`
{% endhint %}

{% hint style="info" %}
`total_tokens` = `prompt_tokens` + `completion_tokens`

`total_tokens_cost`  = `prompt_cost` + `completions_tokens_cost`
{% endhint %}
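
The formulas above can be worked through with the token counts from the sample response. This is only a sketch: the unit prices are hypothetical values chosen so that the arithmetic lines up with the sample, and real prices should be taken from the model detail page. Note that the sample response reports the cache-read portion in its own `prompt_cache_read_cost` field, while the formula here folds it into `prompt_cost`.

```python
# Token counts copied from the `usage` object in the sample response.
prompt_tokens = 163
cached_tokens = 151
cache_write_tokens = 0
completion_tokens = 450

# Hypothetical unit prices in USD per token, picked to match the sample.
input_price = 0.20 / 1_000_000
cached_price = 0.05 / 1_000_000
cache_write_price = 0.0
completion_price = 0.50 / 1_000_000

# input_tokens = prompt_tokens - cached_tokens - cache_write_tokens
input_tokens = prompt_tokens - cached_tokens - cache_write_tokens

# prompt_cost covers uncached input, cache reads, and cache writes.
prompt_cost = (input_tokens * input_price
               + cached_tokens * cached_price
               + cache_write_tokens * cache_write_price)
completion_cost = completion_tokens * completion_price
total_cost = prompt_cost + completion_cost

print(f"input_tokens    = {input_tokens}")
print(f"prompt_cost     = {prompt_cost:.8f}")
print(f"completion_cost = {completion_cost:.8f}")
print(f"total_cost      = {total_cost:.8f}")
```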

#### cost\_details

| Item                       | Description                                       |
| -------------------------- | ------------------------------------------------- |
| `prompt_cost`              | Cost for processing input prompts                 |
| `completion_cost`          | Cost for model-generated output                   |
| `image_cost`               | Cost for image processing or generation           |
| `video_cost`               | Cost for video processing or generation           |
| `audio_cost`               | Cost for audio processing or generation           |
| `native_web_search_cost`   | Cost for invoking native web search functionality |
| `plugin_web_search_cost`   | Cost for invoking plugin web search functionality |
| `tools_cost`               | Cost for tool calls                               |
| `prompt_cache_read_cost`   | Cost for cache read operations                    |
| `prompt_cache_write_cost`  | Cost for cache write operations                   |
| `prompt_cache_write_5_min` | Cost for 5-minute cache write operations          |
| `prompt_cache_write_1_h`   | Cost for 1-hour cache write operations            |
| `reasoning_cost`           | Cost for internal reasoning computations          |
| `discount_rate`            | Effective discount rate                           |
| `is_byok`                  | Whether the call used BYOK (bring your own key)   |
| `byok_cost`                | Cost for BYOK calls                               |


# Billing Logs

### Cost Breakdown <a href="#cost-breakdown" id="cost-breakdown"></a>

The [Logs page](https://infron.ai/dashboard/logs) provides a comprehensive API call history and detailed billing information, so you can see the resource consumption and actual cost of each API call.

<figure><img src="/files/hDyAlw7KCMMh86uLaI6S" alt=""><figcaption></figcaption></figure>

Each call record contains the following detailed information:

**Basic Information**

* **Call Time**: Timestamp accurate to milliseconds
* **Request ID**: Unique identifier for issue tracking
* **Model Name**: Specific model version used
* **Call Status**: Status information including success/failure/timeout

**Token Usage Statistics**

* **Input Tokens**: Number of tokens in the input prompt
* **Output Tokens**: Number of tokens generated by the model
* **Total Tokens**: Total token consumption
* **Cached Tokens**: Number of tokens served from cache

**Compute Resource Consumption**

* **Processing Duration**: Actual computation time

**Three-Tier Pricing Display**

To ensure cost transparency, the system provides pricing information across three dimensions:

**1. List Price (Original Price)**

* Cost calculated based on standard official pricing
* Excludes any discounts or promotions
* Serves as the baseline reference for cost calculation

**2. Discounted Price**

* Price after applying account-level discounts
* Includes bulk usage discounts, membership benefits, etc.
* Shows specific discount percentage and savings amount

**3. Cache-Optimized Price**

* Final price after leveraging Prompt Cache technology
* Provides significant cost advantages for repeated or similar requests
* Displays cache hit rate and savings percentage

**Cost Calculation Explanation**

```
Final Cost = (Input Cost + Output Cost) × Discount Factor × Cache Optimization Factor

Where:
- Input Cost = Input Tokens × Model Unit Price
- Output Cost = Output Tokens × Model Unit Price
- Cached portions are calculated at preferential rates
```

### Additional Features

**Export & Reporting**

* **CSV Export**: Download detailed logs for external analysis
* **Custom Reports**: Generate reports for specific time periods or usage patterns
* **API Integration**: Programmatic access to logs data for automated monitoring

**Real-time Monitoring**

* **Live Updates**: Real-time refresh of call records
* **Alert System**: Notifications for unusual spending patterns or errors
* **Dashboard Integration**: Quick access to key metrics and trends


# Latency

Understanding Infron's performance characteristics.

Infron is engineered with performance as a core priority. The platform is optimized to introduce as little additional latency as possible.

### Base Latency

Under standard production conditions, Infron adds roughly `100 ms` of latency to each request. This minimal overhead is achieved through:

* Edge compute execution using Cloudflare Workers to stay geographically close to your application
* Highly efficient edge caching of user and API key metadata
* Optimized routing logic designed to minimize processing time

### Performance Considerations

#### Cache Warming

When edge caches are cold (typically within the first 5 minutes of receiving traffic in a new region), latency may be slightly higher until the caches are fully populated.

#### Credit Balance Checks

To ensure accurate billing and avoid overages, Infron performs additional database checks when a user's credit balance becomes low (single‑digit dollar amounts). Caches expire more aggressively under these conditions, which can temporarily increase latency until more credits are added.

#### Model Fallback

When using provider routing, if a primary model or provider fails, Infron automatically falls back to the next available option. A failed initial attempt naturally adds latency for that request. Infron monitors provider failures in real time and dynamically routes around unstable providers to avoid repeated performance impacts.

### Best Practices for Optimal Performance

* Maintain a Healthy Credit Balance: A recommended minimum balance of $50–$100 helps ensure smooth operation without increased latency from extra billing checks.
* Use Provider Preferences: If you have specific latency requirements—such as time to first token or time to final token—Infron offers routing controls that let you prioritize providers based on performance and cost considerations.


# Performance Analysis

Performance Analysis

The [Activity page](https://infron.ai/dashboard/activity) provides a comprehensive analytics dashboard that offers detailed insights into your product usage and performance metrics. This centralized monitoring hub displays real-time and historical data across multiple key performance indicators.

### Key Metrics and Features

#### **Token Consumption Tracking**&#x20;

Monitor your token usage patterns with granular visibility into consumption trends. The dashboard displays total tokens consumed, peak usage periods, and token distribution across different operations and time intervals.

<figure><img src="/files/iWqPEYseqdEAHJah5Rr5" alt=""><figcaption></figcaption></figure>

#### **Spend Analysis**&#x20;

Track your financial expenditure with detailed cost breakdowns. View spending patterns, budget utilization rates, and cost-per-operation metrics to optimize your resource allocation and maintain budget control.

<figure><img src="/files/8Ri9ZKrh6jFUiAcaS62O" alt=""><figcaption></figcaption></figure>

#### **Request Success Rate Monitoring**&#x20;

Analyze the reliability of your API calls through comprehensive success rate statistics. The dashboard presents successful request percentages, failure rates, and trend analysis to help identify potential service issues.

<figure><img src="/files/TdlK1BZ6lECkQNGPFiSj" alt=""><figcaption></figcaption></figure>

#### **Latency Performance Metrics**&#x20;

Measure response times and system performance with detailed latency analytics. Monitor average response times, peak latency periods, and performance trends to ensure optimal user experience.

<figure><img src="/files/XuqssMB219LqhhFGJpMd" alt=""><figcaption></figcaption></figure>

#### **Time to First Token (TTFT) Analysis**&#x20;

Track the initial response time metrics, specifically measuring the duration from request initiation to the first token generation. This metric is crucial for understanding user-perceived performance.

<figure><img src="/files/h2fkeGDa3ZHxGF8uhSpS" alt=""><figcaption></figcaption></figure>
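
If you want to compare the dashboard's TTFT figures with what your own client observes, you can time a streamed response yourself. The sketch below is one way to do that under stated assumptions: `measure_ttft` and `fake_stream` are hypothetical names, and the demo uses a simulated stream rather than a real Infron request.

```python
import time


def measure_ttft(chunks, clock=time.perf_counter):
    """Consume a stream of chunks and time the first and last arrival.

    `chunks` can be any iterable that yields pieces of a streamed response,
    for example the lines of a streaming HTTP response.
    Returns (ttft_seconds, total_seconds, chunk_count).
    """
    start = clock()
    first = None
    count = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty keep-alive chunks
        if first is None:
            first = clock()  # first non-empty chunk has arrived
        count += 1
    end = clock()
    ttft = None if first is None else first - start
    return ttft, end - start, count


def fake_stream(delay=0.05, pieces=3):
    """Stand-in for a real streaming response, used only for this demo."""
    for i in range(pieces):
        time.sleep(delay)
        yield f"token-{i}"


ttft, total, count = measure_ttft(fake_stream())
print(f"TTFT: {ttft * 1000:.0f} ms, total: {total * 1000:.0f} ms, chunks: {count}")
```

To time a real request, pass a generator that sends the request on its first iteration and then yields the streamed lines, so that the clock starts before the request leaves your machine.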

#### **Tokens Per Minute (TPM) & Requests Per Minute (RPM)**&#x20;

Monitor your throughput capacity with real-time TPM and RPM measurements. These metrics help assess system load, identify bottlenecks, and optimize performance scaling.

<figure><img src="/files/Yy6zUsL6DKOODQZ4j9M4" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/US1lZ8r7UAkbcG8vFt6X" alt=""><figcaption></figcaption></figure>
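
RPM and TPM can also be derived from your own call records, for example from exported logs. The sketch below shows one plausible way to bucket calls by minute. The record layout and the `per_minute` helper are assumptions for illustration, not a description of how the dashboard computes these metrics.

```python
from collections import defaultdict


def per_minute(records):
    """Group (unix_timestamp, total_tokens) records into per-minute buckets.

    Returns {minute_start: (requests, tokens)}.
    """
    buckets = defaultdict(lambda: [0, 0])
    for timestamp, tokens in records:
        minute = int(timestamp) // 60 * 60  # floor to the start of the minute
        buckets[minute][0] += 1       # one more request in this minute
        buckets[minute][1] += tokens  # tokens consumed in this minute
    return {minute: tuple(v) for minute, v in sorted(buckets.items())}


# Made-up records: (unix timestamp in seconds, total tokens of the call).
records = [(1773372900, 831), (1773372917, 400), (1773372965, 1200)]
stats = per_minute(records)
for minute, (rpm, tpm) in stats.items():
    print(f"{minute}: RPM={rpm} TPM={tpm}")

peak_rpm = max(rpm for rpm, _ in stats.values())
peak_tpm = max(tpm for _, tpm in stats.values())
print(f"peak RPM={peak_rpm}, peak TPM={peak_tpm}")
```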

#### **Model Usage Distribution**&#x20;

Visualize the distribution of API calls across different models with interactive charts and graphs. Understand which models are most frequently utilized and optimize your model selection strategy.

<figure><img src="/files/nPRNo1m7u2nG6cuK0dqw" alt=""><figcaption></figcaption></figure>

#### **Error Distribution Analysis**&#x20;

A comprehensive error tracking and categorization system breaks down failures by error type, frequency, and affected endpoints, enabling rapid issue identification and resolution.

<figure><img src="/files/LCOXZNtujrHgmIJnsZ6s" alt=""><figcaption></figcaption></figure>

The Activity dashboard serves as your command center for monitoring, analyzing, and optimizing your product's performance and resource utilization.


# Test Token Cache Rate

Infron AI token cache rate A/B testing guide.

### **Overview**

As generative AI models process user inputs, they segment each request and response into *tokens*. These tokens represent small units of text (words, subwords, or symbols). Token-based computation allows cost control and performance optimization — but it also creates opportunities to improve model speed and reduce latency through intelligent caching.

The **Token Cache** is a mechanism that stores previously computed tokens, allowing the model to reuse prior context efficiently. This document explains how the token cache works, why it matters for enterprise-level AI deployments, and how to validate Infron AI’s direct, transparent connectivity through code-based benchmarking.

### **Token Cache Principles**

How Token Caching Works

1. **Tokenization** When you send a prompt to a model, it is first tokenized. Each unique token is assigned a numeric ID. Example (simplified):

   ```
   "AI caching improves performance." 
   → [AI, caching, improves, performance, .]
   ```
2. **Incremental Computation** During inference, models build upon already computed states (hidden layers). If your next query shares a long prefix with a previous one, a cache lets the model skip redundant work.
3. **Cache Retrieval** The system stores *key-value* pairs for transformer layers. When a repeated sequence appears, the model retrieves the pre-computed attention keys/values instead of recalculating them.
4. **Result**
   * Reduced token latency (fewer tokens need computation).
   * Lower overall cost for repeated or streaming queries.
   * Consistent model output for repeated prefixes.
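
The prefix-reuse idea in the steps above can be illustrated with a toy model. This sketch is deliberately simplified: it splits on whitespace instead of using a real tokenizer, and it stores token lists rather than attention key/value tensors. `ToyPrefixCache` is a hypothetical class for illustration only.

```python
def common_prefix_len(a, b):
    """Number of leading tokens shared by two token lists."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class ToyPrefixCache:
    """Remembers earlier prompts and reports how many tokens could be reused."""

    def __init__(self):
        self.seen = []  # token lists of earlier prompts

    def process(self, prompt):
        tokens = prompt.split()  # toy tokenizer: split on whitespace
        # The longest prefix shared with any earlier prompt counts as cached.
        cached = max((common_prefix_len(tokens, old) for old in self.seen),
                     default=0)
        self.seen.append(tokens)
        return {"prompt_tokens": len(tokens),
                "cached_tokens": cached,
                "computed_tokens": len(tokens) - cached}


cache = ToyPrefixCache()
system = "You are a helpful assistant . Answer briefly ."
first = cache.process(system + " What is a token ?")
second = cache.process(system + " What is a cache ?")
print(first)
print(second)
```

The second prompt shares its system text and the start of the question with the first, so only the tokens after the point of divergence need fresh computation. This is why keeping stable content at the start of a prompt tends to raise the cache hit rate.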

### **Importance and Enterprise Value**

| Dimension             | Value for Enterprises                                                                       |
| --------------------- | ------------------------------------------------------------------------------------------- |
| **Performance**       | Reduces average latency, enabling near real-time dialog systems and intelligent assistants. |
| **Scalability**       | Reduces compute overhead, allowing large-scale deployments with lower GPU cost.             |
| **Consistency**       | Ensures stable responses for repeated prefix queries (e.g., ongoing chat contexts).         |
| **Cost Optimization** | Minimizes redundant token charges, especially for recall-heavy use cases.                   |
| **Sustainability**    | Lowers energy consumption through more efficient inference cycles.                          |

Token caching directly enhances both the **user experience (speed, responsiveness)** and **business efficiency (throughput per dollar spent)**.

### **Validation - Token Cache Comparison Between Infron AI and Google**

Infron AI is a transparent AI service dispatcher that ensures zero obfuscation between the client request and the model endpoint. It routes traffic directly to the original model provider, guaranteeing data integrity, full transparency, and predictable latency patterns.

Unlike proxy APIs that encapsulate or alter payloads, Infron AI simply forwards context and token metrics, allowing clients to verify direct connectivity.

Below is a simple benchmarking approach to demonstrate Infron AI’s transparency and verify that token caching behaves identically to the original AI model endpoint.

#### **1. Environment Setup**

Requirements:

* Python ≥ 3.9
* `requests` or `httpx`
* Access to Infron AI and direct model endpoints.

```bash
pip install requests
```

#### **2. Account & API Keys Setup**

The first step to start using Infron is to [create an account](https://infron.ai/login) and [get your API key](https://infron.ai/dashboard/apiKeys).

The second step is to [create a project](https://aistudio.google.com/app/projects) in Google AI Studio and [get your API key](https://aistudio.google.com/app/api-keys).

#### **3. Benchmark Script Example**

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json
import random
import string
import os
from openai import OpenAI

base_prompt = """
### Prompt Title:
**The Shattered Continent — A Comprehensive World‑Building and Narrative Instruction**

---

You are to imagine and describe, in vivid, cinematic, and intellectually coherent detail, a vast fictional world known as *Aelyndra*, a continent that was once united under luminous orders of scholars, mages, engineers, and philosophers, but is now fragmented by centuries of arcane wars, plagues, and ideological rifts. The purpose of this prompt is to generate an elaborate tapestry of interlocking stories, characters, cultures, technologies, and metaphysical mysteries. Every generated text based on this prompt should feel immersive, multi‑layered, and historically grounded within its own logic. The tone should balance grounded realism with mythic resonance, evoking both awe and melancholy.

Below are detailed aspects, lore structures, stylistic expectations, sensory directions, metaphysical principles, and narrative possibilities you should elaborate upon.

---

#### 1. **Historical Overview**

Describe a timeline spanning thousands of years, from the primordial formation of Aelyndra to its contemporary fractured age.
Include eras such as **The Genesis Fires**, when the first luminous beings descended and shaped the continents; **The Chain‑Forge Epoch**, when mortal civilizations learned to harness resonant metals that could channel thought; **The Concordant Millennia**, the golden age of united knowledge; and **The Sundering**, a cataclysmic fracturing that split both geography and the collective memory of humankind.

Every historical event must feel internally consistent: show cause and consequence. For instance, the loss of one coastal city’s library should have ripples across distant temples and later generations’ philosophies. The tone should be reflective and slightly tragic, as though the chronicler recounts a glorious but forgotten lineage.

---

#### 2. **Geography and Environment**

Construct a geography of striking variety and symbolic resonance — volcanic shores, glass deserts, cities built within petrified forests, islands that drift through the mist like sleeping giants. For each region, define climate, flora, fauna, and the materials used in architecture. The
**Amber Steppes**, for example, might shimmer with grasses that refract sunlight into living colors, while **The Hollow Expanse** could be a wasteland where the air hums with residual magic from ancient wars.

Integrate ecological logic: how trade winds, oceanic currents, or tectonic activity affect culture and migration. Mountains may separate kingdoms physically but rivers and undersea tunnels connect them secretly. Give attention to sensory cues — the smell of resin in mountain villages, the sound of iron insects ticking in the deserts at twilight, the taste of mineral dust in air after storms.

---

#### 3. **Peoples and Cultures**

List multiple civilizations and describe how they diverged culturally, linguistically, and spiritually after the Sundering. Avoid simplistic binaries of good versus evil; each culture must hold a mixture of beauty, cruelty, and contradiction.
For instance:

- **The Dathenians**, descendants of former astronomer‑priests, now live beneath great dome observatories shattered by meteor showers; their language evolved around the concept of cyclical silence, and their rituals involve rebuilding and unbuilding stone circles.
- **The Marquorians**, sea‑bound artisans who sculpt coral into living fortresses; they treat navigation as a spiritual rite, believing each voyage mirrors the journey of the soul beyond death.
- **The Oruvian Clans**, desert dwellers who master the remnants of sonic engineering, forging instruments that can blast sandstorms into harmonious patterns visible for miles.

Each description should anchor political systems, economic practices, mythological origins, and interpersonal customs: how they greet each other, mourn their dead, or repair their tools. Include the etymology of cultural names, food habits, clothing textures, and color symbolism.

---

#### 4. **Religions, Philosophy, and Magic Systems**

Magic in this world arises not from childish incantation but from **resonant cognition**, a symbiotic interaction between thought, mineral vibration, and light frequency. Those talented in the craft can bind emotion into material forms — forging “sentient metals” that remember their wielders’ fears or hopes. Magic is thus both scientific and spiritual, blurring boundaries between psychology, physics, and theology.

Develop diverse schools of philosophy debating ethical use of such power:

- The **Solace Theorists** argue that controlling resonance is an act of compassion — to heal broken matter.
- The **Iron Aesthetes** consider creation a cruel necessity, insisting that only destruction brings cosmic symmetry.
- The **Children of Echo** worship silence and claim that every magical act pollutes the universal rhythm.

In your generated text, treat these doctrines not merely as background flavor but as intellectual frameworks shaping language, law, art, and personal relationships.

---

#### 5. **Technology and Architecture**

Aelyndra’s civilizations developed hybrid science combining clockwork engineering, bio‑alchemy, and energy crystallization. Describe towers powered by luminous conduits that pulse in rhythm with heartbeat sensors, skyships navigated by harmonic crystals, temples where gears and vines intertwine as living mechanisms. Highlight how technology evolves according to resource distribution: coastal regions rely on fungal luminescence, while mountain regimes mine “thought‑ore.” The interplay of invention and superstition drives narrative tension: progress both liberates and curses.

Architectural imagery should emphasize scale and mood: narrow alleys carved into obsidian cliffs, floating monasteries tethered by cables of woven gold, and markets illuminated by singing light globes whose hum forms improvised melodies as people pass.

---

#### 6. **Narrative Archetypes**

Encourage stories about rediscovery, reconciliation, and ambiguity rather than simple triumph. Possible archetypes include:

- The **Historian Without Records**, traveling to piece together memories hidden in ruins.
- The **Exile Engineer**, carrying an artifact that generates voices of those it once killed.
- The **Dream Cartographer**, mapping emotions that alter geography in real time.
- The **Queen of Mirrors**, who governs through reflection because her actual body has dissolved into glass.

All characters must confront both external danger and metaphysical uncertainty. Their heroism is subtle — the courage to remember or forgive rather than to conquer.

---

#### 7. **Sensory and Emotional Atmosphere**

When generating scenes, prioritize evocative sensory layering:
- Sound: the low chime of suspended glass, whispering wind through broken halls, distant chanting over water.
- Sight: refracted twilight on metallic dunes, murals shimmering with bioluminescent ink.
- Texture: the contrast between rusted ruins and the softness of moss growing over them.
- Emotion: nostalgia, intellectual awe, gentle melancholy, quiet rebellion.

Narrative pacing should oscillate between stillness and momentum — slow revelation punctuated by flashes of insight or dread. Readers should feel as though they’re remembering a place they never visited.

---

#### 8. **Metaphysics and Ethics**

Articulate the metaphysical principle that the universe is a dialogue between **Memory** and **Entropy**. Every act of creation defies forgetting but accelerates decay elsewhere. As a result, civilizations in Aelyndra constantly face moral trade‑offs: Should they preserve ancient resonance‑engines at the cost of ecological balance, or let their light fade naturally? These philosophical dilemmas should infuse even ordinary conversations.

Include thought experiments, fragmentary proverbs, and paradoxical hymns: “What we rebuild, we erase differently.” Avoid clichés of prophecy; instead, show how destiny might itself be a side effect of collective guilt or yearning.

---

#### 9. **Language, Names, and Symbol Codes**

Build naming conventions that suggest linguistic diversity — alternating consonant clusters and harmonic vowels, or syntax where verbs precede emotion markers. Indicate how written language has evolved: maybe modern scribes use glowing ink, and every sentence emits faint music depending on its meaning. Each culture’s writing system reveals worldview: linear scripts for materialists, spiral glyphs for those who worship recursion.
Allow symbols like tri‑circles, mirrored sigils, or broken hexagrams to recur as motifs linking spirituality and mathematics.

---

#### 10. **Storytelling Mode and Style**

When generating prose or dialogue from this prompt:

- **Tone:** intellectual lyricism blended with tactile realism.
- **Point of View:** optional mixture of omniscient chronicler, first‑person witness, or mosaic of journal entries.
- **Pacing:** start with environment or philosophical reflection before advancing plot.
- **Voice:** maintain rich vocabulary and musical rhythm, avoiding modern slang.
- **Conflict Portrayal:** inner struggle takes precedence; physical battles should mirror psychological or ideological clashes.

Comparison points: the emotional gravity of high epic poetry, the forensic detail of travelogues, the mournful tone of lost civilizations.

---

#### 11. **Prompts for Expansion**

After establishing the world, encourage detailed responses to sub‑prompts such as:

1. Describe a festival in a ruined city rebuilt with living vines; include sensory details, songs, rituals, and philosophical conversations heard between drunk scholars.
2. Write letters exchanged between two philosophers debating whether machines can dream. Use subtle metaphors instead of direct exposition.
3. Paint a panoramic view of the continent from orbit after centuries of regrowth — show what remains luminous when human memory fades.
4. Chronicle a court where sentences are sung rather than spoken, and justice is determined by the harmony of the choir’s tone.
5. Depict children discovering an artifact that records emotions. Show how it alters their personal identities.

Each of these sub‑prompts must align with the metaphysical and cultural logic above.

---

#### 12. **Ethos of Generation**

When using this master prompt, emphasize imagination rooted in coherence. Every fantastical element should follow some rationale — whether physical, symbolic, or emotional. Avoid default tropes (knights, elves, dragons) unless reinvented with purpose. Portray diversity of belief and appearance; suggest realistic emotions amid mythic context. The world should feel *earned*, as though history genuinely unfolded there.

---

#### 13. **Purpose and Audience**

This is designed for creators seeking an inexhaustible setting for stories, poems, games, or conceptual art. It invites introspection, exploration of morality, and appreciation for transient beauty. Its ideal audience values depth over spectacle, meaning over mere ornament.

---

#### 14. **Instruction to the AI (if applicable)**

When generating content from this prompt, the AI should:

- Adopt a deliberate, reflective tone.
- Prioritize atmosphere and reasoning before action.
- Honor contradictions without resolving them.
- Provide continuity: refer back to established geography and philosophies.
- Avoid repetition, clichés, or superficial heroism.
- Strive for prose that reads like the memory of a dream encoded into scripture.

Output should feel semi‑academic yet emotionally resonant — a mixture of archived myth and eyewitness recollection.

Please limit the output content to within 32 characters.
---

### End of Prompt
""".strip()

def random_tokens(n):
    words = []
    for _ in range(n):
        l = random.randint(4, 10)
        w = "".join(random.choice(string.ascii_lowercase) for _ in range(l))
        words.append(w)
    return " ".join(words)


class OneRouter_Testing:
    def __init__(self, url, api_key, model):
        self.url = url
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }
        self.model = model

    @staticmethod
    def compute_cache_rate(result):
        usage = result.get("usage")
        details = usage.get("prompt_tokens_details") if isinstance(usage, dict) else None
        if isinstance(details, dict):
            cached = details.get("cached_tokens", 0)
            text = details.get("text_tokens")
            if isinstance(text, (int, float)) and text:
                return cached / text
        return None

    def run(self, iterations=100):
        rates = []
        responses = []
        for _ in range(iterations):
            noise = random_tokens(128)
            payload = {
                "model": self.model,
                "messages": [
                    {
                        "role": "user",
                        "content": base_prompt + "\n\n" + noise,
                    }
                ],
                "usage": {"include": True},
            }
            response = requests.post(url=self.url, headers=self.headers, data=json.dumps(payload))
            result = response.json()
            responses.append(result)
            rate = self.compute_cache_rate(result)
            if isinstance(rate, (int, float)):
                rates.append(rate)
                print(f"onerouter_cache_rate={rate:.4f}")
            else:
                print("onerouter_cache_rate=0.0000")
        with open("onerouter_responses.jsonl", "w", encoding="utf-8") as f:
            for idx, r in enumerate(responses, start=1):
                f.write(json.dumps({"index": idx, "response": r}, ensure_ascii=False) + "\n")
        avg = sum(rates) / len(rates) if rates else 0.0
        print(f"onerouter_avg_cache_rate={avg:.4f}")
        return rates

class Google_Testing:
    def __init__(self, url, api_key, model):
        self.client = OpenAI(api_key=api_key, base_url=url)
        self.model = model

    @staticmethod
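    def compute_cache_rate(result):
        usage = result.get("usage")
        # Guard against a missing usage object before reading from it
        if not isinstance(usage, dict):
            return None
        # The total prompt size is reported as `prompt_tokens`
        text = usage.get("prompt_tokens")
        details = usage.get("prompt_tokens_details")
        if isinstance(details, dict):
            # `cached_tokens` may be present but null when nothing was cached
            cached = details.get("cached_tokens") or 0
            if isinstance(text, (int, float)) and text:
                return cached / text
        return None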

    def run(self, iterations=100):
        rates = []
        responses = []
        for _ in range(iterations):
            noise = random_tokens(128)
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": base_prompt + "\n\n" + noise},
                ],
            )
            result = response.model_dump()
            responses.append(result)
            rate = self.compute_cache_rate(result)
            if isinstance(rate, (int, float)):
                rates.append(rate)
                print(f"google_cache_rate={rate:.4f}")
            else:
                print("google_cache_rate=0.0000")
        with open("google_responses.jsonl", "w", encoding="utf-8") as f:
            for idx, r in enumerate(responses, start=1):
                f.write(json.dumps({"index": idx, "response": r}, ensure_ascii=False) + "\n")
        avg = sum(rates) / len(rates) if rates else 0.0
        print(f"google_avg_cache_rate={avg:.4f}")
        return rates

def plot_rates(onerouter_rates, google_rates, one_modelname, google_modelname, filename="cache_rate_comparison.png"):
    import matplotlib.pyplot as plt
    x1 = list(range(1, len(onerouter_rates) + 1))
    x2 = list(range(1, len(google_rates) + 1))
    plt.figure(figsize=(10, 4))
    if onerouter_rates:
        plt.plot(x1, onerouter_rates, label=f"OneRouter-{one_modelname}", color="#1f77b4")
    if google_rates:
        plt.plot(x2, google_rates, label=f"Google-{google_modelname}", color="#ff7f0e")
    plt.xlabel("Request Index")
    plt.ylabel("cache_rate")
    plt.title("Cache Rate per Request")
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig(filename, dpi=200)
    print(f"chart_saved={filename}")

if __name__ == "__main__":
    onerouter_modelname = "google-ai-studio/gemini-2.5-flash-preview-09-2025"
    google_modelname = "gemini-2.5-flash-preview-09-2025"

    one = OneRouter_Testing(
        url="https://llm.onerouter.pro/v1/chat/completions",
        api_key="<<Replace with your OneRouter API Key>>",
        model=onerouter_modelname,
    )
    one_rates = one.run(100)
    google = Google_Testing(
        url="https://generativelanguage.googleapis.com/v1beta/openai/",
        api_key="<<Replace with your Google AI Studio API Key>>",
        model=google_modelname,
    )
    google_rates = google.run(100)
    plot_rates(
        one_rates, 
        google_rates,
        one_modelname=onerouter_modelname,
        google_modelname=google_modelname
    )
```

{% endtab %}
{% endtabs %}

<figure><img src="/files/EvrTX0mFBJCFkTUozFuZ" alt=""><figcaption></figcaption></figure>

As the chart above shows, requests routed through Infron AI (plotted as the `OneRouter` series) and requests sent directly to Google AI Studio report almost identical token cache rates.
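To put a number on "almost identical", the two series can also be compared directly. The following is a minimal sketch, assuming the `one_rates` and `google_rates` lists returned by the benchmark script; the helper name `compare_cache_rates` and the sample values are hypothetical and are not measured results.

```python
from statistics import mean


def compare_cache_rates(infron_rates, google_rates):
    """Summarize how closely two per-request cache-rate series agree."""
    avg_infron = mean(infron_rates) if infron_rates else 0.0
    avg_google = mean(google_rates) if google_rates else 0.0
    return {
        "avg_infron": avg_infron,
        "avg_google": avg_google,
        # Absolute gap between the two averages; smaller means closer agreement
        "abs_gap": abs(avg_infron - avg_google),
    }


# Hypothetical values for illustration only, not benchmark output.
summary = compare_cache_rates([0.90, 0.92, 0.88], [0.91, 0.89, 0.90])
print(summary)
```

In a real run, pass `one_rates` and `google_rates` instead of the sample lists. What gap counts as "identical" is a judgment call, since cache hits also depend on request timing and provider-side eviction.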

### **Key Takeaways**

1. **Token Cache** is the foundation for real-time AI efficiency.
2. **Enterprises** benefit through optimized cost, speed, and consistent inference.
3. **Infron AI** provides infrastructure-level transparency, ensuring every cached token and every response is derived directly from the authentic model endpoint.
4. **Verification** via code tests is straightforward: identical token metrics confirm full transparency.


# Performance Stress Testing

Infron AI Performance Stress Testing Guide.

### Overview

Performance testing, also known as **stress or load testing**, is a technical process used to evaluate how a system performs under specific workloads. For an API routing service such as Infron AI, performance testing helps determine its capacity limits, stability, and efficiency in handling large volumes of requests or traffic.

The primary measurement indicators include:

* **QPS (Queries per Second):** The number of requests the router can successfully process per second.
* **Latency (Response Time):** The time taken for a request to be processed and a response returned.
* **Throughput:** The total amount of data that can be transmitted over the network per unit time.
* **Packet Loss Rate:** The percentage of packets that are dropped during transmission.
* **CPU and Memory Utilization:** Hardware resource usage during high-load scenarios.
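As a rough illustration of how the client-side indicators above can be derived from raw samples, here is a minimal sketch. The function name `summarize` and the sample numbers are hypothetical. The request error rate is used here as an API-level stand-in for loss, which is an assumption, since packet loss and hardware utilization cannot be observed from the client.

```python
def summarize(latencies_ms, ok_flags, wall_time_s):
    """Compute successful QPS, error rate, and nearest-rank latency percentiles."""
    ordered = sorted(latencies_ms)

    def pct(p):
        # Nearest-rank percentile over the sorted latencies
        if not ordered:
            return 0.0
        idx = round(p / 100 * (len(ordered) - 1))
        return ordered[idx]

    total = len(ok_flags)
    ok = sum(1 for flag in ok_flags if flag)
    return {
        "qps": ok / wall_time_s if wall_time_s > 0 else 0.0,
        "error_rate": (total - ok) / total if total else 0.0,
        "p50_ms": pct(50),
        "p99_ms": pct(99),
    }


# Hypothetical samples: four requests over two seconds, one of which failed.
stats = summarize([120.0, 150.0, 180.0, 900.0], [True, True, True, False], wall_time_s=2.0)
print(stats)
```

Percentiles are usually more informative than the average here, because a few slow requests can dominate the tail while barely moving the mean.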

### Testing Objectives

Performance testing ensures that Infron AI:

* Maintains stable performance under expected peak loads.
* Handles abnormal or sudden traffic surges without network interruption.
* Meets service level agreements (SLAs) for latency and throughput.
* Provides clear data points for capacity planning and future optimization.

#### Testing Guide

#### **1. Environment Setup**

Requirements:

* Python ≥ 3.9
* The `openai` Python SDK (the script in step 3 uses it to send requests)
* Access to Infron AI and direct model endpoints.

```bash
pip install openai
```

#### **2. Account & API Keys Setup**

First, [create an Infron account](https://infron.ai/login) and [get your API key](https://infron.ai/dashboard/apiKeys).

Then, in Google AI Studio, [create a project](https://aistudio.google.com/app/projects) and [get your API key](https://aistudio.google.com/app/api-keys).

#### **3. Code Script Example**

```python
import time
import argparse
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, Any, Optional
from openai import OpenAI


def create_client(base_url: str, api_key: str) -> OpenAI:
    return OpenAI(base_url=base_url, api_key=api_key)


def send_request(client: OpenAI, model: str, prompt: str) -> Dict[str, Any]:
    start = time.perf_counter()
    usage: Optional[Dict[str, int]] = None
    status_ok = False
    try:
        completion = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            extra_body={"usage": {"include": True}},
        )
        status_ok = True
        usage_obj = getattr(completion, "usage", None)
        if usage_obj:
            usage = {
                "prompt_tokens": getattr(usage_obj, "prompt_tokens", 0) or 0,
                "completion_tokens": getattr(usage_obj, "completion_tokens", 0) or 0,
                "total_tokens": getattr(usage_obj, "total_tokens", 0) or 0,
            }
    except Exception as e:
        return {
            "ok": False,
            "error": str(e),
            "latency_ms": (time.perf_counter() - start) * 1000,
            "finished_at": time.perf_counter(),
            "usage": usage,
        }
    return {
        "ok": status_ok,
        "latency_ms": (time.perf_counter() - start) * 1000,
        "finished_at": time.perf_counter(),
        "usage": usage,
    }


def percentile(values, p):
    if not values:
        return 0.0
    s = sorted(values)
    k = max(0, min(len(s) - 1, int(round((p / 100.0) * (len(s) - 1)))))
    return s[k]


def print_summary(results, wall_time_s):
    latencies = [r["latency_ms"] for r in results]
    ok_count = sum(1 for r in results if r["ok"])
    err_count = len(results) - ok_count
    avg_latency = sum(latencies) / len(latencies) if latencies else 0.0
    p50 = percentile(latencies, 50)
    p90 = percentile(latencies, 90)
    p95 = percentile(latencies, 95)
    p99 = percentile(latencies, 99)
    total_tokens = 0
    for r in results:
        u = r.get("usage")
        if u and isinstance(u, dict):
            total_tokens += int(u.get("total_tokens", 0) or 0)
    rpm_overall = (len(results) / (wall_time_s / 60.0)) if wall_time_s > 0 else 0.0
    tpm_overall = (total_tokens / (wall_time_s / 60.0)) if wall_time_s > 0 else 0.0
    print("Summary")
    print(f"Requests: {len(results)}, Success: {ok_count}, Errors: {err_count}")
    print(f"Wall Time: {wall_time_s:.2f}s, RPS: {len(results)/wall_time_s if wall_time_s>0 else 0:.2f}")
    print(f"Latency(ms): avg={avg_latency:.2f}, p50={p50:.2f}, p90={p90:.2f}, p95={p95:.2f}, p99={p99:.2f}")
    print(f"RPM Overall: {rpm_overall:.2f}")
    print(f"TPM Overall: {tpm_overall:.2f}")


def print_minute_breakdown(results, start_wall):
    buckets: Dict[int, Dict[str, Any]] = {}
    for r in results:
        finished_at = r["finished_at"]
        minute_idx = int((finished_at - start_wall) // 60)
        b = buckets.setdefault(minute_idx, {"requests": 0, "tokens": 0})
        b["requests"] += 1
        u = r.get("usage")
        if u and isinstance(u, dict):
            b["tokens"] += int(u.get("total_tokens", 0) or 0)
    print("Per-Minute")
    for k in sorted(buckets.keys()):
        b = buckets[k]
        print(f"Minute {k}: RPM={b['requests']}, TPM={b['tokens']}")


def run_qps_mode(qps: float, duration_s: int, concurrency: int, base_url: str, api_key: str, model: str, prompt: str):
    client = create_client(base_url, api_key)
    total_requests = int(qps * duration_s)
    results = []
    start_wall = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        futures = []
        start_schedule = time.perf_counter()
        for i in range(total_requests):
            next_time = start_schedule + i / qps
            now = time.perf_counter()
            sleep_s = next_time - now
            if sleep_s > 0:
                time.sleep(sleep_s)
            futures.append(executor.submit(send_request, client, model, prompt))
        for f in as_completed(futures):
            results.append(f.result())
    wall_time_s = time.perf_counter() - start_wall
    print_summary(results, wall_time_s)
    print_minute_breakdown(results, start_wall)


def run_burst_mode(tasks: int, concurrency: int, base_url: str, api_key: str, model: str, prompt: str):
    client = create_client(base_url, api_key)
    results = []
    start_wall = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        futures = [executor.submit(send_request, client, model, prompt) for _ in range(tasks)]
        for f in as_completed(futures):
            results.append(f.result())
    wall_time_s = time.perf_counter() - start_wall
    print_summary(results, wall_time_s)
    print_minute_breakdown(results, start_wall)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", type=str, default="qps", choices=["qps", "burst"])
    parser.add_argument("--qps", type=float, default=10.0)
    parser.add_argument("--duration", type=int, default=60)
    parser.add_argument("--concurrency", type=int, default=4)
    parser.add_argument("--tasks", type=int, default=200)
    parser.add_argument("--base-url", type=str, default="https://llm.onerouter.pro/v1")
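    # Fall back to the ONEROUTER_API_KEY environment variable named in the error message below
    parser.add_argument("--api-key", type=str, default=os.environ.get("ONEROUTER_API_KEY", ""))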
    parser.add_argument("--model", type=str, default="vertex/qwen3-next-80b-a3b-instruct")
    parser.add_argument("--prompt", type=str, default="What is the meaning of life?")
    args = parser.parse_args()
    if not args.api_key:
        raise SystemExit("Missing --api-key or ONEROUTER_API_KEY")
    if args.mode == "qps":
        run_qps_mode(args.qps, args.duration, args.concurrency, args.base_url, args.api_key, args.model, args.prompt)
    else:
        run_burst_mode(args.tasks, args.concurrency, args.base_url, args.api_key, args.model, args.prompt)


if __name__ == "__main__":
    main()

```

#### 4. Run Testing

```bash
python3 OneRouter/llm_load_test.py --qps 16.67 --duration 120 --concurrency 150 --model "vertex/qwen3-next-80b-a3b-instruct" --api-key "Replace with your key"
```

Summary

```
Requests: 2000, Success: 2000, Errors: 0
Wall Time: 125.86s, RPS: 15.89
Latency(ms): avg=4657.20, p50=4602.80, p90=5634.60, p95=5997.43, p99=7177.60
RPM Overall: 953.41
TPM Overall: 649570.29
Per-Minute
Minute 0: RPM=917, TPM=625730
Minute 1: RPM=1022, TPM=695261
Minute 2: RPM=61, TPM=41634
```
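The summary figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers printed above; the variable names are illustrative. Small differences from the reported values are expected because the printed wall time is rounded to two decimals.

```python
# Figures copied from the sample output above
requests_total = 2000
wall_time_s = 125.86
per_minute_rpm = [917, 1022, 61]
per_minute_tpm = [625730, 695261, 41634]

rps = requests_total / wall_time_s            # requests per second
rpm_overall = rps * 60                        # requests per minute
total_tokens = sum(per_minute_tpm)            # tokens across all minute buckets
tokens_per_request = total_tokens / requests_total

print(f"RPS ~ {rps:.2f}")
print(f"RPM overall ~ {rpm_overall:.2f}")
print(f"Per-minute request sum = {sum(per_minute_rpm)}")
print(f"Average tokens per request ~ {tokens_per_request:.1f}")
```

The per-minute buckets add up to the total request count. The small third bucket reflects requests that finished after the two-minute mark, since each request is bucketed by completion time.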

<figure><img src="/files/314YFLK8BxU8YdORLi7X" alt=""><figcaption></figcaption></figure>


# Privacy and Logging

Making sure your data is safe

When using AI through Infron AI, whether via the chat interface or the API, your prompts and responses go through multiple touchpoints. You have control over how your data is handled at each step.

### Within Infron AI

Infron AI does not store your prompts or responses.

Any categorization of your prompts is stored completely anonymously and never associated with your account or user ID.

Infron AI does store metadata (e.g., the number of prompt and completion tokens, latency) for each request.

### Provider Policies

#### Training on Prompts

Each provider on Infron AI has its own data handling policies. We reflect those policies in structured data on each AI endpoint that we offer.

Wherever possible, Infron AI works with providers to ensure that prompts will not be trained on, but there are exceptions. If you opt out of training in your account settings, Infron AI will not route to providers that train. This setting has no bearing on Infron AI's own policies and what we do with your prompts.

#### Data Retention & Logging

Providers also have their own data retention policies, often for compliance reasons. Infron AI does not have routing rules that change based on data retention policies of providers, but the retention policies as reflected in each provider's terms are shown below. Any user of Infron AI can ignore providers that don't meet their own data retention requirements.


# Contact Us

Visit our official support page for team contact details:

**🔗** [**https://infron.ai/about-us**](https://infron.ai/about-us)

#### Technical Support

Running into technical issues or need help? Contact us via:

**📧 Technical support email**: <support@infron.ai>

We typically respond within 24 hours.

{% hint style="info" %}
Before contacting technical support, we recommend checking our [Frequently Asked Questions (FAQ)](/docs/overview/faq), where many common questions have instant answers.
{% endhint %}

#### Business Partnerships

**💼 Business partnership email**: <support@infron.ai>

We provide enterprise customers with:

* Dedicated technical support and service guarantees
* Flexible pricing plans and volume discounts
* Customized API integration solutions
* Professional technical consulting services

### Join the Community

* **🐦 X (Twitter)**: [@Infron](https://x.com/InfronAI): Get product updates, technical news, and industry insights
* **💬 Discord**: <https://discord.com/invite/cbFS7RDeHq>: Connect with other developers, get technical help, and stay up to date with the latest features

### Product Feedback

Your feedback is a key driver of our continuous improvement. Whether it’s feature requests, UX improvements, or documentation enhancements, we’d love to hear from you.

**📮 Feedback email**: <support@infron.ai>

We welcome feedback on:

* **Feature requests**: New features you’d like to see or improvements to existing ones
* **User experience**: Issues and optimization suggestions from your usage experience
* **API**: Suggestions on API design, performance, or compatibility
* **Documentation**: Improvements to documentation accuracy, completeness, and readability


# Join Community


# Streaming

The Infron API allows streaming responses from any model. This is useful for building chat interfaces or other applications where the UI should update as the model generates the response.

To enable streaming, you can set the `stream` parameter to `true` in your request. The model will then stream the response to the client in chunks, rather than returning the entire response at once.

### Examples <a href="#examples" id="examples"></a>

Here is an example of how to stream a response and process it as it arrives:

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

API_KEY = "<API_KEY>"  # replace with your Infron API key
question = "How would you build the tallest building ever?"

url = "https://llm.onerouter.pro/v1/chat/completions"
headers = {
  "Authorization": f"Bearer {API_KEY}",
  "Content-Type": "application/json"
}

payload = {
  "model": "google/gemini-2.5-flash",
  "messages": [{"role": "user", "content": question}],
  "stream": True
}

buffer = ""
done = False
with requests.post(url, headers=headers, json=payload, stream=True) as r:
  r.encoding = "utf-8"  # ensure iter_content yields str even if no charset is sent
  for chunk in r.iter_content(chunk_size=1024, decode_unicode=True):
    buffer += chunk
    while True:
      # Find the next complete SSE line
      line_end = buffer.find('\n')
      if line_end == -1:
        break

      line = buffer[:line_end].strip()
      buffer = buffer[line_end + 1:]

      if line.startswith('data: '):
        data = line[6:]
        if data == '[DONE]':
          done = True
          break

        try:
          data_obj = json.loads(data)
          content = data_obj["choices"][0]["delta"].get("content")
          if content:
            print(content, end="", flush=True)
        except (json.JSONDecodeError, KeyError, IndexError):
          # Skip malformed or unexpected chunks instead of aborting the stream
          pass
    if done:
      # Stop reading once the server signals the end of the stream
      break
```

{% endtab %}

{% tab title="cURL" %}

```bash

curl -sN \
  -H "Authorization: Bearer YOUR-API-KEY" \
  -H "Content-Type: application/json" \
  -X POST "https://llm.onerouter.pro/v1/chat/completions" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "How would you build the tallest building ever?"}],
    "stream": true
  }' | while IFS= read -r line; do
      if [[ "$line" == data:* ]]; then
          data="${line#data: }"

          if [[ "$data" == "[DONE]" ]]; then
              break
          fi

          content=$(echo "$data" | jq -r '.choices[0].delta.content // empty' 2>/dev/null)
          if [[ -n "$content" ]]; then
              printf "%s" "$content"
          fi
      fi
  done
```

{% endtab %}
{% endtabs %}

### Additional Information <a href="#additional-information" id="additional-information"></a>

For SSE (Server-Sent Events) streams, Infron occasionally sends comments to prevent connection timeouts. These comments look like:

```
INFRONAI PROCESSING
```

Comment payloads can be safely ignored per the [SSE specification](https://html.spec.whatwg.org/multipage/server-sent-events.html#event-stream-interpretation). However, you can use them to improve UX as needed, for example by showing a dynamic loading indicator.

Some SSE client implementations might not parse the payload according to the spec, which leads to an uncaught error when you `JSON.parse` the non-JSON payloads. We recommend the following clients:

* [eventsource-parser](https://github.com/rexxars/eventsource-parser)
* [OpenAI SDK](https://www.npmjs.com/package/openai)
* [Vercel AI SDK](https://www.npmjs.com/package/ai)
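
If you parse the stream by hand instead, a minimal Python sketch of the same idea could look like this. It assumes keep-alive comments arrive either as `:`-prefixed lines (the form the SSE specification defines) or as bare lines without a `data:` prefix, and the helper name `iter_deltas` is illustrative rather than part of any SDK.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from SSE lines, ignoring comments and non-JSON payloads."""
    for raw in lines:
        line = raw.strip()
        # Comments, blank separators, and any other non-data line carry no
        # completion content, so they are skipped.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        try:
            event = json.loads(data)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON payloads instead of raising
        if not isinstance(event, dict):
            continue
        choices = event.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content
```

Because the function only consumes an iterable of lines, it can be fed from `requests.Response.iter_lines(decode_unicode=True)` or from a recorded stream in a test.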


# Authentication

API Authentication

Infron authenticates requests using Bearer tokens, which means you can use `curl`, Python clients, or the OpenAI SDK directly with Infron.

### Using an API key <a href="#using-an-api-key" id="using-an-api-key"></a>

To use an API key, [first create your key](https://infron.ai/dashboard/apiKeys). Give it a name.

If you're calling the Infron AI API directly, set the `Authorization` header to a Bearer token with your API key.

If you're using the OpenAI TypeScript SDK, set `baseURL` to `https://llm.onerouter.pro/v1` and `apiKey` to your API key.

{% tabs %}
{% tab title="TypeScript (Bearer Token)" %}

```typescript
fetch('https://llm.onerouter.pro/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: 'Bearer <API_KEY>',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  }),
});
```

{% endtab %}

{% tab title="TypeScript (OpenAI SDK)" %}

```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>'
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'Say this is a test' }],
  });

  console.log(completion.choices[0].message);
}

main();
```

{% endtab %}

{% tab title="Python" %}

```python
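from openai import OpenAI

# Point the official OpenAI client (openai>=1.0) at Infron
client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

response = client.chat.completions.create(
  model="openai/gpt-4o",
  messages=[{"role": "user", "content": "Say this is a test"}],
)

reply = response.choices[0].message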
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
  "model": "openai/gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ]
}'
```

{% endtab %}
{% endtabs %}

### If your key has been exposed <a href="#if-your-key-has-been-exposed" id="if-your-key-has-been-exposed"></a>

You must protect your API keys and never commit them to public repositories.

Infron is a GitHub secret scanning partner, and has other methods to detect exposed keys. If we determine that your key has been compromised, you will receive an email notification.

If you receive such a notification or suspect your key has been exposed, immediately visit [your key settings page](https://infron.ai/dashboard/apiKeys) to delete the compromised key and create a new one.

Using environment variables and keeping keys out of your codebase is strongly recommended.
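
As a minimal sketch of that recommendation, the helper below builds request headers from a key stored in the environment. The variable name `INFRON_API_KEY` and the function name `auth_headers` are assumptions chosen for illustration; use whatever name fits your deployment.

```python
import os


def auth_headers(env_var: str = "INFRON_API_KEY") -> dict:
    """Build request headers from a key stored in an environment variable."""
    # INFRON_API_KEY is a hypothetical name, not one mandated by Infron.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing fast when the variable is missing avoids sending requests with an empty Bearer token, which would otherwise surface later as a `401`.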


# Error codes

API Errors

```json
{
    "error": {
        "message": "",
        "type": "",
        "param": "",
        "code": 422
    }
}
```

For errors, Infron returns a JSON response with the following shape:

```typescript
type ErrorResponse = {
  error: {
    code: number;
    message: string;
    type: string;
    param: string;
  };
};
```

The HTTP Response will have the same status code as `error.code`, forming a request error if:

* Your original request is invalid
* Your API key/account is out of credits

Otherwise, the returned HTTP response status will be `200`, and any error that occurs while the LLM is producing the output will be emitted in the response body or as an SSE data event.
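
Because an error can arrive with a `200` status, a client has to check the body as well as the status code. The sketch below shows one way to do that; it assumes that errors emitted in the body use the same `error` object shape shown above, and `find_error` is a hypothetical helper name.

```python
from typing import Optional


def find_error(status_code: int, body: object) -> Optional[dict]:
    """Return the error object from a response, or None if there is none."""
    error = body.get("error") if isinstance(body, dict) else None
    if isinstance(error, dict):
        # Covers both request errors and errors embedded in a 200 response
        return error
    if status_code >= 400:
        # Error status without a parseable error body: synthesize one
        return {"code": status_code, "message": "", "type": "", "param": ""}
    return None
```

The same check can be applied to each parsed SSE data event when streaming.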

### Error Codes <a href="#error-codes" id="error-codes"></a>

* **400**: Bad Request (invalid or missing params, CORS)
* **401**: Invalid credentials (OAuth session expired, disabled/invalid API key)
* **402**: Your account or API key has insufficient credits. Add more credits and retry the request.
* **403**: Your chosen model requires moderation and your input was flagged
* **408**: Your request timed out
* **429**: You are being rate limited
* **502**: Your chosen model is down or we received an invalid response from it
* **503**: There is no available model provider that meets your routing requirements
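
The list above can be turned into a small lookup for logging and retry decisions, sketched below. The descriptions paraphrase the list; the choice of which codes count as retryable (`408`, `429`, `502`, `503`) is an assumption made for this example, not something Infron specifies.

```python
ERROR_MEANINGS = {
    400: "Bad request (invalid or missing params, CORS)",
    401: "Invalid credentials",
    402: "Insufficient credits",
    403: "Input flagged by moderation",
    408: "Request timed out",
    429: "Rate limited",
    502: "Model down or invalid response from it",
    503: "No available provider meets the routing requirements",
}

# Assumption: these codes describe transient conditions worth retrying.
RETRYABLE_CODES = {408, 429, 502, 503}


def describe(code: int) -> str:
    """Return a short description for a documented error code."""
    return ERROR_MEANINGS.get(code, f"Unexpected error code {code}")


def should_retry(code: int) -> bool:
    """Return True when retrying the same request may succeed."""
    return code in RETRYABLE_CODES
```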

### When No Content is Generated <a href="#when-no-content-is-generated" id="when-no-content-is-generated"></a>

Occasionally, the model may not generate any content. This typically occurs when:

* The model is warming up from a cold start
* The system is scaling up to handle more requests

Warm-up times usually range from a few seconds to a few minutes, depending on the model and provider.

If you encounter persistent no-content issues, consider implementing a simple retry mechanism or trying again with a different provider or model that has more recent activity.
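
A simple retry mechanism of the kind described above might look like the following sketch. The function name, the attempt count, and the exponential delays are illustrative assumptions; `call` stands for any function of yours that sends the request and returns the generated text.

```python
import time
from typing import Callable, Optional


def retry_until_content(
    call: Callable[[], Optional[str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `call` until it returns non-empty text or attempts run out."""
    for attempt in range(max_attempts):
        content = call()
        if content:
            return content
        if attempt < max_attempts - 1:
            # Exponential backoff gives a cold model time to warm up
            sleep(base_delay * 2 ** attempt)
    return None
```

Passing `sleep` as a parameter keeps the helper easy to test; a caller that gets `None` back can then switch to a different model or provider.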


# Overview

**Infron provides OpenAI-compatible API endpoints**, letting you use multiple AI providers through a familiar interface. You can use existing OpenAI client libraries, switch to Infron with a URL change, and keep your current tools and workflows without code rewrites.

The OpenAI-compatible API implements the same specification as the [OpenAI API](https://platform.openai.com/docs/api-reference/chat).

### Base URL

The OpenAI-compatible API is available at the following base URL:

`https://llm.onerouter.pro/v1`

### Authentication

The OpenAI-compatible API authenticates requests with an API key:

* **API key**: Use your Infron API key with the `Authorization: Bearer <token>` header

### Integration with existing tools

You can use Infron's OpenAI-compatible API with existing tools and libraries like the [OpenAI client libraries](https://platform.openai.com/docs/libraries). Point your existing client to Infron's base URL and use your Infron API key for authentication.

#### OpenAI client libraries

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
  base_url="https://llm.onerouter.pro/v1",
  api_key="<API_KEY>",
)

completion = client.chat.completions.create(
  model="claude-3-5-sonnet@20240620",
  messages=[
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ]
)

print(completion.choices[0].message.content)
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'https://llm.onerouter.pro/v1',
  apiKey: '<API_KEY>',
});

async function main() {
  const completion = await openai.chat.completions.create({
    model: 'claude-3-5-sonnet@20240620',
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  });

  console.log(completion.choices[0].message);
}

main();
```

{% endtab %}
{% endtabs %}

### Error handling

The API returns standard HTTP status codes and error responses.

#### Common error codes

* **400**: Bad Request (invalid or missing params, CORS)
* **401**: Invalid credentials (OAuth session expired, disabled/invalid API key)
* **402**: Your account or API key has insufficient credits. Add more credits and retry the request.
* **403**: Your chosen model requires moderation and your input was flagged
* **408**: Your request timed out
* **429**: You are being rate limited
* **502**: Your chosen model is down or we received an invalid response from it
* **503**: There is no available model provider that meets your routing requirements

#### Error response format

```json
{
    "error": {
        "message": "",
        "type": "",
        "param": "",
        "code": 429
    }
}
```


# Create a chat completion

## Create a chat completion

> Sends a request for a model response for the given chat conversation. Supports both streaming and non-streaming modes.

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Create a chat completion"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/chat/completions":{"post":{"summary":"Create a chat completion","deprecated":false,"description":"Sends a request for a model response for the given chat conversation. Supports both streaming and non-streaming modes.","tags":["Create a chat completion"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":["array","string","null"],"items":{"type":"object","properties":{"type":{"type":"string"},"text":{"type":"string"},"cache_control":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"image_url":{"type":"object","properties":{"url":{"type":"string"},"detail":{"type":"string"}},"required":["url","detail"]}},"required":["type"]}},"tool_calls":{"type":"array","items":{"type":"object","properties":{"id":{"type":"string"},"type":{"type":"string"},"function":{"type":"object","properties":{"name":{"type":"string"},"arguments":{"type":"string"}},"required":["name","arguments"]}}}},"tool_call_id":{"type":"string"}},"required":["role","content"]}},"provider":{"type":"object","properties":{"order":{"type":"array","items":{"type":"string"}},"allow_fallbacks":{"type":"boolean","default":true},"require_parameters":{"type":"boolean","default":true},"data_collection":{"type":"string","default":"allow","enum":["allow","deny"]},"zdr":{"type":"boolean","default":false},"enforce_distillable_text":{"type":"boolean","default":false},"only":{"type":"array","items":{"type":"string"}},"ignore":{"type":"array","items":{"type":"string"}},"quantizations":{"type":"array","items"
:{"type":"string","enum":["fp16","fp8","int8","int4","none"]}},"sort":{"type":"string","enum":["price","throughput","latency"]},"preferred_min_throughput":{"type":"object","properties":{"p50":{"type":"integer"},"p75":{"type":"integer"},"p90":{"type":"integer"},"p99":{"type":"integer"}},"required":["p50","p75","p90","p99"]},"preferred_max_latency":{"type":"object","properties":{"p50":{"type":"integer"},"p75":{"type":"number"},"p90":{"type":"integer"},"p99":{"type":"integer"}},"required":["p50","p75","p90","p99"]}}},"tools":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"function":{"type":"object","properties":{"name":{"type":"string"},"description":{"type":"string"},"parameters":{"type":"object","properties":{"type":{"type":"string"},"properties":{"type":"object","properties":{"query":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"max_results":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"},"default":{"type":"integer"}},"required":["type","description","default"]},"location":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"units":{"type":"object","properties":{"type":{"type":"string"},"enum":{"type":"array","items":{"type":"string"}},"description":{"type":"string"},"default":{"type":"string"}},"required":["type","enum","description","default"]},"include_forecast":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"},"default":{"type":"boolean"}},"required":["type","description","default"]},"expression":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]}},"required":["expression"]},"required":{"type":"array","items":{"type":"string"}},"additionalProperties":{"type":"boolean"}},"required":["type","properties","required","additionalProperties"
]}},"required":["name","description","parameters"]}},"required":["type","function"]}},"tool_choice":{"type":"object","properties":{"type":{"type":"string"},"function":{"type":"object","properties":{"name":{"type":"string"}},"required":["name"]}},"required":["type","function"]},"parallel_tool_calls":{"type":"boolean"},"response_format":{"type":"object","properties":{"type":{"type":"string"},"json_schema":{"type":"object","properties":{"name":{"type":"string"},"strict":{"type":"boolean"},"schema":{"type":"object","properties":{"type":{"type":"string"},"properties":{"type":"object","properties":{"summary":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"details":{"type":"object","properties":{"type":{"type":"string"},"items":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"description":{"type":"string"}},"required":["type","items","description"]},"confidence":{"type":"object","properties":{"type":{"type":"string"},"minimum":{"type":"integer"},"maximum":{"type":"integer"},"description":{"type":"string"}},"required":["type","minimum","maximum","description"]},"sources":{"type":"object","properties":{"type":{"type":"string"},"items":{"type":"object","properties":{"type":{"type":"string"},"properties":{"type":"object","properties":{"title":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"url":{"type":"object","properties":{"type":{"type":"string"},"format":{"type":"string"}},"required":["type","format"]}},"required":["title","url"]},"required":{"type":"array","items":{"type":"string"}}},"required":["type","properties","required"]}},"required":["type","items"]}},"required":["summary","details","confidence","sources"]},"required":{"type":"array","items":{"type":"string"}},"additionalProperties":{"type":"boolean"}},"required":["type","properties","required","additionalProperties"]}},"required":["name","strict","schema"]}},"required":["type","json_schema"]},"reasoning":{"type":"object","properties":{"effort":{"type":"string","enum":["xhigh","high","medium","low","minimal","none"]},"max_tokens":{"type":"integer"}}},"temperature":{"type":"number"},"top_p":{"type":"number"},"frequency_penalty":{"type":"number"},"presence_penalty":{"type":"number"},"max_completion_tokens":{"type":"integer"},"max_tokens":{"type":"integer"},"seed":{"type":"integer"},"stop":{"type":"array","items":{"type":"string"}},"logprobs":{"type":"boolean"},"top_logprobs":{"type":"integer"},"logit_bias":{"type":"object","properties":{"198":{"type":"integer"},"1234":{"type":"integer"},"5678":{"type":"integer"},"50256":{"type":"integer"}},"required":["198","1234","5678","50256"]},"stream":{"type":"boolean","default":false},"stream_options":{"type":"object","properties":{"include_usage":{"type":"boolean"}},"required":["include_usage"]},"usage":{"type":"object","properties":{"include":{"type":"boolean","default":false}}}},"required":["model","messages"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"choices":{"type":"array","items":{"type":"object","properties":{"finish_reason":{"type":"string"},"index":{"type":"integer"},"logprobs":{"type":"null"},"message":{"type":"object","properties":{"content":{"type":"string"},"role":{"type":"string"},"thought_signature":{"type":"string"},"tool_calls":{"type":"array","items":{"type":"object","properties":{"function":{"type":"object","properties":{"arguments":{"type":"string"},"name":{"type":"string"}},"required":["arguments","name"]},"id":{"type":"string"},"index":{"type":"integer"},"thought_signature":{"type":"string"},"type":{"type":"string"}}}}},"required":["content","role","thought_signature","tool_calls"]}}}},"cost":{"type":"number"},"cost_details":{"type":"object","properties":{"audio_cost":{"type":"integer"},"byok_cost":{"type":"integer"},"completion_cost":{"type":"number"},"discount_rate":{"type":"integer"},"image_cost":{"type":"integer"},"is_byok":{"type":"boolean"},"native_web_search_cost":{"type":"integer"},"plugin_web_search_cost":{"type":"integer"},"prompt_cache_read_cost":{"type":"integer"},"prompt_cache_write_1_h":{"type":"integer"},"prompt_cache_write_5_min":{"type":"integer"},"prompt_cache_write_cost":{"type":"integer"},"prompt_cost":{"type":"number"},"reasoning_cost":{"type":"integer"},"tools_cost":{"type":"integer"},"video_cost":{"type":"integer"}},"required":["audio_cost","byok_cost","completion_cost","discount_rate","image_cost","is_byok","native_web_search_cost","plugin_web_search_cost","prompt_cache_read_cost","prompt_cache_write_1_h","prompt_cache_write_5_min","prompt_cache_write_cost","prompt_cost","reasoning_cost","tools_cost","video_cost"]},"created":{"type":"integer"},"id":{"type":"string"},"model":{"type":"string"},"object":{"type":"string"},"request_id":{"type":"string"},"usage":{"type":"object","properties":{"completion_tokens":{"type":"integer"},"completion_tokens_details":{"type":"object","properties":{"audio_tokens":{"type":"integer"},"image_tokens":{"type":"integer"},"reasoning_tokens":{"type":"integer"}},"required":["audio_tokens","image_tokens","reasoning_tokens"]},"prompt_tokens":{"type":"integer"},"prompt_tokens_details":{"type":"object","properties":{"audio_tokens":{"type":"integer"},"cache_write_tokens":{"type":"integer"},"cached_tokens":{"type":"integer"},"video_tokens":{"type":"integer"}},"required":["audio_tokens","cache_write_tokens","cached_tokens","video_tokens"]},"total_tokens":{"type":"integer"}},"required":["completion_tokens","completion_tokens_details","prompt_tokens","prompt_tokens_details","total_tokens"]}},"required":["choices","cost","cost_details","created","id","model","object","request_id","usage"]}}},"headers":{}}}}}}}
```


# Chat with Images Inputs

## POST /v1/chat/completions

> Chat with Images Inputs

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Chat with Images Inputs"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/chat/completions":{"post":{"summary":"Chat with Images Inputs","deprecated":false,"description":"","tags":["Chat with Images Inputs"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"text":{"type":"string"},"image_url":{"type":"object","properties":{"url":{"type":"string"},"detail":{"type":"string"}},"required":["url","detail"]}},"required":["type"]}}},"required":["role","content"]}}},"required":["model","messages"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"choices":{"type":"array","items":{"type":"object","properties":{"finish_reason":{"type":"string"},"index":{"type":"integer"},"logprobs":{"type":"null"},"message":{"type":"object","properties":{"content":{"type":"string"},"role":{"type":"string"},"thought_signature":{"type":"string"}},"required":["content","role","thought_signature"]}}}},"created":{"type":"integer"},"id":{"type":"string"},"model":{"type":"string"},"object":{"type":"string"},"request_id":{"type":"string"},"usage":{"type":"object","properties":{"completion_tokens":{"type":"integer"},"completion_tokens_details":{"type":"object","properties":{"audio_tokens":{"type":"integer"},"image_tokens":{"type":"integer"},"reasoning_tokens":{"type":"integer"}},"required":["audio_tokens","image_tokens","reasoning_tokens"]},"prompt_tokens":{"type":"integer"},"prompt_tokens_details":{"type":"object","properties":{"audio_tokens":{"type":"integer"},"cache_write_tokens":{"type":"integer"},"cached_tokens":{"type":"integer"},"video_tokens":{"type":"integer"}},"required":["audio_tokens","cache_write_tokens","cached_tokens","video_tokens"]},"total_tokens":{"type":"integer"}},"required":["completion_tokens","completion_tokens_details","prompt_tokens","prompt_tokens_details","total_tokens"]}},"required":["choices","created","id","model","object","request_id","usage"]}}},"headers":{}}}}}}}
```
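
As a rough illustration of the request schema above, the sketch below builds a user message that combines a text part with an image part. The part type values `"text"` and `"image_url"` and the `detail` value `"auto"` follow the OpenAI convention and are assumptions here, since the schema only declares them as strings; the helper name and the model ID in the usage note are placeholders.

```python
def image_message(prompt: str, image_url: str, detail: str = "auto") -> dict:
    """Build a user message with one text part and one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # The schema marks both `url` and `detail` as required
                "image_url": {"url": image_url, "detail": detail},
            },
        ],
    }


def image_request(model: str, prompt: str, image_url: str) -> dict:
    """Build a request body for POST /v1/chat/completions."""
    return {"model": model, "messages": [image_message(prompt, image_url)]}
```

For example, `image_request("<MODEL_ID>", "What is in this image?", "https://example.com/cat.png")` returns a dictionary that can be sent as the JSON body of the request.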


# Chat with PDF Inputs

## POST /v1/chat/completions

> Chat with PDF Inputs

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Chat with PDF Inputs"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/chat/completions":{"post":{"summary":"Chat with PDF Inputs","deprecated":false,"description":"","tags":["Chat with PDF Inputs"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"text":{"type":"string"},"file":{"type":"object","properties":{"filename":{"type":"string"},"file_data":{"type":"string"}},"required":["filename","file_data"]}},"required":["type"]}}},"required":["role","content"]}}},"required":["model","messages"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"id":{"type":"string"},"model":{"type":"string"},"object":{"type":"string"},"created":{"type":"integer"},"choices":{"type":"array","items":{"type":"object","properties":{"index":{"type":"integer"},"message":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}},"required":["role","content"]},"finish_reason":{"type":"string"},"logprobs":{"type":"null"}}}},"request_id":{"type":"string"},"usage":{"type":"object","properties":{"prompt_tokens":{"type":"integer"},"completion_tokens":{"type":"integer"},"total_tokens":{"type":"integer"},"prompt_tokens_details":{"type":"object","properties":{"cached_tokens":{"type":"integer"},"cache_write_tokens":{"type":"integer"},"audio_tokens":{"type":"integer"},"video_tokens":{"type":"integer"}},"required":["cached_tokens","cache_write_tokens","audio_tokens","video_tokens"]},"completion_tokens_details":{"type":"object","properties":{"reasoning_tokens":{"type":"integer"},"image_tokens":{"type":"integer"},"audio_tokens":{"type":"integer"}},"required":["reasoning_tokens","image_tokens","audio_tokens"]}},"required":["prompt_tokens","completion_tokens","total_tokens","prompt_tokens_details","completion_tokens_details"]}},"required":["id","model","object","created","choices","request_id","usage"]}}},"headers":{}}}}}}}
```
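
The schema above only says that a file part carries a `filename` and a `file_data` string. The sketch below assumes the OpenAI-style convention of a part with type `"file"` whose `file_data` is a base64 data URL; both the part type and the encoding are assumptions to verify against a real request, and `pdf_message` is a hypothetical helper.

```python
import base64


def pdf_message(prompt: str, filename: str, pdf_bytes: bytes) -> dict:
    """Build a user message with a text part and an inline PDF part."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "file",  # assumed part type, not stated by the schema
                "file": {
                    "filename": filename,
                    # Assumed encoding: base64 data URL
                    "file_data": f"data:application/pdf;base64,{encoded}",
                },
            },
        ],
    }
```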


# Chat with Tool Calling

## Chat with Tool Calling

> Infron supports OpenAI-compatible function calling, allowing models to call tools and functions. This follows the same specification as the OpenAI Function Calling API.

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Chat with Tool Calling"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/chat/completions":{"post":{"summary":"Chat with Tool Calling","deprecated":false,"description":"The Infron supports OpenAI-compatible function calling, allowing models to call tools and functions. This follows the same specification as the OpenAI Function Calling API.","tags":["Chat with Tool Calling"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}}}},"tools":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"function":{"type":"object","properties":{"name":{"type":"string"},"description":{"type":"string"},"parameters":{"type":"object","properties":{"type":{"type":"string"},"properties":{"type":"object","properties":{"location":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"unit":{"type":"object","properties":{"type":{"type":"string"},"enum":{"type":"array","items":{"type":"string"}},"description":{"type":"string"}},"required":["type","enum","description"]}},"required":["location","unit"]},"required":{"type":"array","items":{"type":"string"}}},"required":["type","properties","required"]}},"required":["name","description","parameters"]}}}},"tool_choice":{"type":"string","enum":["none","auto","required"]},"stream":{"type":"boolean"}},"required":["model","messages","tools"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"id":{"type":"string"},"model":{"type":"string"},"object":{"type":"string"},"created":{"type":"integer"},"choices":{"type":"array","items":{"type":"object","properties":{"index":{"type":"integer"},"message":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"null"},"tool_calls":{"type":"array","items":{"type":"object","properties":{"id":{"type":"string"},"type":{"type":"string"},"function":{"type":"object","properties":{"name":{"type":"string"},"arguments":{"type":"string"}},"required":["name","arguments"]}}}}},"required":["role","content","tool_calls"]},"finish_reason":{"type":"string"},"logprobs":{"type":"null"}}}},"request_id":{"type":"string"},"usage":{"type":"object","properties":{"prompt_tokens":{"type":"integer"},"completion_tokens":{"type":"integer"},"total_tokens":{"type":"integer"},"prompt_tokens_details":{"type":"object","properties":{"cached_tokens":{"type":"integer"},"cache_write_tokens":{"type":"integer"},"audio_tokens":{"type":"integer"},"video_tokens":{"type":"integer"}},"required":["cached_tokens","cache_write_tokens","audio_tokens","video_tokens"]},"completion_tokens_details":{"type":"object","properties":{"reasoning_tokens":{"type":"integer"},"image_tokens":{"type":"integer"},"audio_tokens":{"type":"integer"}},"required":["reasoning_tokens","image_tokens","audio_tokens"]}},"required":["prompt_tokens","completion_tokens","total_tokens","prompt_tokens_details","completion_tokens_details"]}},"required":["id","model","object","created","choices","request_id","usage"]}}},"headers":{}}}}}}}
```
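
To make the schema above more concrete, here is a sketch of a request body with one tool and a helper that reads tool calls back out of a response. The tool name `get_weather`, its descriptions, and the unit values are hypothetical; `tool_choice` uses one of the enum values listed in the schema, and `arguments` is decoded because the schema defines it as a JSON string.

```python
import json

# Hypothetical tool definition with the `location` and `unit` parameters
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit",
                },
            },
            "required": ["location"],
        },
    },
}


def build_tool_request(model: str, question: str, tool_choice: str = "auto") -> dict:
    """Build a request body that offers WEATHER_TOOL to the model."""
    if tool_choice not in ("none", "auto", "required"):
        raise ValueError(f"Unsupported tool_choice: {tool_choice}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tools": [WEATHER_TOOL],
        "tool_choice": tool_choice,
    }


def parse_tool_calls(response: dict) -> list:
    """Return (name, arguments) pairs from the first choice of a response."""
    message = response["choices"][0]["message"]
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in message.get("tool_calls") or []
    ]
```

After running the requested function yourself, the OpenAI convention is to send the result back in a follow-up message with role `tool` and the matching `tool_call_id`.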


# Chat with Structured Outputs

## Chat with Structured Outputs

> Generate structured JSON responses that conform to a specific schema, ensuring predictable and reliable data formats for your applications.

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Chat with Structured Outputs"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/chat/completions":{"post":{"summary":"Chat with Structured Outputs","deprecated":false,"description":"Generate structured JSON responses that conform to a specific schema, ensuring predictable and reliable data formats for your applications.","tags":["Chat with Structured Outputs"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}}}},"response_format":{"type":"object","properties":{"type":{"type":"string"},"json_schema":{"type":"object","properties":{"name":{"type":"string"},"strict":{"type":"boolean"},"schema":{"type":"object","properties":{"type":{"type":"string"},"properties":{"type":"object","properties":{"location":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"temperature":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"conditions":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]}},"required":["location","temperature","conditions"]},"required":{"type":"array","items":{"type":"string"}},"additionalProperties":{"type":"boolean"}},"required":["type","properties","required","additionalProperties"]}},"required":["name","strict","schema"]}},"required":["type","json_schema"]}},"required":["model","messages","response_format"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{
"id":{"type":"string"},"model":{"type":"string"},"object":{"type":"string"},"created":{"type":"integer"},"choices":{"type":"array","items":{"type":"object","properties":{"index":{"type":"integer"},"message":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}},"required":["role","content"]},"finish_reason":{"type":"string"},"logprobs":{"type":"null"}}}},"request_id":{"type":"string"},"usage":{"type":"object","properties":{"prompt_tokens":{"type":"integer"},"completion_tokens":{"type":"integer"},"total_tokens":{"type":"integer"},"prompt_tokens_details":{"type":"object","properties":{"cached_tokens":{"type":"integer"},"cache_write_tokens":{"type":"integer"},"audio_tokens":{"type":"integer"},"video_tokens":{"type":"integer"}},"required":["cached_tokens","cache_write_tokens","audio_tokens","video_tokens"]},"completion_tokens_details":{"type":"object","properties":{"reasoning_tokens":{"type":"integer"},"image_tokens":{"type":"integer"},"audio_tokens":{"type":"integer"}},"required":["reasoning_tokens","image_tokens","audio_tokens"]}},"required":["prompt_tokens","completion_tokens","total_tokens","prompt_tokens_details","completion_tokens_details"]}},"required":["id","model","object","created","choices","request_id","usage"]}}},"headers":{}}}}}}}
```
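
The sketch below builds a `response_format` object matching the schema above, using the `location`, `temperature`, and `conditions` properties it mentions, and decodes the structured reply. The `"json_schema"` type value follows the OpenAI convention, and the schema name `weather`, the property types, and the descriptions are illustrative assumptions.

```python
import json


def weather_response_format() -> dict:
    """Build a response_format that asks for a strict weather object."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "weather",  # hypothetical schema name
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City or location name"},
                    "temperature": {"type": "number", "description": "Temperature in Celsius"},
                    "conditions": {"type": "string", "description": "Weather conditions"},
                },
                "required": ["location", "temperature", "conditions"],
                "additionalProperties": False,
            },
        },
    }


def parse_structured(response: dict) -> dict:
    """Decode the JSON document carried in the first choice's content string."""
    return json.loads(response["choices"][0]["message"]["content"])
```

The response schema defines `message.content` as a string, so the structured result still has to be decoded with `json.loads` before use.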


# Reasoning configuration

## POST /v1/chat/completions

> Reasoning configuration

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Reasoning configuration"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/chat/completions":{"post":{"summary":"Reasoning configuration","deprecated":false,"description":"","tags":["Reasoning configuration"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}}}},"reasoning":{"type":"object","properties":{"effort":{"type":"string","enum":["xhigh","high","medium","low","minimal","none"]},"max_tokens":{"type":"integer"}}}},"required":["model","messages"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"id":{"type":"string"},"model":{"type":"string"},"object":{"type":"string"},"created":{"type":"integer"},"choices":{"type":"array","items":{"type":"object","properties":{"index":{"type":"integer"},"message":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}},"required":["role","content"]},"finish_reason":{"type":"string"},"logprobs":{"type":"null"}}}},"request_id":{"type":"string"},"usage":{"type":"object","properties":{"prompt_tokens":{"type":"integer"},"completion_tokens":{"type":"integer"},"total_tokens":{"type":"integer"},"prompt_tokens_details":{"type":"object","properties":{"cached_tokens":{"type":"integer"},"cache_write_tokens":{"type":"integer"},"audio_tokens":{"type":"integer"},"video_tokens":{"type":"integer"}},"required":["cached_tokens","cache_write_tokens","audio_tokens","video_tokens"]},"completion_tokens_details":{"type":"object","properties":{"reasoning_tokens":{"type":"integer"},"image_tokens":{"type":"integer"},"audio_tokens":{"type":"integer"}},"required":["reasoning_tokens","image_tokens","audio_tokens"]}},"required":["prompt_tokens","completion_tokens","total_tokens","prompt_tokens_details","completion_tokens_details"]}},"required":["id","model","object","created","choices","request_id","usage"]}}},"headers":{}}}}}}}
```
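
As a rough illustration of the `reasoning` object in the schema above, the following Python sketch builds a request body for `/v1/chat/completions`. The helper name `build_reasoning_request` is made up for this example, and the model ID is borrowed from the Anthropic SDK example elsewhere in these docs; the allowed `effort` values are copied from the schema's enum. The code only constructs the JSON body and does not send it.

```python
import json

# Effort levels listed in the schema's `reasoning.effort` enum.
ALLOWED_EFFORTS = ("xhigh", "high", "medium", "low", "minimal", "none")


def build_reasoning_request(model, prompt, effort=None, max_tokens=None):
    """Build a chat completions body with an optional `reasoning` object."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    reasoning = {}
    if effort is not None:
        if effort not in ALLOWED_EFFORTS:
            raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}")
        reasoning["effort"] = effort
    if max_tokens is not None:
        reasoning["max_tokens"] = int(max_tokens)
    if reasoning:
        # `reasoning` is optional in the schema, so it is only added when set.
        body["reasoning"] = reasoning
    return body


example = build_reasoning_request(
    "anthropic/claude-sonnet-4.5", "Why is the ocean salty?", effort="low"
)
print(json.dumps(example, indent=2))
```

The schema does not say whether `effort` and `max_tokens` may be combined, so the sketch passes through whichever values the caller supplies.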


# Explicit Caching

## POST /v1/chat/completions

> Explicit Caching

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Explicit Caching"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/chat/completions":{"post":{"summary":"Explicit Caching","deprecated":false,"description":"","tags":["Explicit Caching"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"text":{"type":"string"},"cache_control":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]}},"required":["type","text"]}}}}}},"required":["model","messages"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"choices":{"type":"array","items":{"type":"object","properties":{"finish_reason":{"type":"string"},"index":{"type":"integer"},"logprobs":{"type":"null"},"message":{"type":"object","properties":{"content":{"type":"string"},"role":{"type":"string"}},"required":["content","role"]}}}},"created":{"type":"integer"},"id":{"type":"string"},"model":{"type":"string"},"object":{"type":"string"},"request_id":{"type":"string"},"usage":{"type":"object","properties":{"completion_tokens":{"type":"integer"},"completion_tokens_details":{"type":"object","properties":{"audio_tokens":{"type":"integer"},"image_tokens":{"type":"integer"},"reasoning_tokens":{"type":"integer"}},"required":["audio_tokens","image_tokens","reasoning_tokens"]},"prompt_tokens":{"type":"integer"},"prompt_tokens_details":{"type":"object","properties":{"audio_tokens":{"type":"integer"},"cache_write_tokens":{"type":"integer"},"cached_tokens":{"type":"integer"},"video_tokens":{"type":"integer"}},"require
d":["audio_tokens","cache_write_tokens","cached_tokens","video_tokens"]},"total_tokens":{"type":"integer"}},"required":["completion_tokens","completion_tokens_details","prompt_tokens","prompt_tokens_details","total_tokens"]}},"required":["choices","created","id","model","object","request_id","usage"]}}},"headers":{}}}}}}}
```
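
To make the request shape above more concrete, here is a minimal Python sketch that marks one content part as cacheable and reads the cache counters from a response. The helper names are hypothetical. The `"ephemeral"` cache type is the value used by the Anthropic API and is an assumption here, since the schema only requires `cache_control.type` to be a string.

```python
def text_part(text, cache=False):
    """Return one entry for a message's `content` array."""
    part = {"type": "text", "text": text}
    if cache:
        # Assumption: "ephemeral" follows the Anthropic API convention.
        part["cache_control"] = {"type": "ephemeral"}
    return part


def build_cached_request(model, reference_text, question):
    """Put the large, reusable text first and mark it for caching."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    text_part(reference_text, cache=True),
                    text_part(question),
                ],
            }
        ],
    }


def cache_usage(response):
    """Extract cache read/write token counts from a parsed response."""
    details = response["usage"]["prompt_tokens_details"]
    return {
        "read": details["cached_tokens"],
        "written": details["cache_write_tokens"],
    }
```

A first request with a long `reference_text` would be expected to report a non-zero `written` count, and a repeat of the same prefix a non-zero `read` count, though actual behavior depends on the model and provider.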


# Chat with OpenAI-compatible Web Search

## POST /v1/chat/completions

> Chat with OpenAI-compatible Web Search

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Chat with OpenAI-compatible Web Search"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/chat/completions":{"post":{"summary":"Chat with OpenAI-compatible Web Search","deprecated":false,"description":"","tags":["Chat with OpenAI-compatible Web Search"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}}}},"web_search_options":{"type":"object","properties":{"search_context_size":{"type":"string"},"user_location":{"type":"object","properties":{"approximate":{"type":"object","properties":{"timezone":{"type":"string"},"country":{"type":"string"},"city":{"type":"string"}},"required":["timezone","country","city"]}},"required":["approximate"]}},"required":["search_context_size","user_location"]}},"required":["model","messages","web_search_options"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"id":{"type":"string"},"model":{"type":"string"},"object":{"type":"string"},"created":{"type":"integer"},"choices":{"type":"array","items":{"type":"object","properties":{"index":{"type":"integer"},"message":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"},"annotations":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"url_citation":{"type":"object","properties":{"url":{"type":"string"},"start_index":{"type":"integer"},"end_index":{"type":"integer"},"title":{"type":"string"}},"required":["url","start_index","end_index","title"]}},"required":["type","url_citation"]}}},"required":["role","content","annotations"]},"finish_reason":
{"type":"string"},"logprobs":{"type":"null"}}}},"request_id":{"type":"string"},"usage":{"type":"object","properties":{"prompt_tokens":{"type":"integer"},"completion_tokens":{"type":"integer"},"total_tokens":{"type":"integer"},"prompt_tokens_details":{"type":"object","properties":{"cached_tokens":{"type":"integer"},"cache_write_tokens":{"type":"integer"},"audio_tokens":{"type":"integer"},"video_tokens":{"type":"integer"}},"required":["cached_tokens","cache_write_tokens","audio_tokens","video_tokens"]},"completion_tokens_details":{"type":"object","properties":{"reasoning_tokens":{"type":"integer"},"image_tokens":{"type":"integer"},"audio_tokens":{"type":"integer"}},"required":["reasoning_tokens","image_tokens","audio_tokens"]}},"required":["prompt_tokens","completion_tokens","total_tokens","prompt_tokens_details","completion_tokens_details"]}},"required":["id","model","object","created","choices","request_id","usage"]}}},"headers":{}}}}}}}
```
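
The sketch below shows one way to fill in `web_search_options` and to read `url_citation` annotations from a response, based only on the field names in the schema above. The helper names are hypothetical, `"medium"` is an assumed value for `search_context_size` (the schema only says it is a string), and the code assumes `start_index` and `end_index` are character offsets into `content` with an exclusive end.

```python
def build_web_search_request(model, prompt, city, country, timezone,
                             context_size="medium"):
    """Build a chat completions body with the required web search options."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "web_search_options": {
            # Assumption: "medium" is an accepted size value.
            "search_context_size": context_size,
            "user_location": {
                "approximate": {
                    "timezone": timezone,
                    "country": country,
                    "city": city,
                }
            },
        },
    }


def list_citations(response):
    """Return (title, url, cited_text) for every url_citation annotation."""
    results = []
    for choice in response["choices"]:
        message = choice["message"]
        for annotation in message.get("annotations", []):
            cite = annotation["url_citation"]
            # Assumption: indices are character offsets, end exclusive.
            cited = message["content"][cite["start_index"]:cite["end_index"]]
            results.append((cite["title"], cite["url"], cited))
    return results
```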


# Chat with OpenAI-compatible Web Fetch

## Chat with OpenAI-compatible Web Fetch

> Fetch and read content from specific URLs to augment the model's context with live web content.

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Chat with OpenAI-compatible Web Fetch"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/chat/completions":{"post":{"summary":"Chat with OpenAI-compatible Web Fetch","deprecated":false,"description":"Fetch and read content from specific URLs to augment AI Model's context with live web content.","tags":["Chat with OpenAI-compatible Web Fetch"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}}}},"provider":{"type":"object","properties":{"order":{"type":"array","items":{"type":"string"}},"allow_fallbacks":{"type":"boolean"}},"required":["order","allow_fallbacks"]},"max_tokens":{"type":"integer"},"web_fetch_options":{"type":"object","properties":{"max_uses":{"type":"integer"},"citations":{"type":"object","properties":{"enabled":{"type":"boolean"}},"required":["enabled"]},"max_content_tokens":{"type":"integer"},"allowed_domains":{"type":"array","items":{"type":"string"}},"blocked_domains":{"type":"array","items":{"type":"string"}}},"required":["max_uses","citations","max_content_tokens"]},"usage":{"type":"object","properties":{"include":{"type":"boolean"}},"required":["include"]}},"required":["model","messages","web_fetch_options"]}}},"required":true},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"choices":{"type":"array","items":{"type":"object","properties":{"finish_reason":{"type":"string"},"index":{"type":"integer"},"logprobs":{"type":"null"},"message":{"type":"object","properties":{"annotations":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"u
rl_citation":{"type":"object","properties":{"end_index":{"type":"integer"},"start_index":{"type":"integer"},"title":{"type":"string"},"url":{"type":"string"}},"required":["end_index","start_index","title","url"]}},"required":["type","url_citation"]}},"content":{"type":"string"},"role":{"type":"string"}},"required":["annotations","content","role"]}}}},"cost":{"type":"number"},"cost_details":{"type":"object","properties":{"audio_cost":{"type":"integer"},"byok_cost":{"type":"integer"},"completion_cost":{"type":"number"},"discount_rate":{"type":"integer"},"image_cost":{"type":"integer"},"is_byok":{"type":"boolean"},"native_web_search_cost":{"type":"integer"},"plugin_web_search_cost":{"type":"integer"},"prompt_cache_read_cost":{"type":"integer"},"prompt_cache_write_1_h":{"type":"integer"},"prompt_cache_write_5_min":{"type":"integer"},"prompt_cache_write_cost":{"type":"integer"},"prompt_cost":{"type":"number"},"reasoning_cost":{"type":"integer"},"tools_cost":{"type":"integer"},"video_cost":{"type":"integer"}},"required":["audio_cost","byok_cost","completion_cost","discount_rate","image_cost","is_byok","native_web_search_cost","plugin_web_search_cost","prompt_cache_read_cost","prompt_cache_write_1_h","prompt_cache_write_5_min","prompt_cache_write_cost","prompt_cost","reasoning_cost","tools_cost","video_cost"]},"created":{"type":"integer"},"id":{"type":"string"},"model":{"type":"string"},"object":{"type":"string"},"provider":{"type":"string"},"request_id":{"type":"string"},"usage":{"type":"object","properties":{"completion_tokens":{"type":"integer"},"completion_tokens_details":{"type":"object","properties":{"audio_tokens":{"type":"integer"},"image_tokens":{"type":"integer"},"reasoning_tokens":{"type":"integer"}},"required":["audio_tokens","image_tokens","reasoning_tokens"]},"prompt_tokens":{"type":"integer"},"prompt_tokens_details":{"type":"object","properties":{"audio_tokens":{"type":"integer"},"cache_write_tokens":{"type":"integer"},"cached_tokens":{"type":"integer"},"vid
eo_tokens":{"type":"integer"}},"required":["audio_tokens","cache_write_tokens","cached_tokens","video_tokens"]},"total_tokens":{"type":"integer"}},"required":["completion_tokens","completion_tokens_details","prompt_tokens","prompt_tokens_details","total_tokens"]}},"required":["choices","cost","cost_details","created","id","model","object","provider","request_id","usage"]}}},"headers":{}}}}}}}
```
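
As a hedged example of the `web_fetch_options` object described by the schema above, this Python sketch assembles a request body. The helper name and the default numbers (`max_uses=3`, `max_content_tokens=4000`) are illustrative choices, not documented defaults. The domain lists are optional in the schema, so they are only included when provided.

```python
def build_web_fetch_request(model, prompt, max_uses=3, max_content_tokens=4000,
                            citations=True, allowed_domains=None,
                            blocked_domains=None):
    """Build a chat completions body with web fetch options."""
    options = {
        # These three fields are marked required in the schema.
        "max_uses": max_uses,
        "citations": {"enabled": citations},
        "max_content_tokens": max_content_tokens,
    }
    if allowed_domains:
        options["allowed_domains"] = list(allowed_domains)
    if blocked_domains:
        options["blocked_domains"] = list(blocked_domains)
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "web_fetch_options": options,
    }


request_body = build_web_fetch_request(
    "anthropic/claude-sonnet-4.5",
    "Summarize https://example.com/article",
    allowed_domains=["example.com"],
)
print(sorted(request_body["web_fetch_options"]))
```

The response schema reuses the `url_citation` annotation shape from web search and adds `cost`, `cost_details`, and `provider` fields at the top level.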


# Overview

**Infron provides Anthropic-compatible API endpoints**, so you can use the Anthropic SDK and tools like [Claude Code](https://www.claude.com/product/claude-code) through a unified gateway with only a URL change.

The Anthropic-compatible API implements the same specification as the [Anthropic Messages API](https://docs.anthropic.com/en/api/messages).

For more on using Infron with Claude Code, see the [Claude Code Integration Guide](/docs/frameworks-and-integrations/claude-code-integration-guide).

### Base URL

The Anthropic-compatible API is available at the following base URL:

`https://llm.onerouter.pro`

### Authentication

The Anthropic-compatible API supports the same authentication method as the OpenAI-compatible API:

* **API key**: Use your Infron API key with the `Authorization: Bearer <token>` header
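
For clients that do not use an SDK, the header can be set by hand. The sketch below uses only the Python standard library to prepare a `POST /v1/messages` request with the `Authorization: Bearer <token>` header. The helper name and the `INFRON_API_KEY` environment variable are assumptions made for this example. The request is built but not sent.

```python
import json
import os
import urllib.request

BASE_URL = "https://llm.onerouter.pro"  # base URL given on this page


def build_messages_request(api_key, body):
    """Prepare (but do not send) an authenticated POST to /v1/messages."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# INFRON_API_KEY is a hypothetical variable name for this sketch.
api_key = os.environ.get("INFRON_API_KEY", "<<Your API Key>>")
request = build_messages_request(
    api_key,
    {
        "model": "anthropic/claude-sonnet-4.5",
        "max_tokens": 100,
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out so the example runs without network access or a real key.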

### Integration with existing tools <a href="#integration-with-existing-tools" id="integration-with-existing-tools"></a>

You can use Infron's **Anthropic-compatible API** with existing tools and libraries such as the **Anthropic Claude** **client libraries**. Point your existing client at Infron's base URL and use your Infron API key for authentication.

### Integration with Anthropic SDK

{% tabs %}
{% tab title="Python(Anthropic SDK) " %}

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://llm.onerouter.pro",
    api_key="<<Your API Key>>"
)

message = client.messages.create(
    model="anthropic/claude-sonnet-4.5",
    max_tokens=1000,
    temperature=1,
    system=[
        {
            "type": "text",
            "text": "You are a world-class poet. Respond only with short poems."
        }
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Why is the ocean salty?"
                }
            ]
        }
    ]
)

print(message.content)
```

{% endtab %}
{% endtabs %}

{% content-ref url="/spaces/Z9C9AjT7j46HAcQrOVWw/pages/N3uSC9lHaCdJI0fhO5Fj" %}
[Anthropic SDK Compatibility](/docs/frameworks-and-integrations/anthropic-sdk-compatibility)
{% endcontent-ref %}

#### Configuring Claude Code

{% content-ref url="/spaces/Z9C9AjT7j46HAcQrOVWw/pages/sS8UsILwwl0D4nc7Dly8" %}
[Claude Code Integration Guide](/docs/frameworks-and-integrations/claude-code-integration-guide)
{% endcontent-ref %}

### Error handling

The API returns standard HTTP status codes and error responses:

#### Common error codes

* **400**: Bad Request (invalid or missing params, CORS)
* **401**: Invalid credentials (OAuth session expired, disabled/invalid API key)
* **402**: Your account or API key has insufficient credits. Add more credits and retry the request.
* **403**: Your chosen model requires moderation and your input was flagged
* **408**: Your request timed out
* **429**: You are being rate limited
* **502**: Your chosen model is down or we received an invalid response from it
* **503**: There is no available model provider that meets your routing requirements
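
One possible client-side policy for the codes above is to retry only the errors that tend to be transient. The Python sketch below treats 408, 429, 502, and 503 as retryable; that classification is an assumption of this example, not documented Infron behavior. `send` stands in for whatever function performs the HTTP call and returns a `(status, body)` pair.

```python
import random
import time

# Assumption: these statuses are worth retrying; all others are not.
RETRYABLE_STATUSES = {408, 429, 502, 503}


def is_retryable(status):
    """Return True if a retry might succeed for this HTTP status."""
    return status in RETRYABLE_STATUSES


def backoff_delays(retries, base=0.5, cap=8.0):
    """Yield capped exponential delays (in seconds) with full jitter."""
    for attempt in range(retries):
        yield random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(send, max_attempts=4, sleep=time.sleep):
    """Call send() until the status is not retryable or attempts run out."""
    status, body = send()
    for delay in backoff_delays(max_attempts - 1):
        if not is_retryable(status):
            break
        sleep(delay)
        status, body = send()
    return status, body
```

Errors such as 400, 401, 402, and 403 are returned to the caller immediately, since repeating the same request would be expected to fail the same way.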

#### Error response format

```json
{
    "error": {
        "message": "",
        "type": "",
        "param": "",
        "code": 429
    }
}
```
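
A small parser for this error body might look like the following Python sketch. The exception class name `InfronAPIError` is hypothetical. The code falls back to the HTTP status when the body is not valid JSON or does not contain an `error` object, which can happen if an intermediary returns a non-JSON error page.

```python
import json


class InfronAPIError(Exception):
    """Hypothetical exception carrying the fields of the error object."""

    def __init__(self, status, message="", error_type="", param="", code=None):
        super().__init__(f"{status}: {message or error_type or 'unknown error'}")
        self.status = status
        self.message = message
        self.error_type = error_type
        self.param = param
        self.code = status if code is None else code


def parse_error(status, body_text):
    """Turn an HTTP status and raw body into an InfronAPIError."""
    try:
        error = json.loads(body_text).get("error", {})
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        error = {}
    if not isinstance(error, dict):
        error = {}
    return InfronAPIError(
        status,
        message=error.get("message", ""),
        error_type=error.get("type", ""),
        param=error.get("param", ""),
        code=error.get("code"),
    )
```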


# Create a message

## POST /v1/messages

> Create a message

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Create a message"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/messages":{"post":{"summary":"Create a message","deprecated":false,"description":"","tags":["Create a message"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}}}},"max_tokens":{"type":"integer"},"system":{"type":"string"},"metadata":{"type":"object","properties":{"user_id":{"type":"string"}},"required":["user_id"]},"stop_sequences":{"type":"array","items":{"type":"string"}},"temperature":{"type":"number"},"top_p":{"type":"number"},"top_k":{"type":"integer"},"tools":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"description":{"type":"string"},"input_schema":{"type":"object","properties":{"type":{"type":"string"},"additionalProperties":{"type":"boolean"},"properties":{"type":"object","properties":{"location":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"units":{"type":"object","properties":{"type":{"type":"string"},"enum":{"type":"array","items":{"type":"string"}},"description":{"type":"string"}},"required":["type","enum","description"]},"days":{"type":"object","properties":{"type":{"type":"string"},"minimum":{"type":"integer"},"maximum":{"type":"integer"},"description":{"type":"string"}},"required":["type","minimum","maximum","description"]},"customer_id":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"include_orders":{"type":"object","properties":{"type":{"type":"string"},"description":{
"type":"string"}},"required":["type","description"]},"include_risk_flags":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]}},"required":["customer_id","include_orders","include_risk_flags"]},"required":{"type":"array","items":{"type":"string"}}},"required":["type","additionalProperties","properties","required"]}},"required":["name","description","input_schema"]}},"tool_choice":{"type":"object","properties":{"type":{"type":"string"},"disable_parallel_tool_use":{"type":"boolean"}},"required":["type","disable_parallel_tool_use"]},"thinking":{"type":"object","properties":{"type":{"type":"string"},"budget_tokens":{"type":"integer"}},"required":["type","budget_tokens"]},"service_tier":{"type":"string"},"output_config":{"type":"object","properties":{"effort":{"type":"string"},"format":{"type":"object","properties":{"type":{"type":"string"},"schema":{"type":"object","properties":{"type":{"type":"string"},"additionalProperties":{"type":"boolean"},"properties":{"type":"object","properties":{"task_id":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"language":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"summary":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"entities":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"},"items":{"type":"object","properties":{"type":{"type":"string"},"additionalProperties":{"type":"boolean"},"properties":{"type":"object","properties":{"name":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"type":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"confidence":{"type":"object","properties":{"type":{"type":"string"},"minimum":{"type":"integer"},"maximum":
{"type":"integer"}},"required":["type","minimum","maximum"]}},"required":["name","type","confidence"]},"required":{"type":"array","items":{"type":"string"}}},"required":["type","additionalProperties","properties","required"]}},"required":["type","description","items"]},"facts":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"},"items":{"type":"object","properties":{"type":{"type":"string"},"additionalProperties":{"type":"boolean"},"properties":{"type":"object","properties":{"field":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"value":{"type":"object","properties":{"type":{"type":"array","items":{"type":"string"}}},"required":["type"]},"source_quote":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]}},"required":["field","value","source_quote"]},"required":{"type":"array","items":{"type":"string"}}},"required":["type","additionalProperties","properties","required"]}},"required":["type","description","items"]},"citations":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"},"items":{"type":"object","properties":{"type":{"type":"string"},"additionalProperties":{"type":"boolean"},"properties":{"type":"object","properties":{"source_id":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"title":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"url":{"type":"object","properties":{"type":{"type":"array","items":{"type":"string"}}},"required":["type"]}},"required":["source_id","title","url"]},"required":{"type":"array","items":{"type":"string"}}},"required":["type","additionalProperties","properties","required"]}},"required":["type","description","items"]},"tool_calls_used":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"},"items":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]}},"required":["type","description","items"]},"c
onfidence_overall":{"type":"object","properties":{"type":{"type":"string"},"minimum":{"type":"integer"},"maximum":{"type":"integer"}},"required":["type","minimum","maximum"]}},"required":["task_id","language","summary","entities","facts","citations","tool_calls_used","confidence_overall"]},"required":{"type":"array","items":{"type":"string"}}},"required":["type","additionalProperties","properties","required"]}},"required":["type","schema"]}},"required":["effort","format"]},"cache_control":{"type":"object","properties":{"type":{"type":"string"},"ttl":{"type":"string"}},"required":["type","ttl"]},"stream":{"type":"boolean"},"stream_options":{"type":"object","properties":{"include_usage":{"type":"boolean"}},"required":["include_usage"]},"usage":{"type":"object","properties":{"include":{"type":"boolean"}},"required":["include"]},"context_management":{"type":"null"},"provider":{"type":"object","properties":{"order":{"type":"array","items":{"type":"string"}},"allow_fallbacks":{"type":"boolean"},"require_parameters":{"type":"boolean"},"data_collection":{"type":"string"},"zdr":{"type":"boolean"},"enforce_distillable_text":{"type":"boolean"},"only":{"type":"array","items":{"type":"string"}},"ignore":{"type":"array","items":{"type":"string"}},"quantizations":{"type":"array","items":{"type":"string"}},"sort":{"type":"string"},"preferred_min_throughput":{"type":"object","properties":{"p50":{"type":"integer"},"p75":{"type":"integer"},"p90":{"type":"integer"},"p99":{"type":"integer"}},"required":["p50","p75","p90","p99"]},"preferred_max_latency":{"type":"object","properties":{"p50":{"type":"integer"},"p75":{"type":"integer"},"p90":{"type":"integer"},"p99":{"type":"integer"}},"required":["p50","p75","p90","p99"]}},"required":["order","allow_fallbacks","require_parameters","data_collection","zdr","enforce_distillable_text","only","ignore","quantizations","sort","preferred_min_throughput","preferred_max_latency"]}},"required":["model","messages","max_tokens","system","metadata","sto
p_sequences","temperature","top_p","top_k","tools","tool_choice","thinking","service_tier","output_config","cache_control","stream","stream_options","usage","context_management","provider"]}}},"required":true},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"content":{"type":"array","items":{"type":"object","properties":{"text":{"type":"string"},"type":{"type":"string"}}}},"cost":{"type":"number"},"cost_details":{"type":"object","properties":{"audio_cost":{"type":"integer"},"byok_cost":{"type":"integer"},"completion_cost":{"type":"number"},"discount_rate":{"type":"integer"},"image_cost":{"type":"integer"},"is_byok":{"type":"boolean"},"native_web_search_cost":{"type":"integer"},"plugin_web_search_cost":{"type":"integer"},"prompt_cache_read_cost":{"type":"integer"},"prompt_cache_write_1_h":{"type":"integer"},"prompt_cache_write_5_min":{"type":"integer"},"prompt_cache_write_cost":{"type":"integer"},"prompt_cost":{"type":"number"},"reasoning_cost":{"type":"integer"},"tools_cost":{"type":"integer"},"video_cost":{"type":"integer"}},"required":["audio_cost","byok_cost","completion_cost","discount_rate","image_cost","is_byok","native_web_search_cost","plugin_web_search_cost","prompt_cache_read_cost","prompt_cache_write_1_h","prompt_cache_write_5_min","prompt_cache_write_cost","prompt_cost","reasoning_cost","tools_cost","video_cost"]},"id":{"type":"string"},"model":{"type":"string"},"role":{"type":"string"},"stop_reason":{"type":"string"},"type":{"type":"string"},"usage":{"type":"object","properties":{"cache_creation_input_tokens":{"type":"integer"},"cache_read_input_tokens":{"type":"integer"},"input_tokens":{"type":"integer"},"output_tokens":{"type":"integer"}},"required":["cache_creation_input_tokens","cache_read_input_tokens","input_tokens","output_tokens"]}},"required":["content","cost","cost_details","id","model","role","stop_reason","type","usage"]}}},"headers":{}}}}}}}
```
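
The schema above lists many request fields. As a minimal sketch, the Python below builds a body with only `model`, `messages`, and `max_tokens` (plus an optional `system` string), on the assumption that, as in the Anthropic Messages API, the remaining fields can be omitted. It also shows one way to pull the text, token counts, and `cost` out of a parsed response. Both helper names are hypothetical.

```python
def build_message_request(model, prompt, max_tokens=1000, system=None):
    """Build a minimal /v1/messages body."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system:
        body["system"] = system
    return body


def summarize_message(response):
    """Collect the text blocks and usage figures from a parsed response."""
    text = "".join(
        block.get("text", "")
        for block in response["content"]
        if block.get("type") == "text"
    )
    usage = response["usage"]
    return {
        "text": text,
        "stop_reason": response["stop_reason"],
        "input_tokens": usage["input_tokens"],
        "output_tokens": usage["output_tokens"],
        # `cost` is an Infron-specific field in the response schema above.
        "cost": response.get("cost"),
    }
```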


# Chat with Tool Calling

## POST /v1/messages

> Chat with Tool Calling

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Chat with Tool Calling"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/messages":{"post":{"summary":"Chat with Tool Calling","deprecated":false,"description":"","tags":["Chat with Tool Calling"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}}}},"tools":{"type":"array","items":{"type":"object","properties":{"name":{"type":"string"},"description":{"type":"string"},"input_schema":{"type":"object","properties":{"type":{"type":"string"},"properties":{"type":"object","properties":{"location":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"unit":{"type":"object","properties":{"type":{"type":"string"},"enum":{"type":"array","items":{"type":"string"}},"description":{"type":"string"}},"required":["type","enum","description"]}},"required":["location","unit"]},"required":{"type":"array","items":{"type":"string"}}},"required":["type","properties","required"]}}}},"tool_choice":{"type":"object","properties":{"type":{"type":"string"},"disable_parallel_tool_use":{"type":"boolean"}},"required":["type","disable_parallel_tool_use"]},"stream":{"type":"boolean"},"stream_options":{"type":"object","properties":{"include_usage":{"type":"boolean"}},"required":["include_usage"]},"usage":{"type":"object","properties":{"include":{"type":"boolean"}},"required":["include"]}},"required":["model","messages","tools","tool_choice","stream","stream_options","usage"]}}},"required":true},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"con
tent":{"type":"array","items":{"type":"object","properties":{"text":{"type":"string"},"type":{"type":"string"},"id":{"type":"string"},"input":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"]},"name":{"type":"string"}},"required":["type"]}},"cost":{"type":"number"},"cost_details":{"type":"object","properties":{"audio_cost":{"type":"integer"},"byok_cost":{"type":"integer"},"completion_cost":{"type":"number"},"discount_rate":{"type":"integer"},"image_cost":{"type":"integer"},"is_byok":{"type":"boolean"},"native_web_search_cost":{"type":"integer"},"plugin_web_search_cost":{"type":"integer"},"prompt_cache_read_cost":{"type":"integer"},"prompt_cache_write_1_h":{"type":"integer"},"prompt_cache_write_5_min":{"type":"integer"},"prompt_cache_write_cost":{"type":"integer"},"prompt_cost":{"type":"number"},"reasoning_cost":{"type":"integer"},"tools_cost":{"type":"integer"},"video_cost":{"type":"integer"}},"required":["audio_cost","byok_cost","completion_cost","discount_rate","image_cost","is_byok","native_web_search_cost","plugin_web_search_cost","prompt_cache_read_cost","prompt_cache_write_1_h","prompt_cache_write_5_min","prompt_cache_write_cost","prompt_cost","reasoning_cost","tools_cost","video_cost"]},"id":{"type":"string"},"model":{"type":"string"},"role":{"type":"string"},"stop_reason":{"type":"string"},"stop_sequence":{"type":"null"},"type":{"type":"string"},"usage":{"type":"object","properties":{"cache_creation":{"type":"object","properties":{"ephemeral_1h_input_tokens":{"type":"integer"},"ephemeral_5m_input_tokens":{"type":"integer"}},"required":["ephemeral_1h_input_tokens","ephemeral_5m_input_tokens"]},"cache_creation_input_tokens":{"type":"integer"},"cache_read_input_tokens":{"type":"integer"},"input_tokens":{"type":"integer"},"output_tokens":{"type":"integer"}},"required":["cache_creation","cache_creation_input_tokens","cache_read_input_tokens","input_tokens","output_tokens"]}},"required":["content","cost","cost_details","id","mode
l","role","stop_reason","stop_sequence","type","usage"]}}},"headers":{}}}}}}}
```
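
To illustrate the tool-calling round trip implied by the schema above, the following Python sketch defines one tool, builds the request, extracts `tool_use` blocks from a response, and prepares the follow-up message. The `get_weather` tool and all helper names are hypothetical. The `tool_result` block format follows the Anthropic Messages API, which this endpoint is described as implementing.

```python
# Hypothetical tool definition with `location` and `unit` inputs.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get the current weather for a location.",
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
            "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit",
            },
        },
        "required": ["location"],
    },
}


def build_tool_request(model, prompt, tools, max_tokens=1000):
    """Build a /v1/messages body that lets the model decide whether to call a tool."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "tool_choice": {"type": "auto", "disable_parallel_tool_use": False},
    }


def tool_calls(response):
    """Return the tool_use blocks (id, name, input) from a parsed response."""
    return [b for b in response["content"] if b.get("type") == "tool_use"]


def tool_result_message(tool_use_id, result):
    """Build the user message that returns a tool's output to the model."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": str(result),
            }
        ],
    }
```

In a full exchange, the assistant's `content` array and the `tool_result` message are appended to `messages` before the next request, so the model can see both its call and the result.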


# Reasoning configuration

## POST /v1/messages

> Reasoning configuration

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Reasoning configuration"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/messages":{"post":{"summary":"Reasoning configuration","deprecated":false,"description":"","tags":["Reasoning configuration"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}}}},"thinking":{"type":"object","properties":{"type":{"type":"string"},"budget_tokens":{"type":"integer"}},"required":["type","budget_tokens"]},"stream":{"type":"boolean"},"stream_options":{"type":"object","properties":{"include_usage":{"type":"boolean"}},"required":["include_usage"]},"usage":{"type":"object","properties":{"include":{"type":"boolean"}},"required":["include"]}},"required":["model","messages","thinking","stream","stream_options","usage"]}}},"required":true},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"content":{"type":"array","items":{"type":"object","properties":{"signature":{"type":"string"},"thinking":{"type":"string"},"type":{"type":"string"},"text":{"type":"string"}},"required":["type"]}},"cost":{"type":"number"},"cost_details":{"type":"object","properties":{"audio_cost":{"type":"integer"},"byok_cost":{"type":"integer"},"completion_cost":{"type":"number"},"discount_rate":{"type":"integer"},"image_cost":{"type":"integer"},"is_byok":{"type":"boolean"},"native_web_search_cost":{"type":"integer"},"plugin_web_search_cost":{"type":"integer"},"prompt_cache_read_cost":{"type":"integer"},"prompt_cache_write_1_h":{"type":"integer"},"prompt_cache_write_5_min":{"type":"integer"},"prompt_cache_write_cost":{"type":"integer"},"pr
ompt_cost":{"type":"number"},"reasoning_cost":{"type":"integer"},"tools_cost":{"type":"integer"},"video_cost":{"type":"integer"}},"required":["audio_cost","byok_cost","completion_cost","discount_rate","image_cost","is_byok","native_web_search_cost","plugin_web_search_cost","prompt_cache_read_cost","prompt_cache_write_1_h","prompt_cache_write_5_min","prompt_cache_write_cost","prompt_cost","reasoning_cost","tools_cost","video_cost"]},"id":{"type":"string"},"model":{"type":"string"},"role":{"type":"string"},"stop_reason":{"type":"string"},"stop_sequence":{"type":"null"},"type":{"type":"string"},"usage":{"type":"object","properties":{"cache_creation":{"type":"object","properties":{"ephemeral_1h_input_tokens":{"type":"integer"},"ephemeral_5m_input_tokens":{"type":"integer"}},"required":["ephemeral_1h_input_tokens","ephemeral_5m_input_tokens"]},"cache_creation_input_tokens":{"type":"integer"},"cache_read_input_tokens":{"type":"integer"},"input_tokens":{"type":"integer"},"output_tokens":{"type":"integer"}},"required":["cache_creation","cache_creation_input_tokens","cache_read_input_tokens","input_tokens","output_tokens"]}},"required":["content","cost","cost_details","id","model","role","stop_reason","stop_sequence","type","usage"]}}},"headers":{}}}}}}}
```
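
The `thinking` object in the schema above takes a `type` and a `budget_tokens` value. The Python sketch below builds such a request and separates `thinking` blocks from `text` blocks in the response. The helper names and default numbers are illustrative. The `"enabled"` type and the check that the budget stays below `max_tokens` follow the Anthropic Messages API; whether Infron enforces the same limits is not stated here.

```python
def build_thinking_request(model, prompt, budget_tokens=2048, max_tokens=4096):
    """Build a /v1/messages body with extended thinking turned on."""
    if budget_tokens >= max_tokens:
        # Anthropic API rule: the thinking budget must be below max_tokens.
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


def split_thinking(response):
    """Return (thinking_text, answer_text) from a parsed response."""
    thinking, answer = [], []
    for block in response["content"]:
        if block.get("type") == "thinking":
            thinking.append(block.get("thinking", ""))
        elif block.get("type") == "text":
            answer.append(block.get("text", ""))
    return "".join(thinking), "".join(answer)
```

The `signature` field on thinking blocks is left untouched by this sketch; in the Anthropic API it is passed back unchanged when thinking blocks are included in later turns.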


# Chat with Web Search

## Chat with Web Search

> Use the built-in web search tool to give the model access to current information from the web.

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Chat with Web Search"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/messages":{"post":{"summary":"Chat with Web Search","deprecated":false,"description":"Use the built-in web search tool to give the model access to current information from the web.","tags":["Chat with Web Search"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}}}},"tools":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"name":{"type":"string"}}}},"stream":{"type":"boolean"},"stream_options":{"type":"object","properties":{"include_usage":{"type":"boolean"}},"required":["include_usage"]},"usage":{"type":"object","properties":{"include":{"type":"boolean"}},"required":["include"]}},"required":["model","messages","tools","stream","stream_options","usage"]}}},"required":true},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"content":{"type":"array","items":{"type":"object","properties":{"caller":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"id":{"type":"string"},"input":{"type":"object","properties":{"query":{"type":"string"}},"required":["query"]},"name":{"type":"string"},"type":{"type":"string"},"content":{"type":"array","items":{"type":"object","properties":{"encrypted_content":{"type":"string"},"page_age":{"type":["string","null"]},"title":{"type":"string"},"type":{"type":"string"},"url":{"type":"string"}},"required":["encrypted_content","page_age","title","type","url"]}},"tool_use_id":{"type":"string"},"text":{"type":"string"},"citations":{"type":"
array","items":{"type":"object","properties":{"cited_text":{"type":"string"},"encrypted_index":{"type":"string"},"title":{"type":"string"},"type":{"type":"string"},"url":{"type":"string"}},"required":["cited_text","encrypted_index","title","type","url"]}}},"required":["caller","type","text","citations"]}},"cost":{"type":"number"},"cost_details":{"type":"object","properties":{"audio_cost":{"type":"integer"},"byok_cost":{"type":"integer"},"completion_cost":{"type":"number"},"discount_rate":{"type":"integer"},"image_cost":{"type":"integer"},"is_byok":{"type":"boolean"},"native_web_search_cost":{"type":"integer"},"plugin_web_search_cost":{"type":"integer"},"prompt_cache_read_cost":{"type":"integer"},"prompt_cache_write_1_h":{"type":"integer"},"prompt_cache_write_5_min":{"type":"integer"},"prompt_cache_write_cost":{"type":"integer"},"prompt_cost":{"type":"number"},"reasoning_cost":{"type":"integer"},"tools_cost":{"type":"integer"},"video_cost":{"type":"integer"}},"required":["audio_cost","byok_cost","completion_cost","discount_rate","image_cost","is_byok","native_web_search_cost","plugin_web_search_cost","prompt_cache_read_cost","prompt_cache_write_1_h","prompt_cache_write_5_min","prompt_cache_write_cost","prompt_cost","reasoning_cost","tools_cost","video_cost"]},"id":{"type":"string"},"model":{"type":"string"},"role":{"type":"string"},"stop_reason":{"type":"string"},"stop_sequence":{"type":"null"},"type":{"type":"string"},"usage":{"type":"object","properties":{"cache_creation":{"type":"object","properties":{"ephemeral_1h_input_tokens":{"type":"integer"},"ephemeral_5m_input_tokens":{"type":"integer"}},"required":["ephemeral_1h_input_tokens","ephemeral_5m_input_tokens"]},"cache_creation_input_tokens":{"type":"integer"},"cache_read_input_tokens":{"type":"integer"},"inference_geo":{"type":"string"},"input_tokens":{"type":"integer"},"output_tokens":{"type":"integer"},"server_tool_use":{"type":"object","properties":{"web_fetch_requests":{"type":"integer"},"web_search_requests
":{"type":"integer"}},"required":["web_fetch_requests","web_search_requests"]},"service_tier":{"type":"string"}},"required":["cache_creation","cache_creation_input_tokens","cache_read_input_tokens","inference_geo","input_tokens","output_tokens","server_tool_use","service_tier"]}},"required":["content","cost","cost_details","id","model","role","stop_reason","stop_sequence","type","usage"]}}},"headers":{}}}}}}}
```
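As a rough illustration of the schema above, the sketch below builds a request body for `/v1/messages` with the web search tool enabled and collects the citations from a response. The tool type string `web_search_20250305` and the model ID are assumptions borrowed from Anthropic-style Messages conventions; they are not confirmed by this page, so check the values your model expects.

```python
import json


def build_web_search_request(model: str, question: str) -> dict:
    """Build a /v1/messages body that enables the built-in web search tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        # Assumed tool identifiers, not taken from this page.
        "tools": [{"type": "web_search_20250305", "name": "web_search"}],
        "stream": False,
        "stream_options": {"include_usage": True},
        "usage": {"include": True},
    }


def collect_citations(response: dict) -> list:
    """Return (title, url) pairs from every content block that has citations."""
    found = []
    for block in response.get("content", []):
        for citation in block.get("citations") or []:
            found.append((citation["title"], citation["url"]))
    return found


# Hypothetical model ID, used only to show the shape of the body.
body = build_web_search_request("claude-sonnet-4.6", "What is new in Python 3.13?")
print(json.dumps(body, indent=2))
```

Only blocks that carry a `citations` array contribute results, so search-result and tool-use blocks in `content` are skipped.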


# Chat with File Attachments

## Chat with File Attachments

> Send images and PDF documents as part of your message request.

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Chat with File Attachments"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/messages":{"post":{"summary":"Chat with File Attachments","deprecated":false,"description":"Send images and PDF documents as part of your message request.","tags":["Chat with File Attachments"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"source":{"type":"object","properties":{"type":{"type":"string"},"media_type":{"type":"string"},"data":{"type":"string"}},"required":["type","media_type","data"]},"text":{"type":"string"}},"required":["type","source"]}}}}},"stream":{"type":"boolean"},"stream_options":{"type":"object","properties":{"include_usage":{"type":"boolean"}},"required":["include_usage"]},"usage":{"type":"object","properties":{"include":{"type":"boolean"}},"required":["include"]}},"required":["model","messages","stream","stream_options","usage"]}}},"required":true},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"content":{"type":"array","items":{"type":"object","properties":{"text":{"type":"string"},"type":{"type":"string"}}}},"cost":{"type":"number"},"cost_details":{"type":"object","properties":{"audio_cost":{"type":"integer"},"byok_cost":{"type":"integer"},"completion_cost":{"type":"number"},"discount_rate":{"type":"integer"},"image_cost":{"type":"integer"},"is_byok":{"type":"boolean"},"native_web_search_cost":{"type":"integer"},"plugin_web_search_cost":{"type":"integer"},"prompt_cache_read_cost":{"type":"integer"},"prompt_cache_w
rite_1_h":{"type":"integer"},"prompt_cache_write_5_min":{"type":"integer"},"prompt_cache_write_cost":{"type":"integer"},"prompt_cost":{"type":"number"},"reasoning_cost":{"type":"integer"},"tools_cost":{"type":"integer"},"video_cost":{"type":"integer"}},"required":["audio_cost","byok_cost","completion_cost","discount_rate","image_cost","is_byok","native_web_search_cost","plugin_web_search_cost","prompt_cache_read_cost","prompt_cache_write_1_h","prompt_cache_write_5_min","prompt_cache_write_cost","prompt_cost","reasoning_cost","tools_cost","video_cost"]},"id":{"type":"string"},"model":{"type":"string"},"role":{"type":"string"},"stop_reason":{"type":"string"},"stop_sequence":{"type":"null"},"type":{"type":"string"},"usage":{"type":"object","properties":{"cache_creation":{"type":"object","properties":{"ephemeral_1h_input_tokens":{"type":"integer"},"ephemeral_5m_input_tokens":{"type":"integer"}},"required":["ephemeral_1h_input_tokens","ephemeral_5m_input_tokens"]},"cache_creation_input_tokens":{"type":"integer"},"cache_read_input_tokens":{"type":"integer"},"input_tokens":{"type":"integer"},"output_tokens":{"type":"integer"}},"required":["cache_creation","cache_creation_input_tokens","cache_read_input_tokens","input_tokens","output_tokens"]}},"required":["content","cost","cost_details","id","model","role","stop_reason","stop_sequence","type","usage"]}}},"headers":{}}}}}}}
```
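The request schema above nests each attachment in a `source` object with `type`, `media_type`, and `data` fields. A minimal sketch of building such a message follows. The block type names `image` and `document` and the source type `base64` are assumptions based on Anthropic-style Messages conventions, and `build_attachment_message` is a hypothetical helper, not part of any SDK.

```python
import base64


def build_attachment_message(file_bytes: bytes, media_type: str, prompt: str) -> dict:
    """Build one user message carrying a base64-encoded file plus a text prompt."""
    # Assumed convention: PDFs travel as "document" blocks, everything else as "image".
    block_type = "document" if media_type == "application/pdf" else "image"
    return {
        "role": "user",
        "content": [
            {
                "type": block_type,
                "source": {
                    "type": "base64",  # assumed source type
                    "media_type": media_type,
                    "data": base64.b64encode(file_bytes).decode("ascii"),
                },
            },
            {"type": "text", "text": prompt},
        ],
    }


# Placeholder bytes stand in for a real file read with open(path, "rb").read().
message = build_attachment_message(b"abc", "image/png", "Describe this image.")
print(message["content"][0]["source"]["media_type"])
```

The resulting dictionary goes into the `messages` array of the request body, alongside `model`, `stream`, `stream_options`, and `usage`.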


# Overview

**Infron supports the** [**OpenResponses API**](https://openresponses.org/) **specification**, an open standard for AI model interactions. OpenResponses provides a unified interface across providers with built-in support for streaming, tool calling, reasoning, and multi-modal inputs.

### Base URL

The OpenResponses-compatible API is available at:

`https://llm.onerouter.pro/v1`

### Authentication

The OpenResponses-compatible API authenticates requests with your Infron API key:

* **API key**: Use your Infron API key with the `Authorization: Bearer <token>` header

### Getting started

Here's a simple example to generate a text response:

{% tabs %}
{% tab title="Python" %}

```python
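import requests

# Send a single text prompt to the OpenResponses-compatible endpoint.
response = requests.post(
    'https://llm.onerouter.pro/v1/responses',
    headers={
        'Authorization': 'Bearer <API_KEY>',  # replace with your Infron API key
        'Content-Type': 'application/json',
    },
    json={
        'model': 'o4-mini',
        'input': 'Hello, world!',
    }
)

# Print the parsed JSON body of the response.
print(response.json())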
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
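// Send a single text prompt to the OpenResponses-compatible endpoint.
const response = await fetch('https://llm.onerouter.pro/v1/responses', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer <API_KEY>', // replace with your Infron API key
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'o4-mini',
    input: 'Hello, world!',
  }),
});

// Log the parsed JSON body of the response.
console.log(await response.json());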
```

{% endtab %}

{% tab title="cURL" %}

```shellscript
curl -X POST https://llm.onerouter.pro/v1/responses \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o4-mini",
    "input": "Hello, world!"
  }'
```

{% endtab %}
{% endtabs %}

### Error handling

The API signals errors with standard HTTP status codes and a JSON error body.

#### Common error codes

* **400**: Bad Request (invalid or missing parameters, CORS)
* **401**: Invalid credentials (OAuth session expired, disabled/invalid API key)
* **402**: Your account or API key has insufficient credits. Add more credits and retry the request.
* **403**: Your chosen model requires moderation and your input was flagged
* **408**: Your request timed out
* **429**: You are being rate limited
* **502**: Your chosen model is down or we received an invalid response from it
* **503**: There is no available model provider that meets your routing requirements

#### Error response format

```json
{
    "error": {
        "message": "",
        "type": "",
        "param": "",
        "code": 429
    }
}
```
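One way to act on these codes is sketched below: read the error body, decide whether a retry makes sense, and compute a backoff delay. Treating 408, 429, 502, and 503 as retryable is an assumption about client policy, not something this page prescribes, and the helper names are hypothetical.

```python
import random

# Assumed policy: transient conditions are retried, everything else is not.
RETRYABLE_STATUS = {408, 429, 502, 503}


def describe_error(body: dict) -> str:
    """Summarize an error body of the shape shown above."""
    err = body.get("error") or {}
    return f"{err.get('code')}: {err.get('message') or 'no message'}"


def should_retry(status: int) -> bool:
    """Return True when the status code falls under the assumed retry policy."""
    return status in RETRYABLE_STATUS


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter: a random delay up to base * 2**attempt."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


sample = {"error": {"message": "", "type": "", "param": "", "code": 429}}
print(describe_error(sample), should_retry(429))
```

Codes such as 401, 402, and 403 are left out of the retry set because repeating the same request would not change the outcome until the credentials, credits, or input change.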


# Create a response

## POST /v1/responses

> Create a response

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Create a response"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/responses":{"post":{"summary":"Create a response","deprecated":false,"description":"","tags":["Create a response"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"input":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"role":{"type":"string"},"content":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"text":{"type":"string"},"image_url":{"type":"string"},"detail":{"type":"string"},"filename":{"type":"string"},"file_data":{"type":"string"}},"required":["type","text"]}},"call_id":{"type":"string"},"name":{"type":"string"},"arguments":{"type":"string"},"output":{"type":"string"}},"required":["type","role","content","call_id"]}},"instructions":{"type":"string"},"metadata":{"type":"object","properties":{"environment":{"type":"string"},"application":{"type":"string"},"tenant":{"type":"string"},"priority":{"type":"string"},"region":{"type":"string"},"workflow":{"type":"string"},"request_purpose":{"type":"string"},"customer_tier":{"type":"string"},"feature_flag":{"type":"string"},"compliance_mode":{"type":"string"},"trace_label":{"type":"string"},"release_channel":{"type":"string"},"locale":{"type":"string"},"department":{"type":"string"},"scenario":{"type":"string"},"owner":{"type":"string"}},"required":["environment","application","tenant","priority","region","workflow","request_purpose","customer_tier","feature_flag","compliance_mode","trace_label","release_channel","locale","department","scenario","owner"]},"tools":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"name":{"type":"string"},"descrip
tion":{"type":"string"},"strict":{"type":"boolean"},"parameters":{"type":"object","properties":{"type":{"type":"string"},"properties":{"type":"object","properties":{"location":{"type":"object","properties":{"type":{"type":"string"},"description":{"type":"string"}},"required":["type","description"]},"unit":{"type":"object","properties":{"type":{"type":"string"},"enum":{"type":"array","items":{"type":"string"}},"description":{"type":"string"}},"required":["type","enum","description"]}},"required":["location","unit"]},"required":{"type":"array","items":{"type":"string"}},"additionalProperties":{"type":"boolean"}},"required":["type","properties","required","additionalProperties"]},"engine":{"type":"string"},"max_results":{"type":"integer"},"user_location":{"type":"object","properties":{"type":{"type":"string"},"city":{"type":"string"},"region":{"type":"string"},"country":{"type":"string"}},"required":["type","city","region","country"]}},"required":["type","engine","max_results"]}},"tool_choice":{"type":"object","properties":{"type":{"type":"string"},"name":{"type":"string"}},"required":["type","name"]},"parallel_tool_calls":{"type":"boolean"},"text":{"type":"object","properties":{"verbosity":{"type":"string"},"format":{"type":"object","properties":{"type":{"type":"string"},"name":{"type":"string"},"strict":{"type":"boolean"},"schema":{"type":"object","properties":{"type":{"type":"string"},"properties":{"type":"object","properties":{"summary":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"key_findings":{"type":"object","properties":{"type":{"type":"string"},"items":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]}},"required":["type","items"]},"risks":{"type":"object","properties":{"type":{"type":"string"},"items":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]}},"required":["type","items"]},"actions":{"type":"object","properties":{"type":{"type":"string"},"items":{"type":"object","p
roperties":{"type":{"type":"string"}},"required":["type"]}},"required":["type","items"]},"confidence":{"type":"object","properties":{"type":{"type":"string"},"minimum":{"type":"integer"},"maximum":{"type":"integer"}},"required":["type","minimum","maximum"]}},"required":["summary","key_findings","risks","actions","confidence"]},"required":{"type":"array","items":{"type":"string"}},"additionalProperties":{"type":"boolean"}},"required":["type","properties","required","additionalProperties"]}},"required":["type","name","strict","schema"]}},"required":["verbosity","format"]},"reasoning":{"type":"object","properties":{"effort":{"type":"string"}},"required":["effort"]},"max_output_tokens":{"type":"integer"},"top_logprobs":{"type":"integer"},"max_tool_calls":{"type":"integer"},"presence_penalty":{"type":"number"},"frequency_penalty":{"type":"number"},"top_k":{"type":"integer"},"image_config":{"type":"object","properties":{"size":{"type":"string"},"quality":{"type":"string"},"background":{"type":"string"}},"required":["size","quality","background"]},"modalities":{"type":"array","items":{"type":"string"}},"prompt_cache_key":{"type":"string"},"prompt":{"type":"object","properties":{"id":{"type":"string"},"version":{"type":"string"},"variables":{"type":"object","properties":{"customer_name":{"type":"string"},"document_title":{"type":"string"},"reference_image_data_url":{"type":"string"},"reference_pdf_data_url":{"type":"string"},"analysis_goal":{"type":"string"}},"required":["customer_name","document_title","reference_image_data_url","reference_pdf_data_url","analysis_goal"]}},"required":["id","version","variables"]},"include":{"type":"array","items":{"type":"string"}},"background":{"type":"boolean"},"safety_identifier":{"type":"string"},"store":{"type":"boolean"},"service_tier":{"type":"string"},"truncation":{"type":"string"},"stream":{"type":"boolean"},"usage":{"type":"object","properties":{"include":{"type":"boolean"}},"required":["include"]},"provider":{"type":"object","pro
perties":{"allow_fallbacks":{"type":"boolean"},"require_parameters":{"type":"boolean"},"data_collection":{"type":"string"},"zdr":{"type":"boolean"},"enforce_distillable_text":{"type":"boolean"},"order":{"type":"array","items":{"type":"string"}},"only":{"type":"array","items":{"type":"string"}},"ignore":{"type":"array","items":{"type":"string"}},"quantizations":{"type":"array","items":{"type":"string"}},"sort":{"type":"string"},"preferred_min_throughput":{"type":"object","properties":{"p50":{"type":"integer"},"p90":{"type":"integer"}},"required":["p50","p90"]},"preferred_max_latency":{"type":"object","properties":{"p50":{"type":"number"},"p90":{"type":"integer"}},"required":["p50","p90"]}},"required":["allow_fallbacks","require_parameters","data_collection","zdr","enforce_distillable_text","order","only","ignore","quantizations","sort","preferred_min_throughput","preferred_max_latency"]}},"required":["model","input","instructions","metadata","tools","tool_choice","parallel_tool_calls","text","reasoning","max_output_tokens","top_logprobs","max_tool_calls","presence_penalty","frequency_penalty","top_k","image_config","modalities","prompt_cache_key","prompt","include","background","safety_identifier","store","service_tier","truncation","stream","usage","provider"]}}},"required":true},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"background":{"type":"boolean"},"completed_at":{"type":"integer"},"content_filters":{"type":"array","items":{"type":"object","properties":{"blocked":{"type":"boolean"},"content_filter_offsets":{"type":"object","properties":{"check_offset":{"type":"integer"},"end_offset":{"type":"integer"},"start_offset":{"type":"integer"}},"required":["check_offset","end_offset","start_offset"]},"content_filter_raw":{"type":"array","items":{"type":"string"}},"content_filter_results":{"type":"object","properties":{"hate":{"type":"object","properties":{"filtered":{"type":"boolean"},"severity":{"type":"s
tring"}},"required":["filtered","severity"]},"protected_material_code":{"type":"object","properties":{"detected":{"type":"boolean"},"filtered":{"type":"boolean"}},"required":["detected","filtered"]},"protected_material_text":{"type":"object","properties":{"detected":{"type":"boolean"},"filtered":{"type":"boolean"}},"required":["detected","filtered"]},"self_harm":{"type":"object","properties":{"filtered":{"type":"boolean"},"severity":{"type":"string"}},"required":["filtered","severity"]},"sexual":{"type":"object","properties":{"filtered":{"type":"boolean"},"severity":{"type":"string"}},"required":["filtered","severity"]},"violence":{"type":"object","properties":{"filtered":{"type":"boolean"},"severity":{"type":"string"}},"required":["filtered","severity"]}},"required":["hate","protected_material_code","protected_material_text","self_harm","sexual","violence"]},"source_type":{"type":"string"}}}},"cost":{"type":"number"},"cost_details":{"type":"object","properties":{"audio_cost":{"type":"integer"},"byok_cost":{"type":"integer"},"completion_cost":{"type":"integer"},"discount_rate":{"type":"integer"},"image_cost":{"type":"integer"},"is_byok":{"type":"boolean"},"native_web_search_cost":{"type":"integer"},"plugin_web_search_cost":{"type":"integer"},"prompt_cache_read_cost":{"type":"number"},"prompt_cache_write_1_h":{"type":"integer"},"prompt_cache_write_5_min":{"type":"integer"},"prompt_cache_write_cost":{"type":"integer"},"prompt_cost":{"type":"integer"},"reasoning_cost":{"type":"integer"},"tools_cost":{"type":"integer"},"video_cost":{"type":"integer"}},"required":["audio_cost","byok_cost","completion_cost","discount_rate","image_cost","is_byok","native_web_search_cost","plugin_web_search_cost","prompt_cache_read_cost","prompt_cache_write_1_h","prompt_cache_write_5_min","prompt_cache_write_cost","prompt_cost","reasoning_cost","tools_cost","video_cost"]},"created_at":{"type":"integer"},"error":{"type":"null"},"frequency_penalty":{"type":"integer"},"id":{"type":"string"},"i
ncomplete_details":{"type":"null"},"instructions":{"type":"string"},"max_output_tokens":{"type":"integer"},"max_tool_calls":{"type":"null"},"metadata":{"type":"object","properties":{"application":{"type":"string"},"compliance_mode":{"type":"string"},"customer_tier":{"type":"string"},"department":{"type":"string"},"environment":{"type":"string"},"feature_flag":{"type":"string"},"locale":{"type":"string"},"owner":{"type":"string"},"priority":{"type":"string"},"region":{"type":"string"},"release_channel":{"type":"string"},"request_purpose":{"type":"string"},"scenario":{"type":"string"},"tenant":{"type":"string"},"trace_label":{"type":"string"},"workflow":{"type":"string"}},"required":["application","compliance_mode","customer_tier","department","environment","feature_flag","locale","owner","priority","region","release_channel","request_purpose","scenario","tenant","trace_label","workflow"]},"model":{"type":"string"},"object":{"type":"string"},"output":{"type":"array","items":{"type":"object","properties":{"id":{"type":"string"},"summary":{"type":"array","items":{"type":"string"}},"type":{"type":"string"},"arguments":{"type":"string"},"call_id":{"type":"string"},"name":{"type":"string"},"status":{"type":"string"}},"required":["id","type"]}},"parallel_tool_calls":{"type":"boolean"},"presence_penalty":{"type":"integer"},"previous_response_id":{"type":"null"},"prompt_cache_key":{"type":"null"},"prompt_cache_retention":{"type":"null"},"reasoning":{"type":"object","properties":{"effort":{"type":"string"},"summary":{"type":"null"}},"required":["effort","summary"]},"safety_identifier":{"type":"null"},"service_tier":{"type":"string"},"status":{"type":"string"},"store":{"type":"boolean"},"temperature":{"type":"integer"},"text":{"type":"object","properties":{"format":{"type":"object","properties":{"description":{"type":"null"},"name":{"type":"string"},"schema":{"type":"object","properties":{"additionalProperties":{"type":"boolean"},"properties":{"type":"object","properties":{"act
ions":{"type":"object","properties":{"items":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"type":{"type":"string"}},"required":["items","type"]},"confidence":{"type":"object","properties":{"maximum":{"type":"integer"},"minimum":{"type":"integer"},"type":{"type":"string"}},"required":["maximum","minimum","type"]},"key_findings":{"type":"object","properties":{"items":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"type":{"type":"string"}},"required":["items","type"]},"risks":{"type":"object","properties":{"items":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"type":{"type":"string"}},"required":["items","type"]},"summary":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]}},"required":["actions","confidence","key_findings","risks","summary"]},"required":{"type":"array","items":{"type":"string"}},"type":{"type":"string"}},"required":["additionalProperties","properties","required","type"]},"strict":{"type":"boolean"},"type":{"type":"string"}},"required":["description","name","schema","strict","type"]},"verbosity":{"type":"string"}},"required":["format","verbosity"]},"tool_choice":{"type":"object","properties":{"name":{"type":"string"},"type":{"type":"string"}},"required":["name","type"]},"tools":{"type":"array","items":{"type":"object","properties":{"description":{"type":"string"},"name":{"type":"string"},"parameters":{"type":"object","properties":{"additionalProperties":{"type":"boolean"},"properties":{"type":"object","properties":{"location":{"type":"object","properties":{"description":{"type":"string"},"type":{"type":"string"}},"required":["description","type"]},"unit":{"type":"object","properties":{"description":{"type":"string"},"enum":{"type":"array","items":{"type":"string"}},"type":{"type":"string"}},"required":["description","enum","type"]}},"required":["location","unit"]},"required":{"type":"array","items":{"type":"string"}},"type":{"type":
"string"}},"required":["additionalProperties","properties","required","type"]},"strict":{"type":"boolean"},"type":{"type":"string"},"filters":{"type":"null"},"search_context_size":{"type":"string"},"user_location":{"type":"object","properties":{"city":{"type":"null"},"country":{"type":"string"},"region":{"type":"null"},"timezone":{"type":"null"},"type":{"type":"string"}},"required":["city","country","region","timezone","type"]}},"required":["type"]}},"top_logprobs":{"type":"integer"},"top_p":{"type":"number"},"truncation":{"type":"string"},"usage":{"type":"object","properties":{"input_tokens":{"type":"integer"},"input_tokens_details":{"type":"object","properties":{"cached_tokens":{"type":"integer"}},"required":["cached_tokens"]},"output_tokens":{"type":"integer"},"output_tokens_details":{"type":"object","properties":{"reasoning_tokens":{"type":"integer"}},"required":["reasoning_tokens"]},"total_tokens":{"type":"integer"}},"required":["input_tokens","input_tokens_details","output_tokens","output_tokens_details","total_tokens"]},"user":{"type":"null"}},"required":["background","completed_at","content_filters","cost","cost_details","created_at","error","frequency_penalty","id","incomplete_details","instructions","max_output_tokens","max_tool_calls","metadata","model","object","output","parallel_tool_calls","presence_penalty","previous_response_id","prompt_cache_key","prompt_cache_retention","reasoning","safety_identifier","service_tier","status","store","temperature","text","tool_choice","tools","top_logprobs","top_p","truncation","usage","user"]}}},"headers":{}}}}}}}
```
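To make the `text.format` portion of the request schema more concrete, here is a sketch of a body that asks for structured output. The property names `summary`, `key_findings`, and `confidence` come from the schema above. The type strings `message`, `input_text`, and `json_schema`, the schema name `analysis`, and the 0 to 100 confidence range are assumptions in the style of the OpenAI Responses API. The sketch also sends only a subset of fields, on the assumption that the long `required` list reflects the example the spec was generated from.

```python
import json


def build_structured_request(model: str, question: str) -> dict:
    """Build a /v1/responses body that requests JSON matching a small schema."""
    return {
        "model": model,
        "input": [
            {
                "type": "message",  # assumed item type
                "role": "user",
                "content": [{"type": "input_text", "text": question}],
            }
        ],
        "text": {
            "format": {
                "type": "json_schema",  # assumed format type
                "name": "analysis",  # hypothetical schema name
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "summary": {"type": "string"},
                        "key_findings": {"type": "array", "items": {"type": "string"}},
                        "confidence": {"type": "integer", "minimum": 0, "maximum": 100},
                    },
                    "required": ["summary", "key_findings", "confidence"],
                    "additionalProperties": False,
                },
            }
        },
    }


request_body = build_structured_request("o4-mini", "Summarize the attached report.")
print(json.dumps(request_body["text"]["format"]["schema"]["required"]))
```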


# Chat with Tool Calling

## POST /v1/responses

> Chat with Tool Calling

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Chat with Tool Calling"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/responses":{"post":{"summary":"Chat with Tool Calling","deprecated":false,"description":"","tags":["Chat with Tool Calling"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"input":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"role":{"type":"string"},"content":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"text":{"type":"string"}}}},"call_id":{"type":"string"},"name":{"type":"string"},"arguments":{"type":"string"},"output":{"type":"string"}},"required":["type","call_id"]}},"tools":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"name":{"type":"string"},"description":{"type":"string"},"strict":{"type":"boolean"},"parameters":{"type":"object","properties":{"type":{"type":"string"},"properties":{"type":"object","properties":{"location":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"unit":{"type":"object","properties":{"type":{"type":"string"},"enum":{"type":"array","items":{"type":"string"}}},"required":["type","enum"]}},"required":["location","unit"]},"required":{"type":"array","items":{"type":"string"}},"additionalProperties":{"type":"boolean"}},"required":["type","properties","required","additionalProperties"]}}}},"tool_choice":{"type":"object","properties":{"type":{"type":"string"},"name":{"type":"string"}},"required":["type","name"]},"parallel_tool_calls":{"type":"boolean"},"usage":{"type":"object","properties":{"include":{"type":"boolean"}},"required":["include"]}},"required":["model","input","tools","tool_choice","parallel_tool_calls","usage"]}}}
,"required":true},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"background":{"type":"boolean"},"completed_at":{"type":"integer"},"content_filters":{"type":"array","items":{"type":"object","properties":{"blocked":{"type":"boolean"},"content_filter_offsets":{"type":"object","properties":{"check_offset":{"type":"integer"},"end_offset":{"type":"integer"},"start_offset":{"type":"integer"}},"required":["check_offset","end_offset","start_offset"]},"content_filter_raw":{"type":"array","items":{"type":"string"}},"content_filter_results":{"type":"object","properties":{"hate":{"type":"object","properties":{"filtered":{"type":"boolean"},"severity":{"type":"string"}},"required":["filtered","severity"]},"jailbreak":{"type":"object","properties":{"detected":{"type":"boolean"},"filtered":{"type":"boolean"}},"required":["detected","filtered"]},"self_harm":{"type":"object","properties":{"filtered":{"type":"boolean"},"severity":{"type":"string"}},"required":["filtered","severity"]},"sexual":{"type":"object","properties":{"filtered":{"type":"boolean"},"severity":{"type":"string"}},"required":["filtered","severity"]},"violence":{"type":"object","properties":{"filtered":{"type":"boolean"},"severity":{"type":"string"}},"required":["filtered","severity"]},"protected_material_code":{"type":"object","properties":{"detected":{"type":"boolean"},"filtered":{"type":"boolean"}},"required":["detected","filtered"]},"protected_material_text":{"type":"object","properties":{"detected":{"type":"boolean"},"filtered":{"type":"boolean"}},"required":["detected","filtered"]}},"required":["hate","protected_material_code","protected_material_text","self_harm","sexual","violence"]},"source_type":{"type":"string"}},"required":["blocked","content_filter_offsets","content_filter_raw","content_filter_results","source_type"]}},"cost":{"type":"number"},"cost_details":{"type":"object","properties":{"audio_cost":{"type":"integer"},"byok_cost":{"type":"inte
ger"},"completion_cost":{"type":"integer"},"discount_rate":{"type":"integer"},"image_cost":{"type":"integer"},"is_byok":{"type":"boolean"},"native_web_search_cost":{"type":"integer"},"plugin_web_search_cost":{"type":"integer"},"prompt_cache_read_cost":{"type":"integer"},"prompt_cache_write_1_h":{"type":"integer"},"prompt_cache_write_5_min":{"type":"integer"},"prompt_cache_write_cost":{"type":"integer"},"prompt_cost":{"type":"integer"},"reasoning_cost":{"type":"integer"},"tools_cost":{"type":"integer"},"video_cost":{"type":"integer"}},"required":["audio_cost","byok_cost","completion_cost","discount_rate","image_cost","is_byok","native_web_search_cost","plugin_web_search_cost","prompt_cache_read_cost","prompt_cache_write_1_h","prompt_cache_write_5_min","prompt_cache_write_cost","prompt_cost","reasoning_cost","tools_cost","video_cost"]},"created_at":{"type":"integer"},"error":{"type":"null"},"frequency_penalty":{"type":"integer"},"id":{"type":"string"},"incomplete_details":{"type":"null"},"instructions":{"type":"null"},"max_output_tokens":{"type":"null"},"max_tool_calls":{"type":"null"},"metadata":{"type":"object","properties":{}},"model":{"type":"string"},"object":{"type":"string"},"output":{"type":"array","items":{"type":"object","properties":{"arguments":{"type":"string"},"call_id":{"type":"string"},"id":{"type":"string"},"name":{"type":"string"},"status":{"type":"string"},"type":{"type":"string"}}}},"parallel_tool_calls":{"type":"boolean"},"presence_penalty":{"type":"integer"},"previous_response_id":{"type":"null"},"prompt_cache_key":{"type":"null"},"prompt_cache_retention":{"type":"null"},"reasoning":{"type":"object","properties":{"effort":{"type":"string"},"summary":{"type":"null"}},"required":["effort","summary"]},"safety_identifier":{"type":"null"},"service_tier":{"type":"string"},"status":{"type":"string"},"store":{"type":"boolean"},"temperature":{"type":"integer"},"text":{"type":"object","properties":{"format":{"type":"object","properties":{"type":{"type":"st
ring"}},"required":["type"]},"verbosity":{"type":"string"}},"required":["format","verbosity"]},"tool_choice":{"type":"object","properties":{"name":{"type":"string"},"type":{"type":"string"}},"required":["name","type"]},"tools":{"type":"array","items":{"type":"object","properties":{"description":{"type":"string"},"name":{"type":"string"},"parameters":{"type":"object","properties":{"additionalProperties":{"type":"boolean"},"properties":{"type":"object","properties":{"location":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]},"unit":{"type":"object","properties":{"enum":{"type":"array","items":{"type":"string"}},"type":{"type":"string"}},"required":["enum","type"]}},"required":["location","unit"]},"required":{"type":"array","items":{"type":"string"}},"type":{"type":"string"}},"required":["additionalProperties","properties","required","type"]},"strict":{"type":"boolean"},"type":{"type":"string"}}}},"top_logprobs":{"type":"integer"},"top_p":{"type":"number"},"truncation":{"type":"string"},"usage":{"type":"object","properties":{"input_tokens":{"type":"integer"},"input_tokens_details":{"type":"object","properties":{"cached_tokens":{"type":"integer"}},"required":["cached_tokens"]},"output_tokens":{"type":"integer"},"output_tokens_details":{"type":"object","properties":{"reasoning_tokens":{"type":"integer"}},"required":["reasoning_tokens"]},"total_tokens":{"type":"integer"}},"required":["input_tokens","input_tokens_details","output_tokens","output_tokens_details","total_tokens"]},"user":{"type":"null"}},"required":["background","completed_at","content_filters","cost","cost_details","created_at","error","frequency_penalty","id","incomplete_details","instructions","max_output_tokens","max_tool_calls","metadata","model","object","output","parallel_tool_calls","presence_penalty","previous_response_id","prompt_cache_key","prompt_cache_retention","reasoning","safety_identifier","service_tier","status","store","temperature","text","tool_choice","tools",
"top_logprobs","top_p","truncation","usage","user"]}}},"headers":{}}}}}}}
```


# Overview

Generate vector embeddings from text

Embeddings are numerical representations of text that capture its semantic meaning. They transform text into vectors (arrays of numbers) that can be used in a wide range of machine learning tasks. Infron AI offers a unified API that allows you to access embedding models from multiple providers through a single interface.

### Common Use Cases <a href="#common-use-cases" id="common-use-cases"></a>

Embeddings are used in a wide variety of applications:

* **RAG (Retrieval-Augmented Generation)**: Build RAG systems that retrieve relevant context from a knowledge base before generating answers. Embeddings help find the most relevant documents to include in the LLM's context.
* **Semantic Search**: Convert documents and queries into embeddings, then find the most relevant documents by comparing vector similarity. This provides more accurate results than traditional keyword matching because it understands meaning rather than just matching words.
* **Recommendation Systems**: Generate embeddings for items (products, articles, movies) and user preferences to recommend similar items. By comparing embedding vectors, you can find items that are semantically related even if they don't share obvious keywords.
* **Clustering and Classification**: Group similar documents together or classify text into categories by analyzing embedding patterns. Documents with similar embeddings likely belong to the same topic or category.
* **Duplicate Detection**: Identify duplicate or near-duplicate content by comparing embedding similarity. This works even when text is paraphrased or reworded.
* **Anomaly Detection**: Detect unusual or outlier content by identifying embeddings that are far from typical patterns in your dataset.

### How to Use Embeddings <a href="#how-to-use-embeddings" id="how-to-use-embeddings"></a>

#### Basic Request <a href="#basic-request" id="basic-request"></a>

To generate embeddings, send a POST request to `/embeddings` with your text input and chosen model:

{% tabs %}
{% tab title="Python" %}

```python
import requests

response = requests.post(
  "https://llm.onerouter.pro/v1/embeddings",
  headers={
    "Authorization": f"Bearer {{API_KEY_REF}}",
    "Content-Type": "application/json",
  },
  json={
    "model": "{{MODEL}}",
    "input": "The quick brown fox jumps over the lazy dog"
  }
)

data = response.json()
embedding = data["data"][0]["embedding"]
print(f"Embedding dimension: {len(embedding)}")
```

{% endtab %}

{% tab title="TypeScript (fetch)" %}

```typescript
const response = await fetch('https://llm.onerouter.pro/v1/embeddings', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer {{API_KEY_REF}}',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    input: 'The quick brown fox jumps over the lazy dog'
  }),
});

const data = await response.json();
const embedding = data.data[0].embedding;
console.log(`Embedding dimension: ${embedding.length}`);
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY_REF" \
  -d '{
    "model": "{{MODEL}}",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

{% endtab %}
{% endtabs %}

#### Batch Processing <a href="#batch-processing" id="batch-processing"></a>

You can generate embeddings for multiple texts in a single request by passing an array of strings:

{% tabs %}
{% tab title="Python" %}

```python
import requests

response = requests.post(
  "https://llm.onerouter.pro/v1/embeddings",
  headers={
    "Authorization": f"Bearer {{API_KEY_REF}}",
    "Content-Type": "application/json",
  },
  json={
    "model": "{{MODEL}}",
    "input": [
      "Machine learning is a subset of artificial intelligence",
      "Deep learning uses neural networks with multiple layers",
      "Natural language processing enables computers to understand text"
    ]
  }
)

data = response.json()
for i, item in enumerate(data["data"]):
  print(f"Embedding {i}: {len(item['embedding'])} dimensions")
```

{% endtab %}

{% tab title="TypeScript (fetch)" %}

```typescript
const response = await fetch('https://llm.onerouter.pro/v1/embeddings', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer {{API_KEY_REF}}',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: '{{MODEL}}',
    input: [
      'Machine learning is a subset of artificial intelligence',
      'Deep learning uses neural networks with multiple layers',
      'Natural language processing enables computers to understand text'
    ]
  }),
});

const data = await response.json();
data.data.forEach((item, index) => {
  console.log(`Embedding ${index}: ${item.embedding.length} dimensions`);
});
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY_REF" \
  -d '{
    "model": "{{MODEL}}",
    "input": [
      "Machine learning is a subset of artificial intelligence",
      "Deep learning uses neural networks with multiple layers",
      "Natural language processing enables computers to understand text"
    ]
  }'
```

{% endtab %}
{% endtabs %}
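
When a corpus is larger than you want to send in one request, the array form above can be applied in fixed-size batches. The following is a minimal sketch rather than part of the Infron API: `batched` and `embeddings_in_order` are hypothetical helpers, the batch size of 2 is arbitrary, and the sketch assumes each item in the response `data` array carries the `index` field listed in the endpoint schema.

```python
from typing import Iterator


def batched(texts: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def embeddings_in_order(response_data: list[dict]) -> list[list[float]]:
    """Return embeddings sorted by their `index` field (the input order)."""
    ordered = sorted(response_data, key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]


texts = ["first text", "second text", "third text"]
batches = list(batched(texts, 2))
print(batches)

# Hand-written stand-in for the `data` array of a response, deliberately out of order.
fake_data = [
    {"index": 1, "embedding": [0.3, 0.4]},
    {"index": 0, "embedding": [0.1, 0.2]},
]
print(embeddings_in_order(fake_data))
```

Sorting by `index` keeps each vector aligned with the text that produced it, even if a provider were to return items in a different order.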

#### Semantic Search <a href="#semantic-search" id="semantic-search"></a>

{% tabs %}
{% tab title="Python" %}

```python
import requests
import numpy as np

API_KEY = "{{API_KEY_REF}}"

# Sample documents
documents = [
  "The cat sat on the mat",
  "Dogs are loyal companions",
  "Python is a programming language",
  "Machine learning models require training data",
  "The weather is sunny today"
]

def cosine_similarity(a, b):
  """Calculate cosine similarity between two vectors"""
  dot_product = np.dot(a, b)
  magnitude_a = np.linalg.norm(a)
  magnitude_b = np.linalg.norm(b)
  return dot_product / (magnitude_a * magnitude_b)

def semantic_search(query, documents):
  """Perform semantic search using embeddings"""
  # Generate embeddings for query and all documents
  response = requests.post(
    "https://llm.onerouter.pro/v1/embeddings",
    headers={
      "Authorization": f"Bearer {API_KEY}",
      "Content-Type": "application/json",
    },
    json={
      "model": "{{MODEL}}",
      "input": [query] + documents
    }
  )
  
  data = response.json()
  query_embedding = np.array(data["data"][0]["embedding"])
  doc_embeddings = [np.array(item["embedding"]) for item in data["data"][1:]]
  
  # Calculate similarity scores
  results = []
  for i, doc in enumerate(documents):
    similarity = cosine_similarity(query_embedding, doc_embeddings[i])
    results.append({"document": doc, "similarity": similarity})
  
  # Sort by similarity (highest first)
  results.sort(key=lambda x: x["similarity"], reverse=True)
  
  return results

# Search for documents related to pets
results = semantic_search("pets and animals", documents)
print("Search results:")
for i, result in enumerate(results):
  print(f"{i + 1}. {result['document']} (similarity: {result['similarity']:.4f})")
```

{% endtab %}
{% endtabs %}

Example output (the exact scores depend on the model you choose):

```
Search results:
1. Dogs are loyal companions (similarity: 0.8234)
2. The cat sat on the mat (similarity: 0.7891)
3. The weather is sunny today (similarity: 0.3456)
4. Machine learning models require training data (similarity: 0.2987)
5. Python is a programming language (similarity: 0.2654)
```

### Best Practices

* Choose the Right Model: Different embedding models have different strengths. Smaller models (such as qwen-qwen3-embedding-0.6b or openai-text-embedding-3-small) are faster and more cost‑efficient, while larger models (such as openai-text-embedding-3-large) generally produce higher‑quality embeddings. Test multiple models to determine which one best fits your use case.
* Batch Your Requests: When processing multiple text inputs, send them in a single request instead of making separate API calls. This helps reduce latency and overall cost.
* Cache Embeddings: Embeddings are deterministic for the same input text. Store them in a database or vector store so you don’t need to regenerate them repeatedly.
* Normalize for Comparison: When comparing embeddings, prefer cosine similarity to Euclidean distance. Cosine similarity depends only on the angle between two vectors, not on their magnitude, so differences in vector length do not affect the ranking.
* Consider Context Length: Each model has a maximum input size. Longer texts may need to be chunked or truncated. Review the model’s specifications before processing large documents.
* Use Meaningful Chunking: For long documents, split them into semantically meaningful units (such as paragraphs or sections) instead of relying on fixed character counts. This helps preserve context and coherence.
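
The caching advice above can be sketched as a small in-memory cache. This is an illustration under stated assumptions, not an Infron feature: `EmbeddingCache` and `embed_fn` are hypothetical names, `embed_fn` stands in for whatever function calls `/embeddings`, and a real deployment would more likely persist vectors in a database or vector store.

```python
import hashlib
from typing import Callable


class EmbeddingCache:
    """In-memory cache keyed by a hash of the model ID and the input text."""

    def __init__(self, embed_fn: Callable[[str, str], list[float]]):
        self._embed_fn = embed_fn  # hypothetical: (model, text) -> embedding
        self._store: dict[str, list[float]] = {}
        self.misses = 0

    @staticmethod
    def _key(model: str, text: str) -> str:
        # Include the model ID: different models produce different vectors.
        return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()

    def get(self, model: str, text: str) -> list[float]:
        key = self._key(model, text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(model, text)
        return self._store[key]


def fake_embed(model: str, text: str) -> list[float]:
    """Stand-in for a real API call, so the sketch runs offline."""
    return [float(len(text)), float(len(model))]


cache = EmbeddingCache(fake_embed)
first = cache.get("example-model", "hello")
second = cache.get("example-model", "hello")  # served from the cache
print(first == second, cache.misses)
```

Keying on both the model ID and the text avoids returning a vector produced by a different model for the same input.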

### Limitations

* No Streaming: Unlike chat completions, embeddings are returned as complete responses. Streaming is not supported.
* Token Limits: Each model has a maximum input length. Texts that exceed this limit will be truncated or rejected.
* Deterministic Output: For a given model, the same input text produces the same embedding; there is no temperature or other sampling randomness to configure.
* Language Support: Some models are optimized for specific languages. Refer to the model’s documentation for details on language coverage.
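
To stay under a model's input limit, long documents can be split on paragraph boundaries before embedding. The sketch below is a rough illustration: `chunk_paragraphs` and `max_chars` are hypothetical, and a character budget is only a stand-in for the real limit, which is measured in tokens and varies by model.

```python
def chunk_paragraphs(text: str, max_chars: int) -> list[str]:
    """Group paragraphs (split on blank lines) into chunks of at most `max_chars`.

    A single paragraph longer than `max_chars` is kept whole as its own chunk
    rather than cut mid-sentence; handle that case separately if needed.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the budget: close the chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


document = "Alpha paragraph.\n\nBeta paragraph.\n\nGamma paragraph is a little longer."
chunks = chunk_paragraphs(document, max_chars=35)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk can then be sent as one element of the `input` array in a batch request.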


# Submit an embedding request

## Submit an embedding request

> Submits an embedding request to an embedding model

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Submit an embedding request"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/embeddings":{"post":{"summary":"Submit an embedding request","deprecated":false,"description":"Submits an embedding request to the embeddings models","tags":["Submit an embedding request"],"parameters":[{"name":"Authorization","in":"header","description":"API key as bearer token in Authorization header","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string","description":"ID of the model to use. You can use the List models API to see all of your available models, or see our Model Marketplace for descriptions of them."},"input":{"type":"array","items":{"type":"string"},"description":"string or list of strings or list of doubles or list of lists of doubles or list of objects"},"encoding_format":{"type":"string","description":"The format to return the embeddings in. Can be either `float` or `base64`."}},"required":["model","input"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"id":{"type":"string"},"object":{"type":"string"},"data":{"type":"array","items":{"type":"object","properties":{"object":{"type":"string"},"index":{"type":"integer"},"embedding":{"type":"array","items":{"type":"number"}}}}},"model":{"type":"string"},"usage":{"type":"object","properties":{"prompt_tokens":{"type":"integer"},"completion_tokens":{"type":"integer"},"total_tokens":{"type":"integer"}},"required":["prompt_tokens","completion_tokens","total_tokens"]}},"required":["id","object","data","model","usage"]}}},"headers":{}}}}}}}
```
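
The schema above allows `encoding_format` to be `base64`, but does not describe the byte layout. The sketch below assumes the payload is a packed array of little-endian 32-bit floats, a layout commonly used by OpenAI-compatible APIs; this is an assumption to verify against a real response, and `decode_base64_embedding` is a hypothetical helper.

```python
import base64
import struct


def decode_base64_embedding(payload: str) -> list[float]:
    """Decode a base64 string into floats, ASSUMING packed little-endian float32."""
    raw = base64.b64decode(payload)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip a known vector to exercise the decoder without calling the API.
# These values are exactly representable in float32.
original = [0.5, -1.25, 2.0]
encoded = base64.b64encode(struct.pack("<3f", *original)).decode("ascii")
decoded = decode_base64_embedding(encoded)
print(decoded)
```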


# Overview

Rank documents by their relevance to a query

A reranker evaluates the relevance between a query and a set of documents, and returns their ranked relevance scores. These documents are often the initial results obtained from an embedding‑based retrieval system, and the reranker refines their ordering to provide more accurate relevance assessments.

Unlike [embedding](/docs/llm-apis/embeddings-api/overview) models that encode queries and documents independently, rerankers are [cross‑encoders](https://www.sbert.net/examples/applications/cross-encoder/README.html) that jointly process each query–document pair. This allows them to deliver more precise relevance predictions. As a result, it is common to apply a reranker to the top candidate documents retrieved through embedding‑based search or traditional lexical search methods such as [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) or [TF‑IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf).

### Get Started <a href="#get-started" id="get-started"></a>

#### Example with Texts <a href="#example-with-texts" id="example-with-texts"></a>

In the example below, we use the Rerank API endpoint to rank the list of `documents` by relevance to the query `"What is the capital of the United States?"`.

**Request**

In this example, the documents being passed in are a list of strings:

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

url = "https://llm.onerouter.pro/v1/rerank"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer {{API_KEY}}"
}
data = {
    "model": "qwen/qwen3-reranker-0.6b",
    "query": "What is the capital of the United States?",
    "top_n": 5,
    "documents": [
        "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
        "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
        "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.",
        "Capital punishment has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.",
    ]        
}

response = requests.post(url, headers=headers, data=json.dumps(data))

print(response.json())
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl --request POST \
  --url https://llm.onerouter.pro/v1/rerank \
  --header 'content-type: application/json' \
  --header "Authorization: Bearer $API_KEY" \
  --data '{
    "model": "qwen/qwen3-reranker-0.6b",
    "query": "What is the capital of the United States?",
    "documents": [
      "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",
      "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
      "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
      "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.",
      "Capital punishment has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment."
    ],
    "top_n": 5
  }'
```

{% endtab %}
{% endtabs %}

**Response**

```json
{
  "results": [
    {
      "index": 0,
      "relevance_score": 0.09358743578195572
    },
    {
      "index": 1,
      "relevance_score": 0.029959920793771744
    },
    {
      "index": 2,
      "relevance_score": 0.12215154618024826
    },
    {
      "index": 3,
      "relevance_score": 0.996448278427124
    },
    {
      "index": 4,
      "relevance_score": 0.00711569981649518
    }
  ],
  "usage": {
    "prompt_tokens": 632,
    "completion_tokens": 0,
    "total_tokens": 632
  }
}
```
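
The sample scores above are consistent with `index` being the position of each document in the request's `documents` array. Under that assumption, the following sketch maps results back to their documents and orders them by score. `rank_documents` is a hypothetical helper, the document strings are shortened stand-ins for the five documents in the request, and the scores are rounded copies of the sample response.

```python
def rank_documents(documents: list[str], results: list[dict]) -> list[tuple[str, float]]:
    """Pair each result with its document and sort by descending relevance."""
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ordered]


# Shortened stand-ins for the five documents in the request above.
documents = [
    "Carson City ...",
    "Northern Mariana Islands ...",
    "Charlotte Amalie ...",
    "Washington, D.C. ...",
    "Capital punishment ...",
]
# Scores copied from the sample response, rounded for brevity.
results = [
    {"index": 0, "relevance_score": 0.0936},
    {"index": 1, "relevance_score": 0.0300},
    {"index": 2, "relevance_score": 0.1222},
    {"index": 3, "relevance_score": 0.9964},
    {"index": 4, "relevance_score": 0.0071},
]

ranked = rank_documents(documents, results)
for text, score in ranked:
    print(f"{score:.4f}  {text}")
```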


# Submit a ranking request

## Submit a ranking request

> Submits a ranking request to a reranker model

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Submit an ranking request"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Prod Env"}],"security":[],"paths":{"/v1/rerank":{"post":{"summary":"Submit an ranking request","deprecated":false,"description":"Submits an ranking request to the reranker models","tags":["Submit an ranking request"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"query":{"type":"string"},"documents":{"type":"array","items":{"type":"string"}},"top_n":{"type":"integer"}},"required":["model","query","documents"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"results":{"type":"array","items":{"type":"object","properties":{"index":{"type":"integer"},"relevance_score":{"type":"number"}},"required":["index","relevance_score"]}},"usage":{"type":"object","properties":{"prompt_tokens":{"type":"integer"},"completion_tokens":{"type":"integer"},"total_tokens":{"type":"integer"}},"required":["prompt_tokens","completion_tokens","total_tokens"]}},"required":["results","usage"]}}},"headers":{}}}}}}}
```


# Overview

Infron is a unified AI platform that provides access to 500+ state-of-the-art models for image generation, video creation, audio synthesis, and more.

### What You Can Create

#### Image Generation

| Category           | Description                                |
| ------------------ | ------------------------------------------ |
| **Text to Image**  | Generate images from text prompts          |
| **Image to Image** | Edit, transform, or style transfer images  |
| **Upscaler**       | Enhance resolution and image quality       |
| **AI Remover**     | Remove objects, backgrounds, or watermarks |

#### Video Generation

| Category           | Description                          |
| ------------------ | ------------------------------------ |
| **Image to Video** | Animate still images into video      |
| **Text to Video**  | Create videos from text descriptions |
| **Video Effects**  | Apply visual effects and filters     |
| **Video to Video** | Edit, restyle, or enhance videos     |
| **Video Extend**   | Extend video duration seamlessly     |
| **Motion Control** | Control movement and animation       |

#### Audio & Speech

| Category           | Description                      |
| ------------------ | -------------------------------- |
| **Text to Speech** | Generate natural voice from text |
| **Text to Audio**  | Create music and sound effects   |
| **Speech to Text** | Transcribe audio to text         |
| **Audio Editing**  | Edit and enhance audio files     |
| **Video to Audio** | Generate audio/music for videos  |

#### Digital Human & Portrait

| Category              | Description                    |
| --------------------- | ------------------------------ |
| **Digital Human**     | Create talking avatar videos   |
| **Portrait Transfer** | Face swap and portrait editing |

#### 3D Generation

| Category        | Description                    |
| --------------- | ------------------------------ |
| **Image to 3D** | Generate 3D models from images |
| **Text to 3D**  | Create 3D models from text     |

#### AI Vision & Analysis

| Category               | Description                        |
| ---------------------- | ---------------------------------- |
| **Image to Text**      | Image captioning and analysis      |
| **Content Moderation** | Filter inappropriate content       |
| **Video to Text**      | Video understanding and captioning |

### Request Flow

#### 1. Submit Task

Send your request with parameters to the model endpoint:

{% tabs %}
{% tab title="cURL (Video Generation)" %}

```bash
curl https://llm.onerouter.pro/v1/videos/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "{model-id}",
  "prompt": "Your prompt here"
  ...parameters
}'
```

{% endtab %}

{% tab title="cURL (Image Generation)" %}

```bash
curl https://llm.onerouter.pro/v1/images/generations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "{model-id}",
  "prompt": "Your prompt here"
  ...parameters
}'
```

{% endtab %}
{% endtabs %}

#### 2. Task Processing

After you submit a task, Infron:

* Validates your request
* Routes it to the appropriate model
* Falls back to another provider if the selected provider has issues such as insufficient capacity or availability failures
* Queues and processes the task
* Generates the content

#### 3. Retrieve Results

Poll the task status until completion:

{% tabs %}
{% tab title="cURL (Video Generation)" %}

```bash
curl https://video.onerouter.pro/v1/videos/tasks/{task_id} \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" 
```

{% endtab %}

{% tab title="cURL (Image Generation)" %}

```bash
curl https://video.onerouter.pro/v1/images/tasks/{task_id} \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" 
```

{% endtab %}
{% endtabs %}

When status is `completed`, the `outputs` array contains URLs to your generated content.

{% tabs %}
{% tab title="JSON (Video Generation)" %}

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "task_id": "{task_id}",
    "object": "video",
    "model": "{model-id}",
    "status": "completed",
    "fail_reason": "",
    "submit_time": 1779064285,
    "start_time": 1779064284,
    "finish_time": 1779064332,
    "outputs": [
      "https://storage.googleapis.com/infron_gcs/video%2Fvideo_v2%2Fveo_output%2Fe79e79099e6544138c8d2db68bcca266%2F16093903391301042331%2Fsample_0.mp4"
    ],
    "created_at": "2026-05-18 00:31:25"
  }
}
```

{% endtab %}

{% tab title="JSON (Image Generation)" %}

```json
{
  "code": 200,
  "message": "success",
  "data": {
    "task_id": "fe4d1df7d7c347958cefe0b5290f3527",
    "object": "image",
    "model": "openai/gpt-image-2/image-to-image",
    "status": "completed",
    "fail_reason": "",
    "submit_time": 1779368267,
    "start_time": 1779368266,
    "finish_time": 1779368327,
    "outputs": [
      "https://storage.googleapis.com/infron_gcs/image%2Fgenerated%2Fofficial_openai_image_4b081ff0fb794a8391f708964353a071.png"
    ],
    "created_at": "2026-05-21 12:57:47"
  }
}
```

{% endtab %}
{% endtabs %}
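
The polling step can be wrapped in a small loop. The sketch below is one possible shape, not an official client: `poll_task`, `fetch_status`, `interval`, and `max_attempts` are hypothetical names, and `fetch_status` stands in for a function that performs the GET request shown above and returns the parsed JSON body. The `status`, `outputs`, and `fail_reason` fields come from the sample responses.

```python
import time
from typing import Callable


def poll_task(fetch_status: Callable[[], dict], interval: float = 5.0,
              max_attempts: int = 60) -> list[str]:
    """Call `fetch_status` until the task completes, then return its output URLs."""
    for _ in range(max_attempts):
        data = fetch_status()["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(f"task failed: {data.get('fail_reason', '')}")
        time.sleep(interval)  # wait before asking again
    raise TimeoutError("task did not finish within the polling budget")


# Offline stand-in that reports "processing" twice, then "completed".
replies = iter([
    {"data": {"status": "processing"}},
    {"data": {"status": "processing"}},
    {"data": {"status": "completed", "outputs": ["https://example.com/sample_0.mp4"]}},
])
outputs = poll_task(lambda: next(replies), interval=0.0)
print(outputs)
```

Capping the number of attempts keeps a stuck task from blocking the caller indefinitely.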

### Content Lifecycle

1. **Task created** — Your request is received
2. **Processing** — AI model generates content
3. **Completed** — Content available via URLs
4. **Retained for 7 days** — Download before expiration
5. **Deleted** — Content is permanently removed

{% hint style="info" %}
Task status flow: `created` → `in_progress` → `processing` → `completed` | `failed`
{% endhint %}
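
Because generated content is retained for 7 days, it can help to compute a download deadline when a task finishes. The sketch below assumes the retention window starts at the task's `finish_time` (a Unix timestamp in seconds); the lifecycle above does not state which timestamp is used, so treat the result as an estimate. `expires_at` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # retention period stated in the lifecycle above


def expires_at(finish_time: int) -> datetime:
    """Return the assumed expiry time (UTC) for a task's outputs.

    ASSUMPTION: retention is measured from `finish_time`; the documentation
    does not say which timestamp the 7 days are counted from.
    """
    return datetime.fromtimestamp(finish_time, tz=timezone.utc) + RETENTION


deadline = expires_at(1779064332)  # finish_time from the video sample response
print(deadline.isoformat())
```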

### Rate Limits

Your account tier determines:

* How many requests you can send per minute
* How many tasks you can run concurrently

See [Account Levels & Rate Limits](https://infron.ai/dashboard/quota-limit) for details.


# Upload Files

Upload images, videos, and audio files for use with Infron models.

### Supported Formats

<table><thead><tr><th width="184.09326171875">Type</th><th>Formats</th></tr></thead><tbody><tr><td><strong>Images</strong></td><td><code>JPG</code>, <code>JPEG</code>, <code>PNG</code>, <code>WebP</code>, <code>GIF</code>, <code>BMP</code>, <code>TIFF</code></td></tr><tr><td><strong>Videos</strong></td><td><code>MP4</code>, <code>AVI</code>, <code>MOV</code>, <code>WMV</code>, <code>FLV</code>, <code>WebM</code>, <code>MKV</code>, <code>3GP</code>, <code>OGV</code></td></tr><tr><td><strong>Audio</strong></td><td><code>MP3</code>, <code>WAV</code>, <code>OGG</code>, <code>AAC</code>, <code>FLAC</code>, <code>WebM</code>, <code>M4A</code>, <code>Opus</code></td></tr></tbody></table>

> **Note:** For files larger than 50MB, we recommend using a URL input instead of uploading directly.

> **Note:** A single account can upload up to 1,000 assets. If you need a higher limit, please contact <support@infron.ai>.
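
A client can check a file against the format table and the size recommendation before uploading. The following is a minimal sketch: `media_types`, `upload_mode`, and `DIRECT_UPLOAD_LIMIT` are hypothetical names, and reading "50MB" as 50 MiB is an assumption.

```python
from pathlib import Path

# Formats copied from the table above, lowercased for comparison.
SUPPORTED = {
    "image": {"jpg", "jpeg", "png", "webp", "gif", "bmp", "tiff"},
    "video": {"mp4", "avi", "mov", "wmv", "flv", "webm", "mkv", "3gp", "ogv"},
    "audio": {"mp3", "wav", "ogg", "aac", "flac", "webm", "m4a", "opus"},
}
DIRECT_UPLOAD_LIMIT = 50 * 1024 * 1024  # ASSUMPTION: "50MB" read as 50 MiB


def media_types(filename: str) -> list[str]:
    """Return the media types whose format list contains the file extension."""
    ext = Path(filename).suffix.lstrip(".").lower()
    return [kind for kind, formats in SUPPORTED.items() if ext in formats]


def upload_mode(size_bytes: int) -> str:
    """Suggest "file" for a direct upload or "url" for larger files."""
    return "file" if size_bytes <= DIRECT_UPLOAD_LIMIT else "url"


print(media_types("person-front.PNG"))
print(media_types("clip.webm"))  # WebM is listed under both videos and audio
print(upload_mode(39383), upload_mode(80 * 1024 * 1024))
```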

### 1. Upload resources

<table><thead><tr><th width="184.09326171875">Field</th><th>Required</th><th>Type</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><strong>model</strong></td><td><code>True</code></td><td>string</td><td>-</td><td>Model id</td></tr><tr><td><strong>file</strong></td><td><code>True</code></td><td>file/string</td><td>-</td><td><code>Local file stream</code> or <code>remote resource URL</code> (only http/https supported).</td></tr></tbody></table>

{% tabs %}
{% tab title="multipart/form-data" %}

```bash
curl -X POST 'https://media.onerouter.pro/v1/upload/resources' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'model=bytedance/seedance-2.0/virtual-portrait-reference-to-video' \
  -F 'file=@/tmp/person-front.png;type=image/png'
```

{% endtab %}

{% tab title="application/json" %}

```bash
curl -X POST 'https://media.onerouter.pro/v1/upload/resources' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "bytedance/seedance-2.0/virtual-portrait-reference-to-video",
    "file": "https://example.com/person-side.png"
  }'
```

{% endtab %}
{% endtabs %}

Response

```json
{
  "code": 200,
  "data": {
    "id": 2,
    "upstream_asset_uri": "asset://asset-20260606170356-s9vrz",
    "upstream_status": "Processing",
    "asset_type": "Image",
    "description": "",
    "upload_source": "remote_url",
    "source_url": "https://storage.googleapis.com/infron_gcs/image%2Fplayground-inputs%2F15%2F20260606%2Fcde00456-aebc-453b-a1e3-8cafebe0a457.jpg",
    "gcs_url": "https://storage.googleapis.com/infron_gcs/image%2Fmodel-assets%2Fseedance%2F13%2F20260606%2Fb97f550f-7309-41ba-b7a4-585bb43d97e9.jpg",
    "file_name": "cde00456-aebc-453b-a1e3-8cafebe0a457.jpg",
    "content_type": "image/jpeg",
    "file_ext": ".jpg",
    "file_size_bytes": 39383,
    "sha256_hash": "e7d916a106671814609f2ecb3b57ae1afd048df21a8c7c131f4035918cef2cdc",
    "status": "processing",
    "last_polled_at": "2026-06-06T09:03:55.754111989Z",
    "created_at": "2026-06-06T09:03:55.755Z",
    "updated_at": "2026-06-06T09:03:55.755Z"
  },
  "message": "success"
}
```

### 2. List resources

<table><thead><tr><th width="184.09326171875">Field</th><th>Required</th><th>Type</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><strong>page</strong></td><td><code>False</code></td><td>number</td><td><code>1</code></td><td>Page index</td></tr><tr><td><strong>page_size</strong></td><td><code>False</code></td><td>number</td><td><code>20</code></td><td>Number of items per page, maximum 100</td></tr></tbody></table>

{% tabs %}
{% tab title="cURL" %}

```bash
curl 'https://media.onerouter.pro/v1/list/resources?page=1&page_size=20' \
  -H 'Authorization: Bearer YOUR_API_KEY'
```

{% endtab %}
{% endtabs %}

Response

```json
{
  "code": 200,
  "data": {
    "items": [
      {
        "id": 2,
        "upstream_asset_uri": "asset://asset-20260606170356-s9vrz",
        "upstream_status": "Active",
        "asset_type": "Image",
        "description": "",
        "upload_source": "remote_url",
        "source_url": "https://storage.googleapis.com/infron_gcs/image%2Fplayground-inputs%2F15%2F20260606%2Fcde00456-aebc-453b-a1e3-8cafebe0a457.jpg",
        "gcs_url": "https://storage.googleapis.com/infron_gcs/image%2Fmodel-assets%2Fseedance%2F13%2F20260606%2Fb97f550f-7309-41ba-b7a4-585bb43d97e9.jpg",
        "file_name": "cde00456-aebc-453b-a1e3-8cafebe0a457.jpg",
        "content_type": "image/jpeg",
        "file_ext": ".jpg",
        "file_size_bytes": 39383,
        "sha256_hash": "e7d916a106671814609f2ecb3b57ae1afd048df21a8c7c131f4035918cef2cdc",
        "status": "active",
        "last_polled_at": "2026-06-06T09:05:01Z",
        "active_at": "2026-06-06T09:05:01Z",
        "created_at": "2026-06-06T09:03:56Z",
        "updated_at": "2026-06-06T09:05:01Z"
      },
      {
        "id": 1,
        "upstream_asset_uri": "asset://asset-20260606170301-mfcb4",
        "upstream_status": "Active",
        "asset_type": "Image",
        "description": "",
        "upload_source": "remote_url",
        "source_url": "https://storage.googleapis.com/infron_gcs/image%2Fplayground-inputs%2F15%2F20260606%2Fcde00456-aebc-453b-a1e3-8cafebe0a457.jpg",
        "gcs_url": "https://storage.googleapis.com/infron_gcs/image%2Fmodel-assets%2Fseedance%2F13%2F20260606%2F522c8572-b290-4e82-978a-1998ea5ac5c5.jpg",
        "file_name": "cde00456-aebc-453b-a1e3-8cafebe0a457.jpg",
        "content_type": "image/jpeg",
        "file_ext": ".jpg",
        "file_size_bytes": 39383,
        "sha256_hash": "e7d916a106671814609f2ecb3b57ae1afd048df21a8c7c131f4035918cef2cdc",
        "status": "active",
        "last_polled_at": "2026-06-06T09:05:01Z",
        "active_at": "2026-06-06T09:05:01Z",
        "created_at": "2026-06-06T09:03:01Z",
        "updated_at": "2026-06-06T09:05:01Z"
      }
    ],
    "total": 2,
    "page": 1,
    "page_size": 20
  },
  "message": "success"
}
```
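
To walk through every resource, the `page` and `page_size` parameters can be combined with the `total` field in the response. The sketch below is an illustration under stated assumptions: `iter_resources` and `fetch_page` are hypothetical, and `fetch_page(page, page_size)` stands in for a function that performs the list request above and returns the parsed JSON body.

```python
from typing import Callable, Iterator


def iter_resources(fetch_page: Callable[[int, int], dict],
                   page_size: int = 20) -> Iterator[dict]:
    """Yield every resource by walking pages until `total` items are seen."""
    page, seen = 1, 0
    while True:
        data = fetch_page(page, page_size)["data"]
        items = data["items"]
        yield from items
        seen += len(items)
        # Stop on an empty page as well, in case `total` changes mid-iteration.
        if not items or seen >= data["total"]:
            return
        page += 1


# Offline stand-in: three resources served two per page.
ALL = [{"id": 3}, {"id": 2}, {"id": 1}]


def fake_fetch(page: int, page_size: int) -> dict:
    start = (page - 1) * page_size
    return {"data": {"items": ALL[start:start + page_size], "total": len(ALL),
                     "page": page, "page_size": page_size}}


ids = [item["id"] for item in iter_resources(fake_fetch, page_size=2)]
print(ids)
```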

### 3. Check resource status

<table><thead><tr><th width="184.09326171875">Field</th><th>Required</th><th>Type</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><strong>id</strong></td><td><code>True</code></td><td>number</td><td>-</td><td>Resource id</td></tr></tbody></table>

{% tabs %}
{% tab title="cURL" %}

```bash
curl 'https://media.onerouter.pro/v1/status/resources/{id}' \
  -H 'Authorization: Bearer YOUR_API_KEY'
```

{% endtab %}
{% endtabs %}

Response

{% tabs %}
{% tab title="status=Active" %}

```json
{
  "code": 200,
  "data": {
    "id": 2,
    "upstream_asset_uri": "asset://asset-20260606170356-s9vrz",
    "upstream_status": "Active",
    "asset_type": "Image",
    "description": "",
    "upload_source": "remote_url",
    "source_url": "https://storage.googleapis.com/infron_gcs/image%2Fplayground-inputs%2F15%2F20260606%2Fcde00456-aebc-453b-a1e3-8cafebe0a457.jpg",
    "gcs_url": "https://storage.googleapis.com/infron_gcs/image%2Fmodel-assets%2Fseedance%2F13%2F20260606%2Fb97f550f-7309-41ba-b7a4-585bb43d97e9.jpg",
    "file_name": "cde00456-aebc-453b-a1e3-8cafebe0a457.jpg",
    "content_type": "image/jpeg",
    "file_ext": ".jpg",
    "file_size_bytes": 39383,
    "sha256_hash": "e7d916a106671814609f2ecb3b57ae1afd048df21a8c7c131f4035918cef2cdc",
    "status": "active",
    "last_polled_at": "2026-06-06T09:05:01Z",
    "active_at": "2026-06-06T09:05:01Z",
    "created_at": "2026-06-06T09:03:56Z",
    "updated_at": "2026-06-06T09:05:01Z"
  },
  "message": "success"
}
```

{% endtab %}

{% tab title="status=Processing" %}

```json
{
  "code": 200,
  "data": {
    "id": 4,
    "upstream_asset_uri": "asset://asset-20260606171118-qd975",
    "upstream_status": "Processing",
    "asset_type": "Image",
    "description": "",
    "upload_source": "remote_url",
    "source_url": "https://storage.googleapis.com/infron_gcs/image%2Fplayground-inputs%2F15%2F20260606%2F9f0bc756-c140-4991-9dc3-0466f867f499.jpg",
    "gcs_url": "https://storage.googleapis.com/infron_gcs/image%2Fmodel-assets%2Fseedance%2F13%2F20260606%2F7e6f112e-a74f-4201-82c3-0f9d3d35073d.jpg",
    "file_name": "9f0bc756-c140-4991-9dc3-0466f867f499.jpg",
    "content_type": "image/jpeg",
    "file_ext": ".jpg",
    "file_size_bytes": 84981,
    "sha256_hash": "2f02fe67afad4099ac942962c3a0857ac9da58d332c90f459513cf6e1c13de05",
    "status": "processing",
    "last_polled_at": "2026-06-06T09:11:17Z",
    "created_at": "2026-06-06T09:11:17Z",
    "updated_at": "2026-06-06T09:11:18Z"
  },
  "message": "success"
}
```

{% endtab %}

{% tab title="status=Failed" %}

```json
{
  "code": 200,
  "data": {
    "id": 4,
    "upstream_asset_uri": "asset://asset-20260606171118-qd975",
    "upstream_status": "Failed",
    "asset_type": "Image",
    "description": "",
    "upload_source": "remote_url",
    "source_url": "https://storage.googleapis.com/infron_gcs/image%2Fplayground-inputs%2F15%2F20260606%2F9f0bc756-c140-4991-9dc3-0466f867f499.jpg",
    "gcs_url": "https://storage.googleapis.com/infron_gcs/image%2Fmodel-assets%2Fseedance%2F13%2F20260606%2F7e6f112e-a74f-4201-82c3-0f9d3d35073d.jpg",
    "file_name": "9f0bc756-c140-4991-9dc3-0466f867f499.jpg",
    "content_type": "image/jpeg",
    "file_ext": ".jpg",
    "file_size_bytes": 84981,
    "sha256_hash": "2f02fe67afad4099ac942962c3a0857ac9da58d332c90f459513cf6e1c13de05",
    "status": "failed",
    "failure_code": "upstream_failed",
    "last_polled_at": "2026-06-06T09:13:01Z",
    "created_at": "2026-06-06T09:11:17Z",
    "updated_at": "2026-06-06T09:13:01Z"
  },
  "message": "success"
}
```

{% endtab %}
{% endtabs %}
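
All three sample responses return `"code": 200`, so a client has to read `data.status` rather than the top-level code to learn whether an asset is usable. The following is a minimal sketch of that check; `summarize_asset` is a hypothetical helper name, and treating `active` and `failed` as the only terminal states is an assumption based on the samples above.

```python
from typing import Any, Dict

# Assumption: these are the only states after which polling can stop.
TERMINAL_STATUSES = {"active", "failed"}


def summarize_asset(response: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a query response to the fields a polling loop needs."""
    data = response.get("data") or {}
    status = str(data.get("status", "")).lower()
    return {
        "status": status,
        "done": status in TERMINAL_STATUSES,
        "usable": status == "active",
        # Only expose the URI once the asset has become active.
        "asset_uri": data.get("upstream_asset_uri") if status == "active" else None,
        "failure_code": data.get("failure_code"),
    }
```

A polling loop would call the query endpoint, pass the decoded JSON to `summarize_asset`, and stop as soon as `done` is true.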

{% hint style="info" %}
Footage of real people must be authorized, uploaded, and pass a consistency check before it can be used.
{% endhint %}

### 4. Delete resources

<table><thead><tr><th width="184.09326171875">Field</th><th>Required</th><th>Type</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><strong>id</strong></td><td><code>True</code></td><td>number</td><td>-</td><td>Resource id</td></tr></tbody></table>

{% tabs %}
{% tab title="cURL" %}

```bash
curl -X DELETE 'https://media.onerouter.pro/v1/delete/resources/{id}' \
  -H 'Authorization: Bearer YOUR_API_KEY'
```

{% endtab %}
{% endtabs %}

Response

```json
{
  "code": 200,
  "data": {
    "deleted": true
  },
  "message": "success"
}
```
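
The same call can be prepared from Python with the standard library. This is a sketch only: `build_delete_request` and `was_deleted` are hypothetical helper names, and the request is built but not sent, so you can inspect it before passing it to `urllib.request.urlopen`.

```python
import urllib.request

BASE_URL = "https://media.onerouter.pro"


def build_delete_request(resource_id: int, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for one resource."""
    return urllib.request.Request(
        f"{BASE_URL}/v1/delete/resources/{int(resource_id)}",
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def was_deleted(response: dict) -> bool:
    """True when the decoded response body confirms the deletion."""
    data = response.get("data") or {}
    return response.get("code") == 200 and data.get("deleted") is True
```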

### 5. Referencing assets in model calls

The `upstream_asset_uri` returned from the upload or query operation can be used directly for video generation:

```bash
curl -X POST 'https://media.onerouter.pro/v1/videos/generations' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "{MODEL}",
    "prompt": "Generate a 4-second video using @Image1.",
    "image_urls": [
      "{upstream_asset_uri}"
    ]
  }'
```
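
When the request body is assembled in code, it can help to refuse assets that are not yet active. The sketch below assumes that only assets with `status` equal to `active` should be referenced; `payload_from_asset` is a hypothetical helper, and the model id is left to the caller.

```python
from typing import Any, Dict


def payload_from_asset(asset_response: Dict[str, Any], model: str, prompt: str) -> Dict[str, Any]:
    """Build a video generation body that references one uploaded asset."""
    data = asset_response.get("data") or {}
    status = data.get("status")
    if status != "active":
        # Assumption: processing or failed assets cannot be used yet.
        raise ValueError(f"asset is not active (status={status!r})")
    return {
        "model": model,
        "prompt": prompt,
        "image_urls": [data["upstream_asset_uri"]],
    }
```

The returned dictionary can be serialized with `json.dumps` and sent as the body of the `POST /v1/videos/generations` call shown above.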

### Retention

Uploaded files are stored for **7 days** and then automatically deleted.
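
To plan re-uploads, a client can estimate when an asset will disappear. The sketch below assumes the 7-day window starts at the `created_at` timestamp returned by the API, which this page does not state explicitly; treat the result as an estimate.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=7)


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps such as '2026-06-06T09:03:56Z' as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def expires_at(created_at: str) -> datetime:
    """Estimated deletion time, assuming retention is counted from created_at."""
    return parse_timestamp(created_at) + RETENTION


def is_expired(created_at: str, now: Optional[datetime] = None) -> bool:
    """True once the estimated deletion time has passed."""
    current = now or datetime.now(timezone.utc)
    return current >= expires_at(created_at)
```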


# How to Write Better Prompts

Improve your AI generation results with effective prompt writing techniques.

### Prompt Structure

For best results, prompts should be descriptive and specific.

A strong prompt usually includes:

<table data-header-hidden><thead><tr><th width="180.20404052734375">Element</th><th>Description</th><th>Example</th></tr></thead><tbody><tr><td><strong>Subject</strong></td><td>What appears in the video</td><td><code>a young woman</code>, <code>a robot</code>, <code>a city skyline</code></td></tr><tr><td><strong>Context</strong></td><td>The setting or background</td><td><code>on a rainy Tokyo street at night</code></td></tr><tr><td><strong>Action</strong></td><td>What is happening</td><td><code>walking slowly while holding an umbrella</code></td></tr><tr><td><strong>Style</strong></td><td>Visual or cinematic style</td><td><code>cinematic</code>, <code>documentary</code>, <code>noir</code>, <code>cartoon</code></td></tr><tr><td><strong>Camera Motion</strong></td><td>Optional camera movement</td><td><code>tracking shot</code>, <code>aerial view</code>, <code>slow dolly-in</code></td></tr><tr><td><strong>Composition</strong></td><td>Framing or shot type</td><td><code>wide shot</code>, <code>close-up</code>, <code>over-the-shoulder</code></td></tr><tr><td><strong>Ambiance</strong></td><td>Lighting, color, sound, mood</td><td><code>soft golden light</code>, <code>ambient city noise</code></td></tr></tbody></table>

Example prompt:

```
A cinematic tracking shot of a young woman walking through a rainy Tokyo street at night.
Neon signs reflect on the wet pavement. The camera slowly follows her from behind.
The mood is quiet and futuristic, with soft ambient city sounds and distant traffic.
```
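
If prompts are generated programmatically, the elements in the table can be kept as separate fields and joined at the end. This is an illustrative sketch: `PromptSpec` and `compose_prompt` are hypothetical names, and the sentence template is one possible phrasing, not a format the models require.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptSpec:
    subject: str
    action: str
    context: str
    style: Optional[str] = None
    composition: Optional[str] = None
    camera_motion: Optional[str] = None
    ambiance: Optional[str] = None


def compose_prompt(spec: PromptSpec) -> str:
    """Join the prompt elements into one or two descriptive sentences."""
    # Optional visual descriptors come first, e.g. "cinematic tracking shot".
    shot = " ".join(p for p in (spec.style, spec.composition, spec.camera_motion) if p)
    core = f"{spec.subject} {spec.action} {spec.context}"
    sentence = f"{shot} of {core}" if shot else core
    sentence = sentence[0].upper() + sentence[1:] + "."
    if spec.ambiance:
        sentence += f" The mood is {spec.ambiance}."
    return sentence


spec = PromptSpec(
    subject="a young woman",
    action="walking slowly while holding an umbrella",
    context="on a rainy Tokyo street at night",
    style="cinematic",
    camera_motion="tracking shot",
    ambiance="quiet, with soft ambient city sounds",
)
print(compose_prompt(spec))
```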

### Example Comparison

**Basic prompt:**

```
a cat
```

**Better prompt:**

```
a fluffy orange cat sitting on a windowsill, soft afternoon sunlight streaming through the window, bokeh background, cozy atmosphere, photography style, 4K, high detail
```

The detailed prompt gives the model more context, resulting in higher quality and more predictable output.

### Tips by Model Type

**For image generation (T2I):**

* Be specific about composition and framing
* Include artistic style references
* Mention lighting conditions

**For video generation (T2V/I2V):**

* Describe the motion or action clearly
* Keep prompts focused on one main action
* Specify camera movement if needed (zoom, pan, static)

### Quick Tips

| Goal             | Action                                     |
| ---------------- | ------------------------------------------ |
| More control     | Be specific and descriptive                |
| Consistent style | Include style references                   |
| Higher quality   | Add quality keywords (4K, detailed, sharp) |


# How to Reduce Costs

Optimize your Infron usage to get the best results while managing costs effectively.

### Test Before Final Generation

**For videos:**

1. Start with **480p** resolution to test your prompt
2. Review the result and adjust if needed
3. Once satisfied, generate the final version in **720p** or **1080p**

This approach can save 50-70% on testing costs.

**For images:**

1. Set **batch size to 1** for initial testing
2. Refine your prompt until you’re happy with the result
3. Then generate multiple images in a batch

### Avoid Parameter Mismatch

Some models have parameters that must be consistent:

**Example: Seedream Sequential**

If your prompt says “generate 4 images”, make sure `max_images` is set to 4.

```
Prompt: "Create 4 variations of a logo design..."
max_images: 4  ✓ Correct
max_images: 2  ✗ Mismatch - may cause unexpected charges
```
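
A simple client-side check can catch this kind of mismatch before a request is sent. The sketch below is a heuristic, not part of the API: it looks for a number directly followed by "images", "variations", or "pictures" in the prompt and compares it with `max_images`. Both function names are hypothetical.

```python
import re
from typing import Optional

# Heuristic: a number immediately followed by a word that names the outputs.
_COUNT = re.compile(r"\b(\d+)\s+(?:images?|variations?|pictures?)\b", re.IGNORECASE)


def requested_count(prompt: str) -> Optional[int]:
    """Return the image count mentioned in the prompt, if one is found."""
    match = _COUNT.search(prompt)
    return int(match.group(1)) if match else None


def max_images_matches(prompt: str, max_images: int) -> bool:
    """True when the prompt names no count or the count equals max_images."""
    count = requested_count(prompt)
    return count is None or count == max_images
```

Prompts that spell the number out ("four images") are not detected, so treat a passing check as a hint rather than a guarantee.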

### Start with Ultra Fast Models

Use the **Ultra Fast** series for testing and iteration:

* Lower cost per generation
* Faster results for quick validation
* Great for prompt refinement

### Volume Discounts

For high-volume users, contact <support@infron.ai> for custom pricing options.


# Seedance 2.0 Real Human

Generate videos featuring real people in Seedance 2.0 with identity consistency, native audio-video sync, and a one-time ByteDance verification in Infron.

We're thrilled to announce that video generation with real people is now available in **Seedance 2.0**. Provide a portrait photo, complete a one-time verification step, and generate clips with a consistent identity and natural facial movement.

### Key Changes

Seedance 2.0 is ByteDance's newest video generation model, built for professional-grade precision and native audio-video sync. With **Real Human** support in Infron, you are no longer limited to virtual avatars: the same capabilities apply to footage of real people, with consistent appearance and delivery across entire sequences.

* **Real people with cinematic control**: Adjust camera movement, lighting, and scene transitions while keeping the subject authentic.
* **Stable identity retention**: Preserves facial features and overall appearance across motion, camera moves, and scene changes.
* **Native audio-video sync**: Generates video and audio (speech, ambience, music) together, with improved lip sync and expressions.
* **Multi-reference direction**: Combine text with up to **9 images**, **3 video clips**, and **3 audio tracks** to lock in appearance, motion, and rhythm.

### Verification Mechanism in Infron

Verification is designed to prevent identity theft and unauthorized use of a person's likeness. ByteDance manages the process, in line with evolving AI transparency guidelines.

<details>

<summary><strong>Initial Setup (Verification Needed)</strong></summary>

Following the asset library [documentation](/docs/media-apis/advanced-features/upload-files), upload the images that require facial verification to the Infron asset library.

</details>

<details>

<summary><strong>Compliance Audit</strong></summary>

Infron uses BytePlus's KYC and anti-fraud engines to screen images for risk.\
Infron removes images that screening identifies as high-risk, prohibited, or infringing on intellectual property rights, and issues risk warnings to the associated accounts.

</details>

### Begin Here

As per usual, happy creating🍷!


# Overview

Infron Web Search Model & Agent Integration Overview

### What Is a Web Search Model & Agent?

A **Web Search Model** is an advanced AI system that interprets user queries and retrieves the most relevant, accurate, and context-aware information from the internet. Unlike traditional keyword search engines, modern systems use large language models (LLMs) and retrieval‑augmented generation (RAG) to understand intent, reason over data, and deliver synthesized insights rather than simple link lists.

A **Search Agent** extends these capabilities. It is an autonomous or semi-autonomous AI agent that can:

* Understand natural language questions in context
* Dynamically query search APIs, websites, or knowledge bases
* Aggregate, filter, and synthesize information
* Present actionable, concise answers

Together, web search models and agents form the foundation of intelligent information retrieval — enabling real-time knowledge discovery, research automation, and informed decision-making.

### Infron: Unified Access to Leading AI Search Models

Infron provides a unified, standardized API that integrates multiple industry-leading AI search models and agents into one seamless interface. This allows developers and enterprises to access diverse search and reasoning capabilities without dealing with fragmented or model-specific integrations.

### Why Choose Infron AI

* Unified Integration: Access multiple AI search providers through a consistent API.
* Dynamic Routing: Automatically route queries to the best-performing model for each task.
* Scalability & Reliability: Built-in monitoring, caching, and fallback ensure stable performance.
* Continual Expansion: New search agents and APIs are regularly integrated as the ecosystem evolves.


# Tavily

## tavily/tavily-search

> Execute a search query using Tavily Search.

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Tavily"}],"servers":[{"url":"https://search.onerouter.pro","description":"search"}],"security":[],"paths":{"/v1/tavily/search":{"post":{"summary":"tavily/tavily-search","deprecated":false,"description":"Execute a search query using Tavily Search.\n\n","tags":["Tavily"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"query":{"type":"string","description":"The search query to execute with Tavily."},"auto_parameters":{"type":"boolean","description":"When `auto_parameters` is enabled, Tavily automatically configures search parameters based on your query's content and intent. You can still set other parameters manually, and your explicit values will override the automatic ones. The parameters `include_answer`, `include_raw_content`, and `max_results` must always be set manually, as they directly affect response size. Note: `search_depth` may be automatically set to advanced when it's likely to improve results. This uses 2 API credits per request. To avoid the extra cost, you can explicitly set `search_depth` to `basic`. Currently in beta.","default":false},"topic":{"type":"string","description":"The category of the search. `news` is useful for retrieving real-time updates, particularly about politics, sports, and major current events covered by mainstream media sources. `general` is for broader, more general-purpose searches that may include a wide range of sources.","enum":["general","news","finance "],"default":"general"},"search_depth":{"type":"string","description":"The depth of the search. `advanced` search is tailored to retrieve the most relevant sources and content snippets for your query, while `basic` search provides generic content snippets from each source. 
An `advanced` search costs twice as much as a `basic` search.","enum":["basic","advanced"],"default":"basic"},"chunks_per_source":{"type":"integer","description":"Chunks are short content snippets (maximum 500 characters each) pulled directly from the source. Use `chunks_per_source` to define the maximum number of relevant chunks returned per source and to control the content length. Chunks will appear in the content field as: `<chunk 1> [...] <chunk 2> [...] <chunk 3>`. Available only when `search_depth` is `advanced`.","enum":[1,2,3],"default":3},"max_results":{"type":"integer","description":"The maximum number of search results to return.","default":5,"minimum":0,"maximum":20},"time_range":{"type":"string","description":"The time range back from the current date to filter results based on publish date or last updated date. Useful when looking for sources that have published or updated data. If the `time_range` parameter is provided, you can't also include the `start_date` and `end_date` parameters.","enum":["day","week","month","year","d","w","m","y"]},"start_date":{"type":"string","description":"Will return all results after the specified start date based on publish date or last updated date. Required to be written in the format YYYY-MM-DD"},"end_date":{"type":"string","description":"Will return all results before the specified end date based on publish date or last updated date. Required to be written in the format YYYY-MM-DD"},"include_answer":{"type":"boolean","description":"Include an LLM-generated answer to the provided query. `true` returns a quick answer. `false` returns a more detailed answer.","default":false},"include_raw_content":{"type":"boolean","default":false,"description":"Include the cleaned and parsed HTML content of each search result. `true` returns search result content in markdown format. 
`false` returns the plain text from the results and may increase latency."},"include_images":{"type":"boolean","description":"Also perform an image search and include the results in the response.\n\n","default":false},"include_image_descriptions":{"type":"boolean","description":"When `include_images` is `true`, also add a descriptive text for each image.","default":false},"include_favicon":{"type":"boolean","description":"Whether to include the favicon URL for each result.\n\n","default":false},"include_domains":{"type":"array","items":{"type":"string"},"description":"A list of domains to specifically include in the search results. Maximum 300 domains."},"exclude_domains":{"type":"array","items":{"type":"string"},"description":"A list of domains to specifically exclude from the search results. Maximum 150 domains."},"country":{"type":"string","description":"Boost search results from a specific country. This will prioritize content from the selected country in the search results. Available only if topic is `general`.","enum":["afghanistan","albania","algeria","andorra","angola","argentina","armenia","australia","austria","azerbaijan","bahamas","bahrain","bangladesh","barbados","belarus","belgium","belize","benin","bhutan","bolivia","bosnia and herzegovina","botswana","brazil","brunei","bulgaria","burkina faso","burundi","cambodia","cameroon","canada","cape verde","central african republic","chad","chile","china","colombia","comoros","congo","costa rica","croatia","cuba","cyprus","czech republic","denmark","djibouti","dominican republic","ecuador","egypt","el salvador","equatorial 
guinea","eritrea","estonia","ethiopia","fiji","finland","france","gabon","gambia","georgia","germany","ghana","greece","guatemala","guinea","haiti","honduras","hungary","iceland","india","indonesia","iran","iraq","ireland","israel","italy","jamaica","japan","jordan","kazakhstan","kenya","kuwait","kyrgyzstan","latvia","lebanon","lesotho","liberia","libya","liechtenstein","lithuania","luxembourg","madagascar","malawi","malaysia","maldives","mali","malta","mauritania","mauritius","mexico","moldova","monaco","mongolia","montenegro","morocco","mozambique","myanmar","namibia","nepal","netherlands","new zealand","nicaragua","niger","nigeria","north korea","north macedonia","norway","oman","pakistan","panama","papua new guinea","paraguay","peru","philippines","poland","portugal","qatar","romania","russia","rwanda","saudi arabia","senegal","serbia","singapore","slovakia","slovenia","somalia","south africa","south korea","south sudan","spain","sri lanka","sudan","sweden","switzerland","syria","taiwan","tajikistan","tanzania","thailand","togo","trinidad and tobago","tunisia","turkey","turkmenistan","uganda","ukraine","united arab emirates","united kingdom","united states","uruguay","uzbekistan","venezuela","vietnam","yemen","zambia","zimbabwe 
"]}},"required":["query"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"query":{"type":"string"},"follow_up_questions":{"type":"null"},"answer":{"type":"null"},"images":{"type":"array","items":{"type":"object","properties":{"url":{"type":"string"},"description":{"type":["string","null"]}},"required":["url","description"]}},"results":{"type":"array","items":{"type":"string"}},"response_time":{"type":"number"},"request_id":{"type":"string"}},"required":["query","follow_up_questions","answer","images","results","response_time","request_id"]}}},"headers":{}},"400":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"error":{"type":"object","properties":{"message":{"type":"string"},"type":{"type":"string"},"param":{"type":"string"},"code":{"type":"integer"}},"required":["message","type","param","code"]}},"required":["error"]}}},"headers":{}}}}}}}
```
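
The schema above carries a few cross-field rules that are easy to miss: `time_range` cannot be combined with `start_date` or `end_date`, `chunks_per_source` applies only when `search_depth` is `advanced`, and `max_results` is limited to 0 through 20. The sketch below checks those rules before building a request. The helper names are hypothetical, and the `Bearer` prefix on the `Authorization` header is an assumption carried over from the media API examples.

```python
import json
import urllib.request
from datetime import datetime
from typing import Any, Dict

TAVILY_URL = "https://search.onerouter.pro/v1/tavily/search"


def build_tavily_payload(query: str, **options: Any) -> Dict[str, Any]:
    """Validate the documented cross-field rules and return the JSON body."""
    if not query:
        raise ValueError("query is required")
    if "time_range" in options and any(k in options for k in ("start_date", "end_date")):
        raise ValueError("time_range cannot be combined with start_date/end_date")
    if "chunks_per_source" in options:
        if options.get("search_depth") != "advanced":
            raise ValueError("chunks_per_source requires search_depth='advanced'")
        if options["chunks_per_source"] not in (1, 2, 3):
            raise ValueError("chunks_per_source must be 1, 2, or 3")
    if not 0 <= options.get("max_results", 5) <= 20:
        raise ValueError("max_results must be between 0 and 20")
    for key in ("start_date", "end_date"):
        if key in options:
            datetime.strptime(options[key], "%Y-%m-%d")  # raises ValueError if malformed
    return {"query": query, **options}


def build_tavily_request(payload: Dict[str, Any], api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request."""
    return urllib.request.Request(
        TAVILY_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )
```

Pass the result of `build_tavily_request` to `urllib.request.urlopen` to execute the search.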


# Jina

## POST /v1/chat/completions

> jina/jina-deepsearch-v1

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Jina"}],"servers":[{"url":"https://search.onerouter.pro","description":"search"}],"security":[],"paths":{"/v1/chat/completions":{"post":{"summary":"jina/jina-deepsearch-v1","deprecated":false,"description":"","tags":["Jina"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}},"required":["role","content"]}},"stream":{"type":"boolean","description":"Delivers events as they occur through server-sent events, including reasoning steps and final answers. We strongly recommend keeping this option enabled since DeepSearch requests can take significant time to complete. Disabling streaming may result in '524 timeout' errors."},"reasoning_effort":{"type":"string","description":"Constrains effort on reasoning for reasoning models. Currently supported values are low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response."},"budget_tokens":{"type":"integer","description":"This determines the maximum number of tokens allowed for the DeepSearch process. Larger budgets can improve response quality by enabling a more exhaustive search for complex queries, although DeepSearch may not use the entire budget allocated. This overrides the reasoning_effort parameter."},"max_returned_urls":{"type":"string","description":"The maximum number of URLs to include in the final answer/chunk. URLs are sorted by relevance and other important factors."},"max_attempts":{"type":"integer","description":"The maximum number of retries for solving a problem in the DeepSearch process. 
A larger value allows DeepSearch to retry solving the problem by using different reasoning approaches and strategies. This parameter overrides the reasoning_effort parameter."},"team_size":{"type":"integer","description":"The number of agents that will work on the problem in parallel. All agents will share one token budget but independent max_attempts, and they will collaborate to produce a final answer."},"search_provider":{"type":"string","description":"Optimized search engine for research arXiv papers. This will restrict all search to arXiv only."},"response_format":{"type":"object","properties":{"type":{"type":"string"},"json_schema":{"type":"object","properties":{"type":{"type":"string"},"properties":{"type":"object","properties":{"numerical_answer_only":{"type":"object","properties":{"type":{"type":"string"}},"required":["type"]}},"required":["numerical_answer_only"]}},"required":["type","properties"]}},"description":"This enables Structured Outputs which ensures the final answer from the model will match your supplied JSON schema."},"search_language_code":{"type":"string","description":"Force the language to use for the search query. Useful when resources are more likely to be in a specific language. By default it is automatically determined by the system."},"language_code":{"type":"string","description":"Force the language of the answer and think with the given language code. By default it is automatically determined from the primary language of the input messages. The quality of the answer may be subtly affected by the language."},"boost_hostnames":{"type":"array","items":{"type":"string"},"description":"A list of domains that are given a higher priority for content retrieval. Useful for domain-specific, high-quality sources that provide valuable content."},"bad_hostnames":{"type":"array","items":{"type":"string"},"description":"A list of domains to be strictly excluded from content retrieval. 
Typically used to filter out known spam, low-quality, or irrelevant websites."},"only_hostnames":{"type":"array","items":{"type":"string"},"description":"A list of domains to be exclusively included in content retrieval. All other domains will be ignored. Useful for domain-specific searches."},"no_direct_answer":{"type":"boolean"}},"required":["model","messages"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"choices":{"type":"array","items":{"type":"object","properties":{"finish_reason":{"type":"string"},"index":{"type":"integer"},"logprobs":{"type":"null"},"message":{"type":"object","properties":{"content":{"type":"string"},"role":{"type":"string"},"type":{"type":"string"}},"required":["content","role","type"]}}}},"created":{"type":"integer"},"id":{"type":"string"},"model":{"type":"string"},"numURLs":{"type":"integer"},"object":{"type":"string"},"readURLs":{"type":"array","items":{"type":"string"}},"request_id":{"type":"string"},"system_fingerprint":{"type":"string"},"usage":{"type":"object","properties":{"completion_tokens":{"type":"integer"},"prompt_tokens":{"type":"integer"},"total_tokens":{"type":"integer"}},"required":["completion_tokens","prompt_tokens","total_tokens"]},"visitedURLs":{"type":"array","items":{"type":"string"}}},"required":["choices","created","id","model","numURLs","object","readURLs","request_id","system_fingerprint","usage","visitedURLs"]}}},"headers":{}}}}}}}
```
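
The schema notes that `budget_tokens` and `max_attempts` each override `reasoning_effort`, and that streaming should stay enabled. A request body that respects both points could be assembled as sketched below. `build_deepsearch_payload` is a hypothetical helper, using `jina/jina-deepsearch-v1` as the `model` value is an assumption based on the endpoint summary, and dropping an overridden `reasoning_effort` is a design choice made here to keep the body unambiguous.

```python
from typing import Any, Dict, Optional

EFFORTS = ("low", "medium", "high")


def build_deepsearch_payload(
    question: str,
    *,
    reasoning_effort: Optional[str] = None,
    budget_tokens: Optional[int] = None,
    max_attempts: Optional[int] = None,
    stream: bool = True,
) -> Dict[str, Any]:
    """Build the JSON body for a DeepSearch chat completion."""
    if reasoning_effort is not None and reasoning_effort not in EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {EFFORTS}")
    payload: Dict[str, Any] = {
        "model": "jina/jina-deepsearch-v1",  # assumed model id
        "messages": [{"role": "user", "content": question}],
        "stream": stream,  # recommended on, to avoid 524 timeouts
    }
    if budget_tokens is not None:
        payload["budget_tokens"] = budget_tokens
    if max_attempts is not None:
        payload["max_attempts"] = max_attempts
    overridden = budget_tokens is not None or max_attempts is not None
    if reasoning_effort is not None and not overridden:
        # Sent only when nothing else would override it.
        payload["reasoning_effort"] = reasoning_effort
    return payload
```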


# Firecrawl

## POST /v1/firecrawl

> firecrawl/firecrawl-search

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Firecrawl"}],"servers":[{"url":"https://search.onerouter.pro","description":"search"}],"security":[],"paths":{"/v1/firecrawl":{"post":{"summary":"firecrawl/firecrawl-search","deprecated":false,"description":"","tags":["Firecrawl"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"query":{"type":"string"},"limit":{"type":"integer","description":"Maximum number of results to return","minimum":1,"maximum":100,"default":5},"sources":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"}}},"description":"Sources to search. Will determine the arrays available in the response."},"categories":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"}}},"description":"Categories to filter results by"},"tbs":{"type":"string","description":"Time-based search parameter. Supports predefined time ranges (`qdr:h`, `qdr:d`, `qdr:w`, `qdr:m`, `qdr:y`) and custom date ranges (`cdr:1,cd_min:MM/DD/YYYY,cd_max:MM/DD/YYYY`)"},"location":{"type":"string","description":"Location parameter for search results (e.g. `San Francisco,California,United States`). For best results, set both this and the `country` parameter."},"country":{"type":"string","description":"ISO country code for geo-targeting search results (e.g. `US`). For best results, set both this and the location parameter.","default":"US"},"timeout":{"type":"integer","description":"Timeout in milliseconds","default":60000},"ignoreInvalidURLs":{"type":"boolean","description":"Excludes URLs from the search results that are invalid for other Firecrawl endpoints. 
This helps reduce errors if you are piping data from search into other Firecrawl API endpoints.","default":false},"scrapeOptions":{"type":"object","properties":{"formats":{"type":"array","items":{"type":"string"}},"onlyMainContent":{"type":"boolean"},"includeTags":{"type":"array","items":{"type":"string"}},"excludeTags":{"type":"array","items":{"type":"string"}},"maxAge":{"type":"integer"},"headers":{"type":"object","properties":{}},"waitFor":{"type":"integer"},"mobile":{"type":"boolean"},"skipTlsVerification":{"type":"boolean"},"timeout":{"type":"integer"},"parsers":{"type":"array","items":{"type":"string"}},"actions":{"type":"array","items":{"type":"object","properties":{"type":{"type":"string"},"milliseconds":{"type":"integer"},"selector":{"type":"string"}}}},"location":{"type":"object","properties":{"country":{"type":"string"},"languages":{"type":"array","items":{"type":"string"}}},"required":["country","languages"]},"removeBase64Images":{"type":"boolean"},"blockAds":{"type":"boolean"},"proxy":{"type":"string"},"storeInCache":{"type":"boolean"}},"description":"Options for scraping search 
results"}},"required":["model","query"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"success":{"type":"boolean"},"data":{"type":"object","properties":{"web":{"type":"array","items":{"type":"object","properties":{"url":{"type":"string"},"title":{"type":"string"},"description":{"type":"string"},"position":{"type":"integer"},"category":{"type":"string"},"markdown":{"type":"string"},"metadata":{"type":"object","properties":{"viewport":{"type":"string"},"title":{"type":"string"},"referrer":{"type":"string"},"scrapeId":{"type":"string"},"sourceURL":{"type":"string"},"url":{"type":"string"},"statusCode":{"type":"integer"},"error":{"type":"string"},"contentType":{"type":"string"},"proxyUsed":{"type":"string"},"cacheState":{"type":"string"},"cachedAt":{"type":"string"},"creditsUsed":{"type":"integer"},"octolytics-dimension-user_login":{"type":"string"},"browser-stats-url":{"type":"string"},"visitor-hmac":{"type":"string"},"og:description":{"type":"string"},"og:title":{"type":"string"},"hovercard-subject-tag":{"type":"string"},"octolytics-dimension-repository_network_root_id":{"type":"string"},"fb:app_id":{"type":"string"},"theme-color":{"type":"string"},"ogTitle":{"type":"string"},"fetch-nonce":{"type":"string"},"octolytics-dimension-repository_network_root_nwo":{"type":"string"},"release":{"type":"string"},"og:site_name":{"type":"string"},"og:type":{"type":"string"},"browser-errors-url":{"type":"string"},"color-scheme":{"type":"string"},"ui-target":{"type":"string"},"octolytics-dimension-repository_nwo":{"type":"string"},"og:url":{"type":"string"},"octolytics-dimension-repository_is_fork":{"type":"string"},"visitor-payload":{"type":"string"},"og:image:width":{"type":"string"},"expected-hostname":{"type":"string"},"github-keyboard-shortcuts":{"type":"string"},"ogUrl":{"type":"string"},"apple-itunes-app":{"type":"string"},"octolytics-url":{"type":"string"},"turbo-body-classes":{"type":"string"},"octolytics-di
mension-repository_public":{"type":"string"},"ogSiteName":{"type":"string"},"ogImage":{"type":"string"},"current-catalog-service-hash":{"type":"string"},"turbo-cache-control":{"type":"array","items":{"type":"string"}},"request-id":{"type":"string"},"language":{"type":"string"},"html-safe-nonce":{"type":"string"},"user-login":{"type":"string"},"og:image:alt":{"type":"string"},"route-action":{"type":"string"},"google-site-verification":{"type":"string"},"hostname":{"type":"string"},"disable-turbo":{"type":"string"},"description":{"type":"string"},"og:image:height":{"type":"string"},"ogDescription":{"type":"string"},"go-import":{"type":"string"},"og:image":{"type":"string"},"octolytics-dimension-repository_id":{"type":"string"},"twitter:image":{"type":"string"},"twitter:site":{"type":"string"},"twitter:card":{"type":"string"},"route-controller":{"type":"string"},"twitter:description":{"type":"string"},"octolytics-dimension-user_id":{"type":"string"},"twitter:title":{"type":"string"},"analytics-location":{"type":"string"},"route-pattern":{"type":"string"},"favicon":{"type":"string"}},"required":["referrer","title","viewport","scrapeId","sourceURL","url","statusCode","error","contentType","proxyUsed","cacheState","cachedAt","creditsUsed"]}},"required":["url","title","description","position","category","markdown","metadata"]}}},"required":["web"]},"creditsUsed":{"type":"integer"},"id":{"type":"string"}},"required":["success","data","creditsUsed","id"]}}},"headers":{}}}}}}}
```
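
The `tbs` parameter uses a compact string syntax that is easy to get wrong by hand. The sketch below formats a custom date range in the documented `cdr:1,cd_min:MM/DD/YYYY,cd_max:MM/DD/YYYY` form and assembles a request body. The helper names are hypothetical, and using `firecrawl/firecrawl-search` as the `model` value is an assumption based on the endpoint summary.

```python
from datetime import date
from typing import Any, Dict, Optional


def tbs_custom_range(start: date, end: date) -> str:
    """Format a custom date range for the `tbs` parameter."""
    if start > end:
        raise ValueError("start must not be after end")
    fmt = "%m/%d/%Y"
    return f"cdr:1,cd_min:{start.strftime(fmt)},cd_max:{end.strftime(fmt)}"


def build_firecrawl_payload(
    query: str,
    *,
    limit: int = 5,
    tbs: Optional[str] = None,
    country: str = "US",
    location: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the JSON body, enforcing the documented 1..100 limit range."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    payload: Dict[str, Any] = {
        "model": "firecrawl/firecrawl-search",  # assumed model id
        "query": query,
        "limit": limit,
        "country": country,
    }
    if tbs:
        payload["tbs"] = tbs
    if location:
        payload["location"] = location
    return payload
```

For a predefined range, pass one of the documented values such as `qdr:w` directly as `tbs`.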


# Perplexity

## POST /v1/perplexity

> perplexity/perplexity-search

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Perplexity"}],"servers":[{"url":"https://search.onerouter.pro","description":"search"}],"security":[],"paths":{"/v1/perplexity":{"post":{"summary":"perplexity/perplexity-search","deprecated":false,"description":"","tags":["Perplexity"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"query":{"type":"string","description":"The search query or queries to execute.\nA search query. Can be a single query or a list of queries for multi-query search."},"max_results":{"type":"integer","description":"The maximum number of search results to return.","default":10,"minimum":1,"maximum":20},"search_domain_filter":{"type":"array","items":{"type":"string"},"description":"A list of domains/URLs to limit search results to. Maximum 20 domains.","maxItems":20},"max_tokens_per_page":{"type":"integer","default":1024,"description":"Controls the maximum number of tokens retrieved from each webpage during search processing. Higher values provide more comprehensive content extraction but may increase processing time."},"country":{"type":"string","description":"Country code to filter search results by geographic location (e.g., 'US', 'GB', 'DE')."},"search_recency_filter":{"type":"string","description":"Filters search results based on recency. Specify 'day' for results from the past 24 hours, 'week' for the past 7 days, 'month' for the past 30 days, or 'year' for the past 365 days.","enum":["day","week","month","year"]},"search_after_date":{"type":"string","description":"Filters search results to only include content published after this date. Format should be %m/%d/%Y (e.g., '10/15/2025')."},"search_before_date":{"type":"string","description":"Filters search results to only include content published before this date. 
Format should be %m/%d/%Y (e.g., '10/16/2025')."}},"required":["model","query"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"results":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"url":{"type":"string"},"snippet":{"type":"string"},"date":{"type":"string"},"last_updated":{"type":"string"}},"required":["title","url","snippet","date","last_updated"]}},"id":{"type":"string"}},"required":["results","id"]}}},"headers":{}}}}}}}
```
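As a rough sketch of how a request body for this endpoint could be assembled, the helper below enforces the limits stated in the schema (`max_results` between 1 and 20, at most 20 domains, four recency values) before anything is sent. The function names, the use of the standard-library `urllib`, the `Bearer` authorization scheme, and `perplexity/perplexity-search` as the `model` value are assumptions for illustration; only the field names and limits come from the schema above.

```python
import json
import urllib.request

SEARCH_URL = "https://search.onerouter.pro/v1/perplexity"
RECENCY_VALUES = {"day", "week", "month", "year"}


def build_perplexity_payload(query, max_results=10, domains=None, recency=None,
                             model="perplexity/perplexity-search"):
    """Build a request body, checking the limits listed in the schema."""
    if not 1 <= max_results <= 20:
        raise ValueError("max_results must be between 1 and 20")
    payload = {"model": model, "query": query, "max_results": max_results}
    if domains is not None:
        if len(domains) > 20:
            raise ValueError("search_domain_filter accepts at most 20 domains")
        payload["search_domain_filter"] = list(domains)
    if recency is not None:
        if recency not in RECENCY_VALUES:
            raise ValueError(f"recency must be one of {sorted(RECENCY_VALUES)}")
        payload["search_recency_filter"] = recency
    return payload


def search(api_key, payload):
    """POST the payload and return the decoded JSON response (assumed Bearer auth)."""
    request = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `search("<<API_KEY>>", build_perplexity_payload("latest AI news", max_results=5, recency="week"))` would then send the validated body.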


# Exa

## POST /v1/exa

> exa/exa-search

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Exa"}],"servers":[{"url":"https://search.onerouter.pro","description":"search"}],"security":[],"paths":{"/v1/exa":{"post":{"summary":"exa/exa-search","deprecated":false,"description":"","tags":["Exa"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"query":{"type":"string"},"additionalQueries":{"type":"array","items":{"type":"string"}},"type":{"type":"string"},"category":{"type":"string"},"userLocation":{"type":"string"},"numResults":{"type":"integer"},"includeDomains":{"type":"array","items":{"type":"string"}},"excludeDomains":{"type":"array","items":{"type":"string"}},"startCrawlDate":{"type":"string"},"endCrawlDate":{"type":"string"},"startPublishedDate":{"type":"string"},"endPublishedDate":{"type":"string"},"includeText":{"type":"array","items":{"type":"string"}},"excludeText":{"type":"array","items":{"type":"string"}},"context":{"type":"boolean"},"moderation":{"type":"boolean"}},"required":["model","query"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"requestId":{"type":"string"},"resolvedSearchType":{"type":"string"},"results":{"type":"array","items":{"type":"object","properties":{"id":{"type":"string"},"title":{"type":"string"},"url":{"type":"string"},"publishedDate":{"type":"string"},"author":{"type":"null"}},"required":["id","title","url","publishedDate","author"]}},"searchTime":{"type":"number"},"costDollars":{"type":"object","properties":{"total":{"type":"number"},"search":{"type":"object","properties":{"neural":{"type":"number"}},"required":["neural"]}},"required":["total","search"]}},"required":["requestId","resolvedSearchType","results","searchTime","costDollars"]}}},"headers":{}}}}}}}
```
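The schema above carries no field descriptions, so the following is only a minimal sketch of building a request body and reading the `results` array from a response. The helper names and the `exa/exa-search` model value are assumptions; the camelCase field names are taken from the schema.

```python
def build_exa_payload(query, num_results=None, include_domains=None,
                      exclude_domains=None, model="exa/exa-search"):
    """Build a request body using the camelCase field names from the schema."""
    payload = {"model": model, "query": query}
    if num_results is not None:
        payload["numResults"] = num_results
    if include_domains:
        payload["includeDomains"] = list(include_domains)
    if exclude_domains:
        payload["excludeDomains"] = list(exclude_domains)
    return payload


def titles_and_urls(response):
    """Return (title, url) pairs from the `results` array of a response dict."""
    return [(item["title"], item["url"]) for item in response.get("results", [])]
```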


# Cloudsway

## POST /v1/cloudsway

> cloudsway/cloudsway-smart-search

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Cloudsway"}],"servers":[{"url":"https://search.onerouter.pro","description":"search"}],"security":[],"paths":{"/v1/cloudsway":{"post":{"summary":"cloudsway/cloudsway-smart-search","deprecated":false,"description":"","tags":["Cloudsway"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"q":{"type":"string","title":""},"count":{"type":"integer","description":"The number of search results to include in the returned results.","default":10,"enum":[10,20,30,40,50]},"offset":{"type":"integer","default":0},"freshness":{"type":"string","enum":["Day","Week","Month"]},"sites":{"type":"string"},"enableContent":{"type":"boolean","default":false},"contentType":{"type":"string","enum":["HTML","MARKDOWN","TEXT"],"default":"TEXT"},"contentTimeout":{"type":"number"},"mainText":{"type":"boolean","default":false}},"required":["model","q"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{}}}},"headers":{}}}}}}}
```
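Because `count` is limited to 10, 20, 30, 40, or 50 and the schema exposes an `offset`, fetching a larger result set presumably means issuing several requests. The sketch below generates the request bodies for that. It assumes that `offset` counts results rather than pages, and that `cloudsway/cloudsway-smart-search` is the `model` value; neither is confirmed by the schema.

```python
ALLOWED_COUNTS = (10, 20, 30, 40, 50)


def cloudsway_pages(q, total, count=10, model="cloudsway/cloudsway-smart-search"):
    """Yield request bodies that together cover `total` results.

    Assumes `offset` is measured in results, not pages.
    """
    if count not in ALLOWED_COUNTS:
        raise ValueError(f"count must be one of {ALLOWED_COUNTS}")
    for offset in range(0, total, count):
        yield {"model": model, "q": q, "count": count, "offset": offset}
```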


# Tavily

## tavily/tavily-extract

> Extract web page content from one or more specified URLs using Tavily Extract.

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Tavily"}],"servers":[{"url":"https://search.onerouter.pro","description":"search"}],"security":[],"paths":{"/v1/tavily/extract":{"post":{"summary":"tavily/tavily-extract","deprecated":false,"description":"Extract web page content from one or more specified URLs using Tavily Extract.","tags":["Tavily"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"urls":{"type":"array","items":{"type":"string"},"description":"A list of URLs to extract content from."},"query":{"type":"string","description":"User intent for reranking extracted content chunks. When provided, chunks are reranked based on relevance to this query."},"chunks_per_source":{"type":"integer","description":"Chunks are short content snippets (maximum 500 characters each) pulled directly from the source. Use chunks_per_source to define the maximum number of relevant chunks returned per source and to control the raw_content length. Chunks will appear in the raw_content field as: <chunk 1> [...] <chunk 2> [...] <chunk 3>. Available only when query is provided. Must be between 1 and 5.","default":3,"minimum":1,"maximum":5},"extract_depth":{"type":"string","description":"The depth of the extraction process. advanced extraction retrieves more data, including tables and embedded content, with higher success but may increase latency. basic extraction costs 1 credit per 5 successful URL extractions, while advanced extraction costs 2 credits per 5 successful URL extractions.","enum":["basic","advanced"],"default":"basic"},"include_images":{"type":"boolean","description":"Include a list of images extracted from the URLs in the response. Default is false.","default":false},"include_favicon":{"type":"boolean","description":"Whether to include the favicon URL for each result.","default":false},"format":{"type":"string","description":"The format of the extracted web page content. markdown returns content in markdown format. text returns plain text and may increase latency.","enum":["markdown","text"],"default":"markdown"},"timeout":{"type":"number","description":"Maximum time in seconds to wait for the URL extraction before timing out. Must be between 1.0 and 60.0 seconds. If not specified, default timeouts are applied based on extract_depth: 10 seconds for basic extraction and 30 seconds for advanced extraction.","minimum":1,"maximum":60,"format":"float","default":60},"include_usage":{"type":"boolean","description":"Whether to include credit usage information in the response. NOTE:The value may be 0 if the total successful URL extractions has not yet reached 5 calls.","default":false}},"required":["urls"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"results":{"type":"array","items":{"type":"object","properties":{"url":{"type":"string"},"title":{"type":"string"},"raw_content":{"type":"string"},"images":{"type":"array","items":{"type":"string"}}},"required":["url","title","raw_content","images"]}},"failed_results":{"type":"array","items":{"type":"string"}},"response_time":{"type":"number"},"request_id":{"type":"string"}},"required":["results","failed_results","response_time","request_id"]}}},"headers":{}},"400":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"error":{"type":"object","properties":{"message":{"type":"string"},"type":{"type":"string"},"param":{"type":"string"},"code":{"type":"integer"}},"required":["message","type","param","code"]}},"required":["error"]}}},"headers":{}}}}}}}
```
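Several constraints in this schema interact: `chunks_per_source` is only available when `query` is provided and must be between 1 and 5, and `timeout` must be between 1.0 and 60.0 seconds. A minimal sketch of a payload builder that checks these before sending might look like this; the function name is hypothetical, and the rules are taken from the descriptions above.

```python
def build_extract_payload(urls, query=None, chunks_per_source=None,
                          extract_depth="basic", timeout=None):
    """Build a body for POST /v1/tavily/extract, checking the documented limits."""
    if extract_depth not in ("basic", "advanced"):
        raise ValueError("extract_depth must be 'basic' or 'advanced'")
    payload = {"urls": list(urls), "extract_depth": extract_depth}
    if query is not None:
        payload["query"] = query
    if chunks_per_source is not None:
        # The schema states this option is available only when query is provided.
        if query is None:
            raise ValueError("chunks_per_source requires query")
        if not 1 <= chunks_per_source <= 5:
            raise ValueError("chunks_per_source must be between 1 and 5")
        payload["chunks_per_source"] = chunks_per_source
    if timeout is not None:
        if not 1.0 <= timeout <= 60.0:
            raise ValueError("timeout must be between 1.0 and 60.0 seconds")
        payload["timeout"] = timeout
    return payload
```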


# Overview

### What Is Batch Processing?

Batch processing is a powerful method for handling large volumes of requests efficiently. Instead of sending and processing each request individually with an immediate response, batch processing lets you submit many requests together for asynchronous execution. This approach is especially useful when:

* You need to process large datasets
* Real-time responses are not required
* You want to maximize cost efficiency
* You are running large-scale evaluations or analyses

Batch processing (batching) lets you send multiple message requests in a single batch and retrieve the results later, within 24 hours. The key benefits are a significant cost reduction (up to 50%) and higher throughput for analytical or offline workloads.

### How to Use the Batches API

A Batch consists of a list of individual requests. Each request contains:

* A unique `custom_id` to identify the message request
* A `params` object containing the standard parameters used in the Messages API

To create a batch, pass this list of requests into the `requests` parameter.
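The structure described above can be sketched as a small builder that turns `(custom_id, prompt)` pairs into a request body and rejects duplicate IDs. The function name is hypothetical, and the body includes only `model`, `max_tokens`, and `messages`; the OpenAPI schema lists more `params` fields as required, so whether the API accepts this minimal subset is an assumption worth verifying.

```python
def build_batch_body(items, model="gpt-4o-mini-batch", max_tokens=1024):
    """Turn (custom_id, prompt) pairs into a body for POST /v1/batches."""
    seen = set()
    batch_requests = []
    for custom_id, prompt in items:
        # Each request needs its own custom_id so results can be matched later.
        if custom_id in seen:
            raise ValueError(f"duplicate custom_id: {custom_id!r}")
        seen.add(custom_id)
        batch_requests.append({
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        })
    return {"requests": batch_requests}
```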

#### Create a message batch <a href="#create-a-message-batch" id="create-a-message-batch"></a>

> Create a batch of messages for asynchronous processing. All usage is charged at 50% of the standard API prices.

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

headers = {
    "Authorization": "Bearer <<API_KEY>>",
    "Content-Type": "application/json"
}

data = {
  "requests": [
    {
      "custom_id": "my-request-01",
      "params": {
        "model": "gpt-4o-mini-batch",
        "max_tokens": 1024,
        "messages": [
          {
            "role": "user",
            "content": "How to learn nestjs?"
          }
        ],
        "metadata": {
          "ANY_ADDITIONAL_PROPERTY": "text"
        },
        "stop_sequences": [
          "text"
        ],
        "system": "text",
        "temperature": 1,
        "tool_choice": None,
        "tools": [],
        "top_k": 1,
        "top_p": 1,
        "thinking": {
          "budget_tokens": 1024,
          "type": "enabled"
        }
      }
    },
    {
      "custom_id": "my-request-02",
      "params": {
        "model": "gpt-4o-mini-batch",
        "max_tokens": 1024,
        "messages": [
          {
            "role": "user",
            "content": "How to learn Reactjs?"
          }
        ],
        "metadata": {
          "ANY_ADDITIONAL_PROPERTY": "text"
        },
        "stop_sequences": [
          "text"
        ],
        "system": "text",
        "temperature": 1,
        "tool_choice": None,
        "tools": [],
        "top_k": 1,
        "top_p": 1,
        "thinking": {
          "budget_tokens": 1024,
          "type": "enabled"
        }
      }
    },
    {
      "custom_id": "my-request-03",
      "params": {
        "model": "gpt-4o-mini-batch",
        "max_tokens": 1024,
        "messages": [
          {
            "role": "user",
            "content": "How to learn Nextjs?"
          }
        ],
        "metadata": {
          "ANY_ADDITIONAL_PROPERTY": "text"
        },
        "stop_sequences": [
          "text"
        ],
        "system": "text",
        "temperature": 1,
        "tool_choice": None,
        "tools": [],
        "top_k": 1,
        "top_p": 1,
        "thinking": {
          "budget_tokens": 1024,
          "type": "enabled"
        }
      }
    }
  ]
}

response = requests.post("https://llm.onerouter.pro/v1/batches", headers=headers, data=json.dumps(data))

data = response.json()
print("Batch created:", json.dumps(data, indent=2, ensure_ascii=False))
```

{% endtab %}
{% endtabs %}

In this example, three separate requests are batched together for asynchronous processing. Each request has a unique `custom_id` and contains the standard parameters you'd use for a Messages API call. The response describes the newly created batch and its input file:

```json
{
  "batch": {
    "cancelled_at": null,
    "cancelling_at": null,
    "completed_at": null,
    "completion_window": "24h",
    "created_at": 1765972352,
    "endpoint": "",
    "error_file_id": "",
    "errors": null,
    "expired_at": null,
    "expires_at": 1766058749,
    "failed_at": null,
    "finalizing_at": null,
    "id": "batch_a34c321b-ed4b-4e91-ae29-7f02939d8962",
    "in_progress_at": null,
    "input_file_id": "file-142b17fbff7d4a06a88ec9205ae143c9",
    "metadata": null,
    "object": "batch",
    "output_file_id": "",
    "request_counts": {
      "completed": 0,
      "failed": 0,
      "total": 0
    },
    "status": "validating"
  },
  "batch_id": "batch_a34c321b-ed4b-4e91-ae29-7f02939d8962",
  "file": {
    "bytes": 802,
    "created_at": 1765972347,
    "filename": "batch.jsonl",
    "id": "file-142b17fbff7d4a06a88ec9205ae143c9",
    "object": "file",
    "purpose": "batch",
    "status": "processed"
  },
  "file_id": "file-142b17fbff7d4a06a88ec9205ae143c9",
  "task_id": 2,
  "task_status": "NOT_START"
}
```

#### Get status or results of a specific message batch <a href="#get-status-or-results-of-a-specific-message-batch" id="get-status-or-results-of-a-specific-message-batch"></a>

> Get batch status if in progress, or stream results if completed in JSONL format.

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

# Insert your batch_id here
batch_id = "batch_a34c321b-ed4b-4e91-ae29-7f02939d8962"

headers = {
    "Authorization": "Bearer <<API_KEY>>",
    "Content-Type": "application/json"
}

response = requests.get(f"https://llm.onerouter.pro/v1/batches/{batch_id}", headers=headers)

print("Raw response:\n", response.text[:500])  

try:
    data = [json.loads(line) for line in response.text.splitlines() if line.strip()]
    print("\n✅ Parsed JSONL:")
    print(json.dumps(data, indent=2))
except json.JSONDecodeError:
    try:
        data = response.json()
        print("\n✅ Parsed JSON:")
        print(json.dumps(data, indent=2))
    except Exception as e:
        print("\n⚠️ Could not parse response:", e)
```

{% endtab %}
{% endtabs %}
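Since batches run asynchronously, a client usually polls this endpoint until the batch finishes. The following is a minimal polling sketch, not a definitive implementation: `fetch_status` is a hypothetical callable you supply (for example, a wrapper around the GET request above that returns the parsed status object), and the set of terminal `status` values is an assumption inferred from the timestamp field names (`completed_at`, `failed_at`, `expired_at`, `cancelled_at`), not a documented list.

```python
import time

# Assumed terminal values; the documentation does not enumerate them.
TERMINAL_STATUSES = frozenset({"completed", "failed", "expired", "cancelled"})


def wait_for_batch(fetch_status, batch_id, interval=30.0, max_polls=120,
                   terminal_statuses=TERMINAL_STATUSES, sleep=time.sleep):
    """Poll until the batch reaches a terminal status or max_polls is exhausted.

    fetch_status(batch_id) must return a dict containing a "status" key.
    """
    for _ in range(max_polls):
        info = fetch_status(batch_id)
        if info.get("status") in terminal_statuses:
            return info
        sleep(interval)
    raise TimeoutError(f"batch {batch_id} did not finish after {max_polls} polls")
```

Passing `sleep` and `fetch_status` as parameters keeps the loop easy to test without network access or real delays.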

#### Cancel a specific batch <a href="#cancel-a-specific-batch" id="cancel-a-specific-batch"></a>

You can cancel a Batch that is currently processing using the cancel endpoint. Immediately after cancellation, a batch's `processing_status` will be `canceling`. Canceled batches end up with a status of `ended` and may contain partial results for requests that were processed before cancellation.

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

batch_id = "batch_a34c321b-ed4b-4e91-ae29-7f02939d8962"
headers = {
    "Authorization": "Bearer <<API_KEY>>",
    "Content-Type": "application/json"
}

response = requests.post(
    f"https://llm.onerouter.pro/v1/batches/{batch_id}/cancel",
    headers=headers
)
if response.status_code == 200:
    print("Batch canceled successfully:")
else:
    print(f"Failed to cancel batch ({response.status_code}):")
# The response body is printed in both cases.
print(json.dumps(response.json(), indent=2, ensure_ascii=False))
```

{% endtab %}
{% endtabs %}


# Create New Batch

## Create a message batch

> Create a batch of messages for asynchronous processing. All usage is charged at 50% of the standard API prices.

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Create New Batch"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Pub Env"}],"security":[],"paths":{"/v1/batches":{"post":{"summary":"Create a message batch","deprecated":false,"description":"Create a batch of messages for asynchronous processing. All usage is charged at 50% of the standard API prices.","tags":["Create New Batch"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"requests":{"type":"array","items":{"type":"object","properties":{"custom_id":{"type":"string"},"params":{"type":"object","properties":{"model":{"type":"string"},"max_tokens":{"type":"integer"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}},"required":["role","content"]}},"metadata":{"type":"object","properties":{"ANY_ADDITIONAL_PROPERTY":{"type":"string"}},"required":["ANY_ADDITIONAL_PROPERTY"]},"stop_sequences":{"type":"array","items":{"type":"string"}},"system":{"type":"string"},"temperature":{"type":"integer"},"tool_choice":{"type":"null"},"tools":{"type":"array","items":{"type":"string"}},"top_k":{"type":"integer"},"top_p":{"type":"integer"},"thinking":{"type":"object","properties":{"budget_tokens":{"type":"integer"},"type":{"type":"string"}},"required":["budget_tokens","type"]}},"required":["model","max_tokens","messages","metadata","stop_sequences","system","temperature","tool_choice","tools","top_k","top_p","thinking"]}},"required":["custom_id","params"]}}},"required":["requests"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"id":{"type":"string"},"batch_id":{"type":"string"},"object":{"type":"string"},"endpoint":{"type":"string"},"errors":{"type":"null"},"input_file_id":{"type":"string"},"completion_window":{"type":"string"},"status":{"type":"string"},"output_file_id":{"type":"null"},"error_file_id":{"type":"null"},"created_at":{"type":"integer"},"in_progress_at":{"type":"null"},"expires_at":{"type":"null"},"finalizing_at":{"type":"null"},"completed_at":{"type":"null"},"failed_at":{"type":"null"},"expired_at":{"type":"null"},"cancelling_at":{"type":"null"},"cancelled_at":{"type":"null"},"request_counts":{"type":"object","properties":{"total":{"type":"integer"},"completed":{"type":"integer"},"failed":{"type":"integer"}},"required":["total","completed","failed"]},"metadata":{"type":"null"}},"required":["id","batch_id","object","endpoint","errors","input_file_id","completion_window","status","output_file_id","error_file_id","created_at","in_progress_at","expires_at","finalizing_at","completed_at","failed_at","expired_at","cancelling_at","cancelled_at","request_counts","metadata"]}}},"headers":{}}}}}}}
```


# Get Status of a Batch

## Get batch status or results

> Get batch status if in progress, or stream results if completed in JSONL format.

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Get Status of a Batch"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Pub Env"}],"security":[],"paths":{"/v1/batches/{batch_id}":{"get":{"summary":"Get batch status or results","deprecated":false,"description":"Get batch status if in progress, or stream results if completed in JSONL format.","tags":["Get Status of a Batch"],"parameters":[{"name":"batch_id","in":"path","description":"","required":true,"schema":{"type":"string"}},{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"batch_id":{"type":"string"},"batch_status":{"type":"string"},"completed_count":{"type":"integer"},"completion_tokens":{"type":"integer"},"cost":{"type":"string"},"error":{"type":"string"},"fail_reason":{"type":"string"},"failed_count":{"type":"integer"},"finish_time":{"type":"integer"},"model":{"type":"string"},"progress":{"type":"string"},"prompt_tokens":{"type":"integer"},"result_url":{"type":"string"},"start_time":{"type":"integer"},"status":{"type":"string"},"submit_time":{"type":"integer"},"task_id":{"type":"integer"},"total_requests":{"type":"integer"},"total_tokens":{"type":"integer"}},"required":["batch_id","batch_status","completed_count","completion_tokens","cost","error","fail_reason","failed_count","finish_time","model","progress","prompt_tokens","result_url","start_time","status","submit_time","task_id","total_requests","total_tokens"]}}},"headers":{}}}}}}}
```


# Cancel a Batch

## POST /v1/batches/{batch\_id}/cancel

> Cancel a specific batch

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Cancel a Batch"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Pub Env"}],"security":[],"paths":{"/v1/batches/{batch_id}/cancel":{"post":{"summary":"Cancel a specific batch","deprecated":false,"description":"","tags":["Cancel a Batch"],"parameters":[{"name":"batch_id","in":"path","description":"","required":true,"schema":{"type":"string"}},{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{}}}},"headers":{}}}}}}}
```


# Get remaining credits

## Get remaining credits

> This endpoint facilitates queries for user information, primarily focusing on their account balance.

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Get remaining credits"}],"servers":[{"url":"https://api.onerouter.pro","description":"API"}],"security":[],"paths":{"/v1/balance":{"get":{"summary":"Get remaining credits","deprecated":false,"description":"This endpoint facilitates queries for user information, primarily focusing on their account balance.","tags":["Get remaining credits"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"account_name":{"type":"string"},"credit_balance":{"type":"number"}},"required":["account_name","credit_balance"]}}},"headers":{}}}}}}}
```
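A minimal sketch of calling this endpoint and checking whether the remaining credits cover a planned job is shown below. It uses the standard-library `urllib`; the helper names are hypothetical, and the `Bearer` authorization scheme is assumed to match the other endpoints in this documentation. Only `account_name` and `credit_balance` come from the response schema.

```python
import json
import urllib.request

BALANCE_URL = "https://api.onerouter.pro/v1/balance"


def fetch_balance(api_key):
    """GET /v1/balance and return the decoded JSON body (assumed Bearer auth)."""
    request = urllib.request.Request(
        BALANCE_URL, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def can_afford(balance, estimated_cost):
    """Return True if credit_balance covers the estimated cost in credits."""
    return balance["credit_balance"] >= estimated_cost
```

For example, `can_afford(fetch_balance("<<API_KEY>>"), 5.0)` could gate the submission of a large batch.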


# Overview

Track AI Model Token Usage and Cost Breakdowns

Infron provides built‑in Usage Accounting that allows you to monitor AI model usage and cost breakdowns directly from your API responses. This feature includes detailed insights into token consumption, associated costs, and caching behavior.

**Benefits**

* Efficiency: Retrieve usage information without additional API calls
* Accuracy: Token counts are computed using each model’s native tokenizer
* Transparency: Track real-time cost and cached token utilization
* Detailed Breakdown: Separate reporting for prompt, completion, reasoning, and cached tokens

**Usage Information**

When enabled, the API returns comprehensive usage metrics, including:

* Prompt and completion token counts calculated with the model’s native tokenizer
* Total cost in credits
* Reasoning token counts (when supported by the model)
* Cached token counts (when applicable)

This usage information appears in the final SSE message for streaming responses, or in the full response body for non‑streaming requests.
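To illustrate, the sketch below builds a request body that opts in to usage accounting and condenses the metrics from a non-streaming response into a flat dictionary. The helper names and the `<<MODEL_ID>>` placeholder are hypothetical; the field names (`usage.include`, `prompt_tokens`, `completion_tokens_details.reasoning_tokens`, `prompt_tokens_details.cached_tokens`, `cost`) follow the OpenAPI schemas for `/v1/chat/completions` in this documentation.

```python
def build_chat_body(prompt, model="<<MODEL_ID>>"):
    """Request body for /v1/chat/completions with usage accounting enabled."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "usage": {"include": True},
    }


def summarize_usage(response):
    """Flatten the token counts and cost from a non-streaming response dict."""
    usage = response.get("usage") or {}
    prompt_details = usage.get("prompt_tokens_details") or {}
    completion_details = usage.get("completion_tokens_details") or {}
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "reasoning_tokens": completion_details.get("reasoning_tokens", 0),
        "cached_tokens": prompt_details.get("cached_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
        "cost": response.get("cost", 0.0),
    }
```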


# Get cost & usage details (non streaming)

## Get cost & usage details (non streaming)

> Get cost details and usage details in every call

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Get cost & usage details (non streaming)"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Chat"}],"security":[],"paths":{"/v1/chat/completions":{"post":{"summary":"Get cost & usage details (non streaming)","deprecated":false,"description":"Get cost details and usage details in every call","tags":["Get cost & usage details (non streaming)"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}}}},"usage":{"type":"object","properties":{"include":{"type":"boolean","default":false}},"required":["include"]}},"required":["model","messages"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"choices":{"type":"array","items":{"type":"object","properties":{"finish_reason":{"type":"string"},"index":{"type":"integer"},"logprobs":{"type":"null"},"message":{"type":"object","properties":{"content":{"type":"string"},"role":{"type":"string"}},"required":["content","role"]}}}},"cost":{"type":"number"},"cost_details":{"type":"object","properties":{"audio_cost":{"type":"integer"},"byok_cost":{"type":"integer"},"completion_cost":{"type":"number"},"discount_rate":{"type":"integer"},"image_cost":{"type":"integer"},"is_byok":{"type":"boolean"},"native_web_search_cost":{"type":"integer"},"plugin_web_search_cost":{"type":"integer"},"prompt_cache_read_cost":{"type":"number"},"prompt_cache_write_1_h":{"type":"integer"},"prompt_cache_write_5_min":{"type":"integer"},"prompt_cache_write_cost":{"type":"integer"},"prompt_cost":{"type":"number"},"reasoning_cost":{"type":"integer"},"tools_cost":{"type":"integer"},"video_cost":{"type":"integer"}},"required":["audio_cost","byok_cost","completion_cost","discount_rate","image_cost","is_byok","native_web_search_cost","plugin_web_search_cost","prompt_cache_read_cost","prompt_cache_write_1_h","prompt_cache_write_5_min","prompt_cache_write_cost","prompt_cost","reasoning_cost","tools_cost","video_cost"]},"created":{"type":"integer"},"id":{"type":"string"},"model":{"type":"string"},"object":{"type":"string"},"request_id":{"type":"string"},"usage":{"type":"object","properties":{"completion_tokens":{"type":"integer"},"completion_tokens_details":{"type":"object","properties":{"audio_tokens":{"type":"integer"},"image_tokens":{"type":"integer"},"reasoning_tokens":{"type":"integer"}},"required":["audio_tokens","image_tokens","reasoning_tokens"]},"prompt_tokens":{"type":"integer"},"prompt_tokens_details":{"type":"object","properties":{"audio_tokens":{"type":"integer"},"cache_write_tokens":{"type":"integer"},"cached_tokens":{"type":"integer"},"video_tokens":{"type":"integer"}},"required":["audio_tokens","cache_write_tokens","cached_tokens","video_tokens"]},"total_tokens":{"type":"integer"}},"required":["completion_tokens","completion_tokens_details","prompt_tokens","prompt_tokens_details","total_tokens"]}},"required":["choices","cost","cost_details","created","id","model","object","request_id","usage"]}}},"headers":{}}}}}}}
```
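The `cost_details` object in this schema mixes per-category cost fields (names ending in `_cost`) with flags and rates such as `is_byok` and `discount_rate`. As a small sketch, the hypothetical helper below keeps only the non-zero `_cost` entries, which is often enough to see where a request's spend went. It makes no assumption about whether the components sum to the top-level `cost`.

```python
def nonzero_cost_components(response):
    """Return the non-zero *_cost entries from a response's cost_details."""
    details = response.get("cost_details") or {}
    return {
        name: value
        for name, value in details.items()
        if name.endswith("_cost")
        and isinstance(value, (int, float))
        and not isinstance(value, bool)
        and value != 0
    }
```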


# Get cost & usage details (streaming)

## Get cost & usage details (streaming)

> Get cost details and usage details in every call

```json
{"openapi":"3.1.0","info":{"title":"Default module","version":"1.0.0"},"tags":[{"name":"Get cost & usage details (streaming)"}],"servers":[{"url":"https://llm.onerouter.pro","description":"Chat"}],"security":[],"paths":{"/v1/chat/completions":{"post":{"summary":"Get cost & usage details (streaming)","deprecated":false,"description":"Get cost details and usage details in every call","tags":["Get cost & usage details (streaming)"],"parameters":[{"name":"Authorization","in":"header","description":"","required":true,"schema":{"type":"string"}}],"requestBody":{"content":{"application/json":{"schema":{"type":"object","properties":{"model":{"type":"string"},"messages":{"type":"array","items":{"type":"object","properties":{"role":{"type":"string"},"content":{"type":"string"}}}},"usage":{"type":"object","properties":{"include":{"type":"boolean","default":false}},"required":["include"]},"stream":{"type":"boolean","default":false}},"required":["model","messages"]}}}},"responses":{"200":{"description":"","content":{"application/json":{"schema":{"type":"object","properties":{"choices":{"type":"array","items":{"type":"string"}},"cost":{"type":"number"},"cost_details":{"type":"object","properties":{"audio_cost":{"type":"integer"},"byok_cost":{"type":"integer"},"completion_cost":{"type":"number"},"discount_rate":{"type":"integer"},"image_cost":{"type":"integer"},"is_byok":{"type":"boolean"},"native_web_search_cost":{"type":"integer"},"plugin_web_search_cost":{"type":"integer"},"prompt_cache_read_cost":{"type":"number"},"prompt_cache_write_1_h":{"type":"integer"},"prompt_cache_write_5_min":{"type":"integer"},"prompt_cache_write_cost":{"type":"integer"},"prompt_cost":{"type":"number"},"reasoning_cost":{"type":"integer"},"tools_cost":{"type":"integer"},"video_cost":{"type":"integer"}},"required":["audio_cost","byok_cost","completion_cost","discount_rate","image_cost","is_byok","native_web_search_cost","plugin_web_search_cost","prompt_cache_read_cost","prompt_cache_write_1_h","prompt_cache_write_5_min","prompt_cache_write_cost","prompt_cost","reasoning_cost","tools_cost","video_cost"]},"created":{"type":"integer"},"id":{"type":"string"},"model":{"type":"string"},"object":{"type":"string"},"request_id":{"type":"string"},"system_fingerprint":{"type":"string"},"usage":{"type":"object","properties":{"completion_tokens":{"type":"integer"},"completion_tokens_details":{"type":"object","properties":{"audio_tokens":{"type":"integer"},"image_tokens":{"type":"integer"},"reasoning_tokens":{"type":"integer"}},"required":["audio_tokens","image_tokens","reasoning_tokens"]},"prompt_tokens":{"type":"integer"},"prompt_tokens_details":{"type":"object","properties":{"audio_tokens":{"type":"integer"},"cache_write_tokens":{"type":"integer"},"cached_tokens":{"type":"integer"},"video_tokens":{"type":"integer"}},"required":["audio_tokens","cache_write_tokens","cached_tokens","video_tokens"]},"total_tokens":{"type":"integer"}},"required":["completion_tokens","completion_tokens_details","prompt_tokens","prompt_tokens_details","total_tokens"]}},"required":["choices","cost","cost_details","created","id","model","object","request_id","system_fingerprint","usage"]}}},"headers":{}}}}}}}
```
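For streaming requests, the usage information arrives in the final SSE message. A minimal sketch of pulling it out of a stream of already-decoded SSE lines follows. The function name is hypothetical, and the `data:` line prefix and the `[DONE]` end-of-stream sentinel are assumptions based on the common OpenAI-compatible streaming format rather than something this schema states.

```python
import json


def final_usage_from_sse(lines):
    """Return usage and cost from the last SSE chunk that carries usage data.

    Assumes OpenAI-style framing: "data: <json>" lines ending with "data: [DONE]".
    """
    last = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        if chunk.get("usage"):
            last = chunk
    if last is None:
        return None
    return {"usage": last["usage"], "cost": last.get("cost")}
```

With the `requests` library, the `lines` argument could come from `response.iter_lines(decode_unicode=True)` on a request made with `stream=True`.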


Field	Type	Default	Description
`order`	string[]	-	List of provider slugs to try in order (e.g. `["anthropic", "openai"]`).
`allow_fallbacks`	boolean	`true`	Whether to allow backup providers when the primary is unavailable.
`sort`	string \| object	-	Sort providers by price, throughput, or latency. (e.g. `"price"`)
`preferred_min_throughput`	number \| object	-	Preferred minimum throughput (tokens/sec). Can be a number or an object with percentile cutoffs (p50, p75, p90, p99).
`preferred_max_latency`	number \| object	-	Preferred maximum latency (seconds). Can be a number or an object with percentile cutoffs (p50, p75, p90, p99).
`require_parameters`	boolean	`true`	Only use providers that support all parameters in your request.
`data_collection`	"allow" \| "deny"	"allow"	Control whether to use providers that may store data.
`zdr`	boolean	`false`	Restrict routing to only ZDR (Zero Data Retention) endpoints.
`enforce_distillable_text`	boolean	`false`	Restrict routing to only models that allow text distillation.
`only`	string[]	-	List of provider slugs to allow for this request.
`ignore`	string[]	-	List of provider slugs to skip for this request.
`quantizations`	string[]	-	List of quantization levels to filter by (e.g. `["int4", "int8"]`).
Caching Type	Usage Method
Implicit Caching	No configuration needed, `automatically managed by model provider`
Explicit Caching	Requires `cache_control` parameter

| Model Series | Minimum Cache Tokens |
| --- | --- |
| Claude Opus 4.1/4 | 1024 tokens |
| Claude Haiku 3.5 | 2048 tokens |
| Claude Sonnet 4.5/4/3.7 | 1024 tokens |

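
As a rough sketch of explicit caching, the helper below marks a content block with `cache_control` only when the prompt meets the minimum from the table above. The `{"type": "ephemeral"}` value follows Anthropic's Messages API convention and is assumed to be accepted here; the series keys and the caller-supplied `token_count` are hypothetical.

```python
# Minimum cacheable prompt sizes, copied from the table above.
# The dictionary keys are hypothetical labels, not official model IDs.
MIN_CACHE_TOKENS = {
    "claude-opus": 1024,
    "claude-haiku-3.5": 2048,
    "claude-sonnet": 1024,
}


def with_cache_control(text, token_count, series):
    """Return a text block, marked for explicit caching if it is long enough."""
    block = {"type": "text", "text": text}
    if token_count >= MIN_CACHE_TOKENS[series]:
        # Assumption: the value shape mirrors Anthropic's cache_control format.
        block["cache_control"] = {"type": "ephemeral"}
    return block


short_block = with_cache_control("Short system prompt", 300, "claude-sonnet")
long_block = with_cache_control("Long reference document ...", 5000, "claude-haiku-3.5")
print("cache_control" in short_block, "cache_control" in long_block)
```

Skipping the marker on short prompts avoids sending a parameter that cannot take effect below the minimum.
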

| Changed Content | Tool Cache | System Cache | Message Cache | Impact Description |
| --- | --- | --- | --- | --- |
| Tool Definitions | ✘ | ✘ | ✘ | Modifying tool definitions invalidates the entire cache |
| System Prompt | ✓ | ✘ | ✘ | Modifying the system prompt invalidates the system and message caches |
| `tool_choice` Parameter | ✓ | ✓ | ✘ | Only affects the message cache |
| Add/Remove Images | ✓ | ✓ | ✘ | Only affects the message cache |

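
The invalidation rules above can be encoded as a small lookup, which is handy when deciding whether a planned request change will discard cached prefixes. This is an illustrative sketch; the key names are hypothetical labels for the table rows and columns.

```python
# True means the cache level survives the change (a check mark in the table).
CACHE_SURVIVES = {
    "tool_definitions": {"tools": False, "system": False, "messages": False},
    "system_prompt":    {"tools": True,  "system": False, "messages": False},
    "tool_choice":      {"tools": True,  "system": True,  "messages": False},
    "images":           {"tools": True,  "system": True,  "messages": False},
}

CACHE_LEVELS = ("tools", "system", "messages")


def invalidated_levels(changes):
    """Return the cache levels invalidated by any of the given changes."""
    return [
        level for level in CACHE_LEVELS
        if any(not CACHE_SURVIVES[change][level] for change in changes)
    ]


print(invalidated_levels(["tool_choice"]))
print(invalidated_levels(["system_prompt", "images"]))
```
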

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `web_search_options` | object | No | Web search configuration |
| `web_search_options.search_context_size` | string | No | Search context size: `low`, `medium`, or `high` |
| `web_search_options.user_location` | object | No | User location info for localized search results |
| `web_search_options.user_location.type` | string | Yes | Location type; must be `approximate` |
| `web_search_options.user_location.city` | string | No | City name |
| `web_search_options.user_location.country` | string | No | Country code (2-letter ISO, e.g. `CN`, `US`) |
| `web_search_options.user_location.region` | string | No | Region/province |
| `web_search_options.user_location.timezone` | string | No | Timezone (IANA format, e.g. `Asia/Shanghai`) |

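
A minimal sketch of building `web_search_options` from the parameters above. It follows the flat `user_location` layout shown in the table and sets `type` to `approximate` whenever a location is supplied; the helper name is hypothetical.

```python
import json

CONTEXT_SIZES = ("low", "medium", "high")


def build_web_search_options(search_context_size=None, city=None,
                             country=None, region=None, timezone=None):
    """Return a web_search_options dict containing only the fields that were set."""
    if search_context_size is not None and search_context_size not in CONTEXT_SIZES:
        raise ValueError(f"search_context_size must be one of {CONTEXT_SIZES}")

    options = {}
    if search_context_size is not None:
        options["search_context_size"] = search_context_size

    location = {
        key: value
        for key, value in (("city", city), ("country", country),
                           ("region", region), ("timezone", timezone))
        if value is not None
    }
    if location:
        # `type` is required whenever user_location is present.
        options["user_location"] = {"type": "approximate", **location}
    return options


options = build_web_search_options(
    "medium", city="Shanghai", country="CN", timezone="Asia/Shanghai"
)
print(json.dumps({"web_search_options": options}, indent=2))
```
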

| Type | Formats |
| --- | --- |
| Images | `JPG`, `JPEG`, `PNG`, `WebP`, `GIF`, `BMP`, `TIFF` |
| Videos | `MP4`, `AVI`, `MOV`, `WMV`, `FLV`, `WebM`, `MKV`, `3GP`, `OGV` |
| Audio | `MP3`, `WAV`, `OGG`, `AAC`, `FLAC`, `WebM`, `M4A`, `Opus` |

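
A client can check a file against the supported formats before uploading. The sketch below assumes each format name maps to a lowercase file extension with the same spelling; because `WebM` is listed under both video and audio, the helper returns every matching type rather than a single one.

```python
from pathlib import Path

# Format names from the table above, lowercased.
# Assumption: each name doubles as the file extension.
SUPPORTED_FORMATS = {
    "image": {"jpg", "jpeg", "png", "webp", "gif", "bmp", "tiff"},
    "video": {"mp4", "avi", "mov", "wmv", "flv", "webm", "mkv", "3gp", "ogv"},
    "audio": {"mp3", "wav", "ogg", "aac", "flac", "webm", "m4a", "opus"},
}


def media_types_for(filename):
    """Return the sorted media types whose format list contains the extension."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return sorted(
        kind for kind, extensions in SUPPORTED_FORMATS.items()
        if extension in extensions
    )


for name in ("photo.PNG", "clip.webm", "notes.txt"):
    print(name, media_types_for(name))
```

An empty result means the extension is not in any of the three lists.
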

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| model | `True` | string | - | Model ID |
| file | `True` | file/string | - | Local file stream or remote resource URL (only `http`/`https` are supported). |

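
Since `file` accepts either a local stream or an `http`/`https` URL, a client has to decide which form to send. The following sketch separates the two cases; the helper name and the returned tuple shape are hypothetical, and how the local file is then attached to the request (for example as a multipart part) is left to the HTTP client.

```python
from urllib.parse import urlparse


def build_upload_fields(model, file_ref):
    """Return (fields, local_path).

    For a remote URL the `file` field carries the URL and local_path is None.
    For a local path the caller should open local_path and stream it as `file`.
    """
    scheme = urlparse(file_ref).scheme.lower()
    if scheme in ("http", "https"):
        return {"model": model, "file": file_ref}, None
    # A one-letter scheme is treated as a Windows drive letter such as C:\.
    if len(scheme) > 1:
        raise ValueError(f"unsupported URL scheme: {scheme}")
    return {"model": model}, file_ref


# "example/model-id" and both file references are placeholder values.
print(build_upload_fields("example/model-id", "https://example.com/clip.mp4"))
print(build_upload_fields("example/model-id", "./clip.mp4"))
```
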

| Field | Required | Type | Default | Description |
| --- | --- | --- | --- | --- |
| page | `False` | number | `1` | Page index |
| page_size | `False` | number | `20` | Number of items per page (maximum 100) |

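
The pagination parameters above can be validated and iterated as in this sketch. `fetch_page` is a hypothetical callable standing in for the real list request; treating pages as 1-based (suggested by the default of `1`) and stopping at the first empty page are both assumptions.

```python
def pagination_params(page=1, page_size=20):
    """Validate and return pagination query parameters."""
    if page < 1:
        raise ValueError("page must be 1 or greater")  # assumes 1-based pages
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    return {"page": page, "page_size": page_size}


def iter_pages(fetch_page, page_size=20):
    """Yield items page by page until fetch_page returns an empty list."""
    page = 1
    while True:
        items = fetch_page(**pagination_params(page, page_size))
        if not items:
            return
        yield from items
        page += 1


# Stand-in for a real API call: slices an in-memory list of 45 records.
RECORDS = list(range(45))


def fake_fetch(page, page_size):
    start = (page - 1) * page_size
    return RECORDS[start:start + page_size]


print(len(list(iter_pages(fake_fetch, page_size=20))))
```
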

| Element | Description | Example |
| --- | --- | --- |
| Subject | What appears in the video | `a young woman`, `a robot`, `a city skyline` |
| Context | The setting or background | `on a rainy Tokyo street at night` |
| Action | What is happening | `walking slowly while holding an umbrella` |
| Style | Visual or cinematic style | `cinematic`, `documentary`, `noir`, `cartoon` |
| Camera Motion | Optional camera movement | `tracking shot`, `aerial view`, `slow dolly-in` |
| Composition | Framing or shot type | `wide shot`, `close-up`, `over-the-shoulder` |
| Ambiance | Lighting, color, sound, mood | `soft golden light`, `ambient city noise` |

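
One way to combine these elements into a single prompt string is sketched below. The element names mirror the table; the ordering (subject, action, context, then comma-separated modifiers) is an illustrative choice, not a documented requirement.

```python
SCENE_ELEMENTS = ("subject", "action", "context")
MODIFIER_ELEMENTS = ("style", "camera_motion", "composition", "ambiance")


def build_video_prompt(**elements):
    """Join the supplied prompt elements into one descriptive sentence."""
    unknown = set(elements) - set(SCENE_ELEMENTS + MODIFIER_ELEMENTS)
    if unknown:
        raise ValueError(f"unknown prompt elements: {sorted(unknown)}")
    # The scene reads as a phrase; modifiers are appended after commas.
    scene = " ".join(elements[k] for k in SCENE_ELEMENTS if elements.get(k))
    modifiers = [elements[k] for k in MODIFIER_ELEMENTS if elements.get(k)]
    parts = [scene, *modifiers] if scene else modifiers
    return ", ".join(parts)


prompt = build_video_prompt(
    subject="a young woman",
    action="walking slowly while holding an umbrella",
    context="on a rainy Tokyo street at night",
    style="cinematic",
    camera_motion="tracking shot",
    ambiance="soft golden light",
)
print(prompt)
```

Elements that are left out are simply skipped, so the optional camera motion can be dropped without changing the rest of the prompt.
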