# Model Fallbacks

### Infron Smart Routing: Intelligent Optimization for Global AI API Access

**Infron AI Smart Routing** acts as an intelligent router between your application and global AI API providers. Its architecture consists of two key modules:

1. **Global Health Monitoring Module**\
   This module continuously monitors the health status of AI API providers across different regions and time zones. By collecting real-time data on availability and performance, Infron AI maintains a global view of each provider’s stability and service quality.
2. **Reinforcement Learning‑Driven Routing Engine**\
   Based on historical conversation data and the real‑time health metrics of various AI API providers, this engine evaluates multiple factors such as price, TPM (tokens per minute), RPM (requests per minute), and latency. Using reinforcement learning, it automatically generates an optimized candidate routing list every five minutes.

When **Smart Routing** is enabled, Infron dynamically directs each request to the most cost‑effective and stable model, ensuring users get the best balance between performance and budget.

By acting as a flexible, intelligent scheduling layer between clients and AI service providers, Infron AI can help businesses reduce operational costs by up to **90%** while significantly improving overall performance and reliability.
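
As a rough intuition for the trade-off the routing engine weighs, here is a toy Python sketch that ranks candidate providers by a combined price/throughput/latency score. This is illustrative only; Infron's actual engine is reinforcement-learning based and proprietary.

```python
# Toy illustration only: the real engine learns from live health data.
from dataclasses import dataclass

@dataclass
class ProviderHealth:
    name: str
    price_per_1k_tokens: float  # USD per 1K tokens
    tpm: float                  # tokens per minute currently sustained
    rpm: float                  # requests per minute currently sustained
    latency_ms: float           # recent end-to-end latency

def score(p: ProviderHealth) -> float:
    """Higher is better: cheap, fast, high-throughput providers win."""
    return (p.tpm / 1_000 + p.rpm / 100) / (p.price_per_1k_tokens * p.latency_ms)

def candidate_list(providers: list[ProviderHealth]) -> list[str]:
    """Rank providers by score, a stand-in for the five-minute candidate refresh."""
    return [p.name for p in sorted(providers, key=score, reverse=True)]
```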

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FN2k9nQdVI4ab2zlOBeWL%2Fzxo6ArqyGw0VYeRehsP8s.png?alt=media&#x26;token=897e0fe9-93fd-4aec-b390-0e1103aa0df7" alt=""><figcaption></figcaption></figure>

### How It Works <a href="#how-it-works" id="how-it-works"></a>

The Model Routing & Fallbacks feature lets you automatically try other models if the primary model's providers are down or rate-limited, or if they refuse to reply due to content moderation.

#### fallback\_models

The `fallback_models` parameter lets you automatically try other models when the primary model's providers hit any of the following:

* **Endpoint downtime**: e.g. 400/500/504/503/508/524 error codes.
* **Lagging streaming conversations**: end-to-end (E2E) latency suddenly increases abnormally while TPM (tokens per minute) unexpectedly drops.
* **Content moderation refusals**: the provider declines to reply.
* **Validation errors**: e.g. invalid parameter input, context length validation errors.

```json
{
  "model": "gemini-2.5-flash",
  "fallback_models": ["gemini-2.5-flash", "grok-4-fast-non-reasoning", "qwen3-next-80b-a3b-instruct"],
  "fallback_rules": "auto"  // default value is "auto"
  ... // Other params
}
```

{% hint style="info" %}
**model**: The primary model.

**fallback\_models**: The fallback model list.

**fallback\_rules**: Rules for deciding whether to trigger a model fallback; the default value is `"auto"`.
{% endhint %}

If the `fallback_rules` parameter is set to `"auto"` or `""`, or isn't passed at all, Infron AI automatically computes baseline metrics from your historical data and continuously makes dynamic decisions about whether a model fallback is needed.
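
For intuition only, here is a minimal sketch of what such baseline-driven switching could look like, assuming a simple three-sigma rule over recent latency samples. The actual "auto" logic is Infron's own and is not documented here.

```python
# Hypothetical sketch of "auto" fallback triggering; not Infron's real model.
import statistics

def should_fall_back(history_ms: list[float], current_ms: float) -> bool:
    """Trigger fallback when latency deviates far from the historical baseline."""
    baseline = statistics.mean(history_ms)
    spread = statistics.stdev(history_ms)
    return current_ms > baseline + 3 * spread  # e.g. a three-sigma rule

# Example: a 2400 ms request against a ~500 ms historical baseline
print(should_fall_back([480.0, 510.0, 495.0, 530.0, 470.0], 2400.0))  # True
```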

#### fallback\_rules

If you need more granular control over your model fallback switching strategy or want to create a strategy that better fits your business needs, you can explicitly specify the `fallback_rules` parameter in your input.

```json
{
  "model": "gemini-2.5-flash",
  "fallback_models": ["gemini-2.5-flash", "grok-4-fast-non-reasoning", "qwen3-next-80b-a3b-instruct"],
  "fallback_rules": {
    "Error_code": [400, 500, 504, 503, 508, 524],
    "Latency_threshold": 500,
    "TTFT_threshold": 1000,
    "TPM_threshold": 100,
    "RPM_threshold": 100
  }
}
```
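
Interpreting the fields above (the exact semantics are an assumption based on the field names): `Error_code` lists HTTP status codes that immediately trigger a fallback, `Latency_threshold` and `TTFT_threshold` are presumably end-to-end latency and time-to-first-token ceilings in milliseconds, and `TPM_threshold` / `RPM_threshold` are presumably the minimum tokens per minute and requests per minute below which traffic is rerouted to a fallback model.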

### Fallback Behavior <a href="#fallback-behavior" id="fallback-behavior"></a>

If the model you selected returns an error, Infron AI will try to use the fallback models instead.

By default, any error can trigger the use of a fallback model, including:

* Context length validation errors
* Moderation flags for filtered models
* Rate-limiting
* Downtime
* Lagging streaming conversations

If the fallback model is down or returns an error, Infron AI will return that error.
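
If you want to handle that terminal error on your side, a minimal sketch with the OpenAI SDK might look like the following (base URL and key placeholder match the examples below):

```python
# If the primary and every fallback model fail, the last error surfaces here.
from openai import OpenAI, APIError

client = OpenAI(base_url="https://llm.onerouter.prp/v1", api_key="{{API_KEY}}")

try:
    completion = client.chat.completions.create(
        model="gemini-2.5-flash",
        extra_body={"fallback_models": ["grok-4-fast-non-reasoning"]},
        messages=[{"role": "user", "content": "ping"}],
    )
except APIError as err:
    # Both the primary model and all fallbacks failed; the error is returned as-is.
    print(f"All models failed: {err}")
```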

### Pricing <a href="#pricing" id="pricing"></a>

Requests are priced using the model that was ultimately used, which will be returned in the `model` attribute of the response body.

#### Using with OpenAI SDK

To use `fallback_models` and `fallback_rules` with the OpenAI SDK, include them in the `extra_body` parameter. In the example below, `gemini-2.5-flash` is tried first, and the models in the `fallback_models` array are tried in order as fallbacks.

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

openai_client = OpenAI(
    base_url="https://llm.onerouter.prp/v1",
    api_key="{{API_KEY}}",  # your Infron AI API key
)

completion = openai_client.chat.completions.create(
    model="gemini-2.5-flash",
    extra_body={
        "fallback_models": ["gemini-2.5-flash", "grok-4-fast-non-reasoning", "qwen3-next-80b-a3b-instruct"],
        "fallback_rules": "auto"
    },
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)
```

{% endtab %}

{% tab title="TypeScript" %}

```typescript
import OpenAI from 'openai';

const onerouterClient = new OpenAI({
  baseURL: 'https://llm.onerouter.prp/v1',
  apiKey: '{{API_KEY}}',  // your Infron AI API key
});

async function main() {
  // @ts-expect-error -- fallback_models and fallback_rules are not in the OpenAI SDK types
  const completion = await onerouterClient.chat.completions.create({
    model: 'gemini-2.5-flash',
    fallback_models: ["gemini-2.5-flash", "grok-4-fast-non-reasoning", "qwen3-next-80b-a3b-instruct"],
    fallback_rules: "auto",
    messages: [
      {
        role: 'user',
        content: 'What is the meaning of life?',
      },
    ],
  });
  console.log(completion.choices[0].message);
}

main();
```

{% endtab %}
{% endtabs %}
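
Whichever path a request takes, you can confirm which model actually answered, and therefore which pricing applied, by reading the `model` field of the response:

```python
# `completion.model` names the model that ultimately served the request.
print(completion.model)  # e.g. "grok-4-fast-non-reasoning" if a fallback fired
```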
