# Inference Provider Routing

Infron AI routes requests to the best available providers for your model.&#x20;

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FGUAhTbG6QMsFjS7KipoZ%2Fimage.png?alt=media&#x26;token=ff9ef71f-fb47-4685-94bd-dac05f276d59" alt=""><figcaption></figcaption></figure>

**By default, requests are load-balanced across the top providers to maximize uptime and minimize price.**

You can customize how your requests are routed using the `provider` object in the request body for Chat Completions and Completions.

The `provider` object can contain the following fields:

<table><thead><tr><th width="248.0242919921875">Field</th><th width="166">Type</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td><a href="#ordering-specific-providers-order"><code>order</code></a></td><td>string[]</td><td>-</td><td>List of provider slugs to try in order (e.g. <code>["anthropic", "openai"]</code>). </td></tr><tr><td><a href="#disabling-fallbacks-allow_fallbacks"><code>allow_fallbacks</code></a></td><td>boolean</td><td><code>true</code></td><td>Whether to allow backup providers when the primary is unavailable.</td></tr><tr><td><a href="#provider-sorting-sort"><code>sort</code></a></td><td>string | object</td><td>-</td><td>Sort providers by price, throughput, or latency. (e.g. <code>"price"</code>)</td></tr><tr><td><a href="#performance-thresholds-preferred_min_throughput-preferred_max_latency"><code>preferred_min_throughput</code></a></td><td>number | object</td><td>-</td><td>Preferred minimum throughput (tokens/sec). Can be a number or an object with percentile cutoffs (p50, p75, p90, p99).</td></tr><tr><td><a href="#performance-thresholds-preferred_min_throughput-preferred_max_latency"><code>preferred_max_latency</code></a></td><td>number | object</td><td>-</td><td>Preferred maximum latency (seconds). Can be a number or an object with percentile cutoffs (p50, p75, p90, p99). </td></tr><tr><td><a href="#requiring-providers-to-support-all-parameters-require_parameters"><code>require_parameters</code></a></td><td>boolean</td><td><code>true</code></td><td>Only use providers that support all parameters in your request. 
</td></tr><tr><td><a href="#requiring-providers-to-comply-with-data-policies-data_collection"><code>data_collection</code></a></td><td>"allow" | "deny"</td><td>"allow"</td><td>Control whether to use providers that may store data.</td></tr><tr><td><a href="#zero-data-retention-enforcement-zdr"><code>zdr</code></a></td><td>boolean</td><td><code>false</code></td><td>Restrict routing to only ZDR (Zero Data Retention) endpoints. </td></tr><tr><td><a href="#distillable-text-enforcement-enforce_distillable_text"><code>enforce_distillable_text</code></a></td><td>boolean</td><td><code>false</code></td><td>Restrict routing to only models that allow text distillation.</td></tr><tr><td><a href="#allowing-only-specific-providers-only"><code>only</code></a></td><td>string[]</td><td>-</td><td>List of provider slugs to allow for this request. </td></tr><tr><td><a href="#ignoring-providers-ignore"><code>ignore</code></a></td><td>string[]</td><td>-</td><td>List of provider slugs to skip for this request. </td></tr><tr><td><a href="#quantization-quantizations"><code>quantizations</code></a></td><td>string[]</td><td>-</td><td>List of quantization levels to filter by (e.g. <code>["int4", "int8"]</code>).</td></tr></tbody></table>
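Taken together, these fields live inside the `provider` object of a normal request body. Here is an illustrative sketch combining several of them; the model name, provider slugs, and quantization levels are examples, not recommendations:

```python
# Illustrative request body combining several provider-routing fields.
# The model name, provider slugs, and quantization levels are examples only.
payload = {
    'model': 'deepseek/deepseek-v3.2',
    'messages': [{'role': 'user', 'content': 'Hello'}],
    'provider': {
        'order': ['novita', 'deepinfra'],   # try these providers first, in this order
        'allow_fallbacks': True,            # allow backup providers if both fail
        'data_collection': 'deny',          # skip providers that may store prompts
        'quantizations': ['int4', 'int8'],  # only endpoints at these quantization levels
    },
}
```

Send it with `requests.post(url, headers=headers, json=payload)` exactly as in the examples throughout this page.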

### **Cost-effective** Load Balancing (Default Strategy)

For each model in your request, Infron's default behavior is to load balance requests across providers, **balancing the best throughput, lowest latency, and lowest price**.

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FTxgEmjzoQLHUZxdarIlr%2Fimage.png?alt=media&#x26;token=d7e16c2c-ff63-4894-8cb4-bfff9664dd52" alt=""><figcaption></figcaption></figure>

When you send a model request, **Infron automatically evaluates multiple providers in real time**. It considers factors such as **latency**, **throughput**, **reliability**, and **price**—based on the default weight distribution shown above.&#x20;

{% hint style="info" %}
For instance, if Provider A offers slightly higher throughput but at a higher cost, while Provider B is more affordable with moderate latency, Infron will intelligently balance requests across both to achieve the best overall performance and cost efficiency.
{% endhint %}

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FQFIjxZw9qeLB5wpWI45J%2Fimage.png?alt=media&#x26;token=2f226b65-aa0e-4071-9c89-3e77619fc746" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
If you are more sensitive to throughput than price, you can use the [sort](#provider-sorting-sort) field to explicitly prioritize throughput.

If you have `sort` or `order` set in your provider preferences, the default load-balancing strategy will be disabled.
{% endhint %}

### Ordering Specific Providers (order)

You can set the providers that Infron AI will prioritize for your request using the `order` field.

| Field   | Type      | Default | Description                                                              |
| ------- | --------- | ------- | ------------------------------------------------------------------------ |
| `order` | string\[] | -       | List of provider slugs to try in order (e.g. `["anthropic", "openai"]`). |

Infron AI will prioritize providers in this order, for the model you're using. If you don't set this field, the router will use the [default strategy](#cost-effective-load-balancing-default-strategy).

You can use the copy button next to provider names on model pages to get the exact provider slug, for example `"anthropic"`, `"openai"`, or `"novita"`.

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FureiLLGBPA5SO0urd4HR%2Fimage.png?alt=media&#x26;token=fe89c707-5bdc-4ff3-af63-6aabd0384ca9" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
**Order example (`allow_fallbacks` enabled, the default)**:

* `azure` is hosting the "`anthropic/claude-sonnet-4.5`"
* `anthropic` is hosting the "`anthropic/claude-sonnet-4.5`"
* `openai` is hosting the "`anthropic/claude-sonnet-4.5`"

You set the `order` field to `["anthropic", "openai"]` and call the "`anthropic/claude-sonnet-4.5`" model.

* If Provider `anthropic` fails, Provider `openai` will be tried next.
* If Provider `openai` also fails, a backup provider (e.g. `azure`) will be tried last.
  {% endhint %}

Infron will try the providers specified in `order` one at a time, and proceed to other backup providers if none are operational.

If you don't want any other providers to be used, set `allow_fallbacks` to `false` as well.

{% hint style="info" %}
**Order example (`allow_fallbacks` disabled)**:

* `azure` is hosting the "`anthropic/claude-sonnet-4.5`"
* `anthropic` is hosting the "`anthropic/claude-sonnet-4.5`"
* `openai` is hosting the "`anthropic/claude-sonnet-4.5`"

You set the `order` field to `["anthropic", "openai"]` and call the "`anthropic/claude-sonnet-4.5`" model.

You set `allow_fallbacks` to `false`.

* If Provider `anthropic` fails, Provider `openai` will be tried next.
* If Provider `openai` also fails, the request fails.
  {% endhint %}

#### Example: Specifying providers with fallbacks

In the example below, your request will first be sent to Novita, and only if Novita experiences a serious outage will the request be forwarded to DeepInfra.

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'order': ['novita', 'deepinfra'],
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "order": ["novita", "deepinfra"]
  }
}'
```

{% endtab %}
{% endtabs %}

#### Example: Specifying providers with fallbacks disabled

Here's an example with `allow_fallbacks` set to `false`: your request will only be sent to Google AI Studio, and fails if Google AI Studio fails.

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'google/gemini-3-flash-preview',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'order': ['google-ai-studio'],
    'allow_fallbacks': False
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "google/gemini-3-flash-preview",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "order": ["google-ai-studio"],
      "allow_fallbacks": false
  }
}'
```

{% endtab %}
{% endtabs %}

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FXVyXf4AUEUMDf2rF6RPl%2Fimage.png?alt=media&#x26;token=0dbc800f-5581-46f1-8874-40f1ef317129" alt=""><figcaption></figcaption></figure>

#### Example: Targeting Specific Provider Endpoints

Each provider on Infron may host multiple endpoints for the same model, such as a default endpoint and a specialized "quantizations" endpoint. To target a specific endpoint, you can use the copy button next to the provider name on the model detail page to obtain the exact provider slug.

For example, MiniMax offers MiniMax M2.1 through multiple endpoints:

* Default endpoint with slug `minimax/fp8`
* Lightning endpoint with slug `minimax/lightning`

By copying the exact provider slug and using it in your request's `order` array, you can ensure your request is routed to the specific endpoint you want:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'minimax/minimax-m2.1',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'order': ['minimax/fp8'],
    'allow_fallbacks': False,
  },
})
```

{% endtab %}
{% endtabs %}

This approach is especially useful when you want to consistently use a specific variant of a model from a particular provider.

### Provider Sorting (sort)

If you instead want to *explicitly* prioritize a particular provider attribute, you can include the `sort` field in the `provider` preferences. The default load-balancing strategy will be disabled, and the router will try providers in the sorted order.

The three sort options are:

* `"price"`: prioritize lowest price
* `"throughput"`: prioritize highest throughput
* `"latency"`: prioritize lowest latency

{% tabs %}
{% tab title="prioritize lowest price" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'price',
  },
})
```

{% endtab %}

{% tab title="prioritize highest throughput" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'throughput',
  },
})
```

{% endtab %}

{% tab title="prioritize lowest latency" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'latency',
  },
})
```

{% endtab %}
{% endtabs %}

* To *always* prioritize low prices, set `sort` to `"price"`.

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FDOOaF4P7ZpbM3Tl0o0gJ%2Fimage.png?alt=media&#x26;token=24bc06a3-1796-4c4d-8567-925852ef7c0b" alt=""><figcaption></figcaption></figure>

* To *always* prioritize highest throughput, set `sort` to `"throughput"`.

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FA3LNfW9iAOkvMFRvRtP4%2Fimage.png?alt=media&#x26;token=a2bb22be-50db-4874-bfdc-e6848c0b49f3" alt=""><figcaption></figcaption></figure>

* To *always* prioritize low latency, set `sort` to `"latency"`.

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FYQRnF6Ed9wjkt4miRQhK%2Fimage.png?alt=media&#x26;token=c06d6dda-51a2-43b4-8cd3-1461e1849419" alt=""><figcaption></figcaption></figure>

### Performance Thresholds (preferred\_min\_throughput / preferred\_max\_latency)

You can set minimum-throughput or maximum-latency thresholds to filter endpoints.

Endpoints that don't meet these thresholds are deprioritized (moved to the end of the list) rather than excluded entirely.

<table><thead><tr><th width="238.4339599609375">Field</th><th width="165.926513671875">Type</th><th width="78.209716796875">Default</th><th>Description</th></tr></thead><tbody><tr><td><code>preferred_min_throughput</code></td><td>number | object</td><td>-</td><td><p>Preferred minimum throughput in tokens per second. </p><p>Can be </p><ul><li><code>a number (applies to p50)</code></li><li>or an <code>object</code> with <code>percentile cutoffs</code>.</li></ul></td></tr><tr><td><code>preferred_max_latency</code></td><td>number | object</td><td>-</td><td><p>Preferred maximum latency in seconds.</p><p>Can be </p><ul><li><code>a number (applies to p50)</code></li><li>or an <code>object</code> with <code>percentile cutoffs</code>.</li></ul></td></tr></tbody></table>

#### How Percentiles Work

Infron tracks latency and throughput metrics for each model and provider using percentile statistics calculated over a rolling 5-minute window. The available percentiles are:

* **p50** (median): 50% of requests perform better than this value
* **p75**: 75% of requests perform better than this value
* **p90**: 90% of requests perform better than this value
* **p99**: 99% of requests perform better than this value

Higher percentiles (like p90 or p99) give you more confidence about worst-case performance, while lower percentiles (like p50) reflect typical performance. **For example, if a model and provider has a p90 latency of 2 seconds, that means 90% of requests complete in under 2 seconds**.
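To make the percentile semantics concrete, here is a small sketch that computes p50 and p90 over a window of latency samples. This only illustrates the statistic itself; it is not Infron's internal implementation:

```python
import statistics

# Hypothetical latency samples (seconds) from one provider's 5-minute window
latencies = [0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.4, 1.6, 2.5, 3.8]

# statistics.quantiles with n=100 returns the 99 percentile cut points,
# so index 49 is p50 (the median) and index 89 is p90.
cuts = statistics.quantiles(latencies, n=100)
p50, p90 = cuts[49], cuts[89]

# With preferred_max_latency = {'p90': 10}, this provider would stay in the
# preferred group, because 90% of its requests complete in under 10 seconds.
meets_p90_cutoff = p90 < 10
```

Note that a provider failing such a cutoff is deprioritized rather than excluded, as described above.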

#### When to Use Percentile Preferences

Percentile-based routing is useful when you need predictable performance characteristics:

* **Real-time applications**: Use p90 or p99 latency thresholds to ensure consistent response times for user-facing features
* **Batch processing**: Use p50 throughput thresholds when you care more about average performance than worst-case scenarios
* **SLA compliance**: Use multiple percentile cutoffs to ensure providers meet your service level agreements across different performance tiers
* **Cost optimization**: Combine with `sort: "price"` to get the cheapest provider that still meets your performance requirements

#### Example: Find the Cheapest Model Meeting Performance Requirements

Combine `'sort': 'price'` with performance thresholds to find the cheapest option that meets your performance requirements. **This is useful when you have a performance floor but want to minimize costs**.

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'price',
    'preferred_min_throughput': {
      'p90': 50, # Prefer providers with >50 tokens/sec for 90% of requests in last 5 minutes
    },
  },
})
```

{% endtab %}

{% tab title="Curl" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
"model": "deepseek/deepseek-v3.2",
    "messages": [{ "role": "user", "content": "Hello" }],
    "provider": {
      "sort": "price",
      "preferred_min_throughput": {
        "p90": 50
      }
    }
  }'
```

{% endtab %}
{% endtabs %}

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FsqxI4xNdKK11KAlOiKpL%2Fimage.png?alt=media&#x26;token=95c8de83-bc59-4c79-ae1c-965e73f05ac5" alt=""><figcaption></figcaption></figure>

In this example, Infron will find the cheapest provider that has at least 50 tokens/second throughput at the p90 level (meaning 90% of requests achieve this throughput or better). Providers below this threshold are still available as fallbacks if all preferred options fail.

You can also use `preferred_max_latency` to set a maximum acceptable latency:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'price',
    'preferred_max_latency': {
      'p90': 10, # Prefer providers with <10 second latency for 90% of requests in last 5 minutes
    },
  },
})
```

{% endtab %}

{% tab title="Curl" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v3.2",
    "messages": [{ "role": "user", "content": "Hello" }],
    "provider": {
      "sort": "price",
      "preferred_max_latency": {
        "p90": 3
      }
    }
  }'
```

{% endtab %}
{% endtabs %}

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2F6rrLgWRL5DVlnYTHaOL0%2Fimg_v3_02v5_fc570d19-a4b4-4a50-8664-3b7bfc66dfcg.jpg?alt=media&#x26;token=fa51ef15-0414-4a79-81d6-7e101b7da1a4" alt=""><figcaption></figcaption></figure>

#### Example: Using Multiple Percentile Cutoffs

You can specify multiple percentile cutoffs to set both typical and worst-case performance requirements. All specified cutoffs must be met for a provider to be in the preferred group.

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'preferred_max_latency': {
      'p50': 1, # Prefer providers with <1 second latency for 50% of requests in last 5 minutes
      'p90': 3, # Prefer providers with <3 second latency for 90% of requests in last 5 minutes
      'p99': 5, # Prefer providers with <5 second latency for 99% of requests in last 5 minutes
    },
    'preferred_min_throughput': {
      'p50': 100, # Prefer providers with >100 tokens/sec for 50% of requests in last 5 minutes
      'p90': 50, # Prefer providers with >50 tokens/sec for 90% of requests in last 5 minutes
    },
  },
})
```

{% endtab %}

{% tab title="Curl" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Authorization: Bearer <API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek/deepseek-v3.2",
    "messages": [{ "role": "user", "content": "Hello" }],
    "provider": {
      "preferred_max_latency": {
        "p50": 1,
        "p90": 3,
        "p99": 5
      },
      "preferred_min_throughput": {
        "p50": 100,
        "p90": 50
      }
    }
  }'
```

{% endtab %}
{% endtabs %}

### Requiring Providers to Support All Parameters (require\_parameters)

You can restrict requests only to providers that support all parameters in your request using the `require_parameters` field.

When you send a request with `tools` or `tool_choice`, Infron will only route to providers that support tool use. Similarly, if you set a `max_tokens`, then Infron will only route to providers that support a response of that length.

| Field                | Type    | Default | Description                                                     |
| -------------------- | ------- | ------- | --------------------------------------------------------------- |
| `require_parameters` | boolean | `true`  | Only use providers that support all parameters in your request. |

* When `require_parameters` is `false`, providers that don't support all the LLM parameters specified in your request can still receive it; they simply ignore the parameters they don't recognize.
* When `require_parameters` is `true` (the default), the request won't even be routed to a provider that doesn't support every specified parameter.

#### Example: Excluding providers that don't support JSON formatting

For example, to only use providers that support JSON formatting:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'require_parameters': True,
  },
  'response_format': { 'type': 'json_object' },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "require_parameters": true
  },
  "response_format": { "type": "json_object" }
}'
```

{% endtab %}
{% endtabs %}

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FPYRU4K4v19DAZaEZhfaT%2Fimage.png?alt=media&#x26;token=824da236-e166-4b98-987f-e380b867770c" alt=""><figcaption></figcaption></figure>

### Requiring Providers to Comply with Data Policies (data\_collection)

You can restrict requests only to providers that comply with your data policies using the `data_collection` field.

| Field             | Type              | Default | Description                                           |
| ----------------- | ----------------- | ------- | ----------------------------------------------------- |
| `data_collection` | "allow" \| "deny" | "allow" | Control whether to use providers that may store data. |

* `allow`: (default) allow providers which store user data non-transiently and may train on it
* `deny`: use only providers which do not collect user data

Some model providers may log prompts, so we display them with a **Data Policy** tag on model pages. This is not a definitive source of third-party data policies, but represents our best knowledge.

#### Example: Excluding providers that don't comply with data policies

To exclude providers that don't comply with your data policies, set `data_collection` to `deny`:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'data_collection': 'deny', # or "allow"
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "data_collection": "deny" 
    }
}'
```

{% endtab %}
{% endtabs %}

### Zero Data Retention Enforcement (zdr)

You can enforce Zero Data Retention (ZDR) on a per-request basis using the `zdr` parameter, ensuring your request only routes to endpoints that do not retain prompts.

| Field | Type    | Default | Description                                                   |
| ----- | ------- | ------- | ------------------------------------------------------------- |
| `zdr` | boolean | `false` | Restrict routing to only ZDR (Zero Data Retention) endpoints. |

* When `zdr` is set to `true`, the request will only be routed to endpoints that have a Zero Data Retention policy.&#x20;
* When `zdr` is `false` or not provided, it has no effect on routing.

#### Example: Enforcing ZDR for a specific request

To ensure a request only uses ZDR endpoints, set `zdr` to `true`:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json',
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'zdr': True,
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "zdr": true 
    }
}'
```

{% endtab %}
{% endtabs %}

This is useful for customers who don't want to globally enforce ZDR but need to ensure specific requests only route to ZDR endpoints.

### Distillable Text Enforcement (enforce\_distillable\_text)

You can enforce distillable text filtering on a per-request basis using the `enforce_distillable_text` parameter, ensuring your request only routes to models where the author has allowed text distillation.

<table><thead><tr><th width="241.335205078125">Field</th><th width="122.5443115234375">Type</th><th width="109.6444091796875">Default</th><th>Description</th></tr></thead><tbody><tr><td><code>enforce_distillable_text</code></td><td>boolean</td><td><code>false</code></td><td>Restrict routing to only models that allow text distillation.</td></tr></tbody></table>

* When `enforce_distillable_text` is set to `true`, the request will only be routed to models where the author has explicitly enabled text distillation.&#x20;
* When `enforce_distillable_text` is `false` or not provided, it has no effect on routing.

This parameter is useful for applications that need to ensure their requests only use models that allow text distillation for training purposes, such as when building datasets for model fine-tuning or distillation workflows.

#### Example: Enforcing distillable text for a specific request&#x20;

To ensure a request only uses models that allow text distillation, set `enforce_distillable_text` to `true`:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'enforce_distillable_text': True,
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "enforce_distillable_text": true 
    }
}'
```

{% endtab %}
{% endtabs %}

### Disabling Fallbacks (allow\_fallbacks)

#### Example: Always choose the cheapest provider with fallbacks disabled

To guarantee that your request is only served by the lowest-cost provider, you can disable fallbacks.

Here, disabling fallbacks is combined with `sort` set to `"price"`, which restricts routing to whichever provider is currently cheapest. You can combine it with the `order` field in the same way to restrict routing to just your chosen list.

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'sort': 'price',
    'allow_fallbacks': False,
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "sort": "price",
      "allow_fallbacks": false 
    }
}'
```

{% endtab %}
{% endtabs %}

#### Example: Always choose the specific providers with fallbacks disabled

Here's an example with `allow_fallbacks` set to `false`: your request will only be sent to Google AI Studio, and fails if Google AI Studio fails.

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'google/gemini-3-flash-preview',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'order': ['google-ai-studio'],
    'allow_fallbacks': False
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "google/gemini-3-flash-preview",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "order": ["google-ai-studio"],
      "allow_fallbacks": false 
    }
}'
```

{% endtab %}
{% endtabs %}

### Allowing Only Specific Providers (only)

You can allow only specific providers for a request by setting the `only` field in the `provider` object.

| Field  | Type      | Default | Description                                       |
| ------ | --------- | ------- | ------------------------------------------------- |
| `only` | string\[] | -       | List of provider slugs to allow for this request. |

Allowing only specific providers may significantly reduce fallback options and limit request recovery.

#### Example: Only allow Azure for a request calling GPT-5 Mini

Here's an example that will only use Azure for a request calling GPT-5 Mini:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'openai/gpt-5-mini',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'only': ['azure'],
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "openai/gpt-5-mini",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "only": ["azure"]
    }
}'
```

{% endtab %}
{% endtabs %}

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2F5SBM408TmPfYWus9q1uq%2Fimage.png?alt=media&#x26;token=bb3c9618-d344-496b-a82f-cebea0ace2a6" alt=""><figcaption></figcaption></figure>

### Ignoring Providers (ignore)

You can ignore providers for a request by setting the `ignore` field in the `provider` object.

| Field    | Type      | Default | Description                                      |
| -------- | --------- | ------- | ------------------------------------------------ |
| `ignore` | string\[] | -       | List of provider slugs to skip for this request. |

Ignoring multiple providers may significantly reduce fallback options and limit request recovery.

#### Example: Ignoring a provider for a request

Here's an example that ignores a specific provider:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'ignore': ['deepinfra'],
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "ignore": ["deepinfra"]
    }
}'
```

{% endtab %}
{% endtabs %}
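Conceptually, `ignore` removes the listed slugs from the candidate pool before routing. The actual selection logic runs server-side inside the router, but a rough client-side sketch of the filtering step looks like this:

```python
def apply_ignore(candidates, ignore):
    """Drop ignored provider slugs from a candidate list, keeping order."""
    blocked = set(ignore)
    return [slug for slug in candidates if slug not in blocked]

remaining = apply_ignore(["deepinfra", "fireworks", "together"], ["deepinfra"])
# remaining == ["fireworks", "together"]
```

With `ignore: ["deepinfra"]` in the request above, routing proceeds as if DeepInfra were not an available provider for that model.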

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FMCXjkvHVhGa65feXfjTI%2Fimage.png?alt=media&#x26;token=734f53f2-b732-45d2-b984-9d81fa675878" alt=""><figcaption></figcaption></figure>

<figure><img src="https://3822312837-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FZ9C9AjT7j46HAcQrOVWw%2Fuploads%2FFvxqhLtrkHnpGjTWvVbv%2Fimage.png?alt=media&#x26;token=61a64a49-c1c8-4ab6-826b-6739d71f2e76" alt=""><figcaption></figcaption></figure>

### Quantization (quantizations)

Quantization reduces model size and computational requirements while aiming to preserve performance. Most LLMs today use FP16 or BF16 for training and inference, cutting memory requirements in half compared to FP32. Some deployments reduce size further with FP8 or integer quantization (e.g., INT8, INT4).

| Field           | Type      | Default | Description                                                         |
| --------------- | --------- | ------- | ------------------------------------------------------------------- |
| `quantizations` | string\[] | -       | List of quantization levels to filter by (e.g. `["int4", "int8"]`). |

Quantized models may exhibit degraded performance for certain prompts, depending on the method used.

Providers can support various quantization levels for open-weight models.

#### Quantization Levels

To filter providers by quantization level, specify the `quantizations` field in the `provider` parameter with the following values:

* `fp16`: Floating point (16 bit)
* `fp8`: Floating point (8 bit)
* `int8`: Integer (8 bit)
* `int4`: Integer (4 bit)
* `none`: Unknown
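Since a typo in `quantizations` would filter against a level no provider advertises, it can help to validate the values client-side before sending a request. A minimal sketch against the documented set (illustrative only, not part of any SDK):

```python
# The documented quantization levels from the list above.
VALID_QUANTIZATIONS = {"fp16", "fp8", "int8", "int4", "none"}

def check_quantizations(levels):
    """Raise if any value is outside the documented quantization set."""
    unknown = [q for q in levels if q not in VALID_QUANTIZATIONS]
    if unknown:
        raise ValueError(f"Unknown quantization level(s): {unknown}")
    return list(levels)

check_quantizations(["fp8"])  # returns ["fp8"]
```

Passing a value like `"fp7"` raises a `ValueError` locally instead of producing a request that matches no providers.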

#### Example: Requesting FP8 Quantization

Here's an example that will only use providers that support FP8 quantization:

{% tabs %}
{% tab title="Python" %}

```python
import requests

headers = {
  'Authorization': 'Bearer <API_KEY>',
  'Content-Type': 'application/json'
}

response = requests.post('https://llm.onerouter.pro/v1/chat/completions', headers=headers, json={
  'model': 'deepseek/deepseek-v3.2',
  'messages': [{ 'role': 'user', 'content': 'Hello' }],
  'provider': {
    'quantizations': ['fp8'],
  },
})
```

{% endtab %}

{% tab title="cURL" %}

```bash
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
  "model": "deepseek/deepseek-v3.2",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "provider": {
      "quantizations": ["fp8"]
    }
}'
```

{% endtab %}
{% endtabs %}
