# Latency

Infron is engineered with performance as a core priority. The platform is optimized to introduce as little additional latency as possible.

### Base Latency

Under standard production conditions, Infron adds roughly `100 ms` of latency to each request. This minimal overhead is achieved through:

* Edge compute execution using Cloudflare Workers to stay geographically close to your application
* Highly efficient edge caching of user and API key metadata
* Optimized routing logic designed to minimize processing time
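If you want to sanity-check the overhead in your own environment, you can time a full round trip yourself. A minimal sketch follows; it assumes Node 18+ (global `fetch`) and an OpenAI-compatible chat completions shape, and the base URL, endpoint path, and model identifier are placeholders rather than confirmed Infron values. Note that the measured time also includes model inference, so comparing against a request sent directly to the underlying provider gives a better estimate of the gateway's share.

```typescript
// Minimal round-trip timer. BASE_URL, the endpoint path, and the model id
// are placeholders — substitute the values from your own Infron setup.
const BASE_URL = "https://api.your-infron-endpoint.example"; // hypothetical
const API_KEY = process.env.INFRON_API_KEY ?? "";

async function timeRequest(): Promise<number> {
  const start = performance.now();
  const res = await fetch(`${BASE_URL}/v1/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "your-model-id",                       // placeholder
      messages: [{ role: "user", content: "ping" }],
      max_tokens: 1,                                // keep generation time tiny
    }),
  });
  await res.text(); // drain the body so timing covers the full response
  return performance.now() - start;
}

timeRequest().then((ms) => console.log(`round trip: ${ms.toFixed(0)} ms`));
```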

### Performance Considerations

#### Cache Warming

When edge caches are cold (typically within the first 5 minutes of receiving traffic in a new region), latency may be slightly higher until the caches are fully populated.

#### Credit Balance Checks

To ensure accurate billing and avoid overages, Infron performs additional database checks when a user's credit balance becomes low (single‑digit dollar amounts). Caches expire more aggressively under these conditions, which can temporarily increase latency until more credits are added.

#### Model Fallback

When using provider routing, if a primary model or provider fails, Infron automatically falls back to the next available option. A failed initial attempt naturally adds latency for that request. Infron monitors provider failures in real time and dynamically routes around unstable providers to avoid repeated performance impacts.
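For illustration only, a request that opts into fallback behavior might look something like the sketch below. The `models` field name and the overall request shape are assumptions, not confirmed Infron parameters; check the provider-routing reference for the exact fields.

```typescript
// Hypothetical fallback configuration, sent as the body of a normal
// completion request (see the timing sketch above for the surrounding call).
// The "models" field name is an assumption, not a confirmed Infron parameter.
const body = {
  models: ["primary-model-id", "fallback-model-id"], // tried in order; hypothetical
  messages: [{ role: "user", content: "Hello" }],
};
console.log(JSON.stringify(body, null, 2));
```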

### Best Practices for Optimal Performance

* Maintain a Healthy Credit Balance: A recommended minimum balance of $50–$100 helps ensure smooth operation without increased latency from extra billing checks.
* Use Provider Preferences: If you have specific latency requirements—such as time to first token or time to final token—Infron offers routing controls that let you prioritize providers based on performance and cost considerations.
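As a sketch only, latency-oriented provider preferences might be expressed as a routing block on the request. The `provider` object and its fields below are assumptions rather than confirmed Infron parameters; consult the provider-routing documentation for the real names.

```typescript
// Hypothetical provider-preference block. Every field name here is an
// assumption — check the provider-routing docs for the actual parameters.
const body = {
  model: "your-model-id",                       // placeholder
  messages: [{ role: "user", content: "Hello" }],
  provider: {
    sort: "latency",        // hypothetical: prefer the lowest-latency provider
    allow_fallbacks: true,  // hypothetical: keep automatic fallback enabled
  },
};
console.log(JSON.stringify(body, null, 2));
```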


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://infronai.gitbook.io/docs/observability/latency.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.
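For example, a minimal query from Node 18+ (global `fetch`) might look like the following; the question text is just an illustration.

```typescript
// Ask this documentation page a question via the "ask" query parameter.
const question = "How much latency does Infron add to each request?";
const url =
  "https://infronai.gitbook.io/docs/observability/latency.md?ask=" +
  encodeURIComponent(question);

fetch(url) // plain HTTP GET, as described above
  .then((response) => response.text())
  .then(console.log); // direct answer plus relevant excerpts and sources
```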

Use this mechanism when the answer is not explicitly present in the current page, when you need clarification or additional context, or when you want to retrieve related documentation sections.
