Latency
Understanding Infron's performance characteristics.
Infron is engineered with performance as a core priority. The platform is optimized to introduce as little additional latency as possible.
Base Latency
Under standard production conditions, Infron adds roughly 100 ms of latency to each request. This minimal overhead is achieved through:
Edge compute execution using Cloudflare Workers to stay geographically close to your application
Highly efficient edge caching of user and API key metadata
Optimized routing logic designed to minimize processing time
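One way to check the added latency in your own environment is to time the same request through Infron and directly against a provider, then compare medians. The sketch below is a generic timing helper, not part of any Infron SDK; `time_calls` and `median_overhead_ms` are hypothetical names, and the callable you pass in is whatever issues your request.

```python
import statistics
import time
from typing import Callable, List


def time_calls(call: Callable[[], object], runs: int = 20) -> List[float]:
    """Invoke `call` `runs` times and return each wall-clock latency in ms."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def median_overhead_ms(via_gateway: List[float], direct: List[float]) -> float:
    """Median latency through the gateway minus median latency when calling directly."""
    return statistics.median(via_gateway) - statistics.median(direct)
```

Medians are used rather than means so that a single slow request does not dominate the estimate.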
Performance Considerations
Cache Warming
When edge caches are cold (typically within the first 5 minutes of receiving traffic in a new region), latency may be slightly higher until the caches are fully populated.
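When benchmarking, it can help to exclude the warm-up window so that cold-cache requests do not skew steady-state numbers. A minimal sketch, assuming you record each sample as a pair of (seconds since the first request, latency in ms); the helper name and the sample format are illustrative only.

```python
from typing import List, Tuple

# The "first 5 minutes" mentioned above, in seconds; adjust to what you observe.
WARMUP_SECONDS = 300.0


def drop_warmup(
    samples: List[Tuple[float, float]],
    warmup_seconds: float = WARMUP_SECONDS,
) -> List[float]:
    """Return only the latencies recorded at or after the warm-up window."""
    return [latency for elapsed, latency in samples if elapsed >= warmup_seconds]
```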
Credit Balance Checks
To ensure accurate billing and avoid overages, Infron performs additional database checks when a user's credit balance becomes low (single‑digit dollar amounts). Caches expire more aggressively under these conditions, which can temporarily increase latency until more credits are added.
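A simple client-side guard can flag a balance before it reaches the range where these extra checks apply. The sketch below assumes you already have the balance as a number in US dollars; how you obtain it is not covered here. The thresholds mirror the figures on this page (single-digit dollars as "low", $50 as the lower end of the recommended minimum), and the function name is hypothetical.

```python
LOW_BALANCE_USD = 10.0       # below this, the balance is in single-digit dollars
RECOMMENDED_MIN_USD = 50.0   # lower end of the recommended minimum balance


def balance_status(balance_usd: float) -> str:
    """Classify a credit balance as 'low', 'below_recommended', or 'healthy'."""
    if balance_usd < LOW_BALANCE_USD:
        return "low"
    if balance_usd < RECOMMENDED_MIN_USD:
        return "below_recommended"
    return "healthy"
```

An alert on "below_recommended" leaves time to top up before latency is affected.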
Model Fallback
When using provider routing, if a primary model or provider fails, Infron automatically falls back to the next available option. A failed initial attempt naturally adds latency for that request. Infron monitors provider failures in real time and dynamically routes around unstable providers to avoid repeated performance impacts.
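The latency cost of a failed first attempt can be illustrated with a small simulation. This is a sketch of the general try-next-on-failure pattern, not Infron's internal routing code; `ProviderError`, `call_with_fallback`, and the two stand-in providers are all hypothetical.

```python
import time
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Raised by a provider callable when a request fails."""


Provider = Tuple[str, Callable[[str], str]]


def call_with_fallback(
    providers: Sequence[Provider], prompt: str
) -> Tuple[str, str, List[Tuple[str, float]]]:
    """Try providers in order; return (winner, reply, failed attempts with ms spent)."""
    failed: List[Tuple[str, float]] = []
    for name, call in providers:
        start = time.perf_counter()
        try:
            reply = call(prompt)
        except ProviderError:
            # Time spent on a failed attempt is added to the request's total latency.
            failed.append((name, (time.perf_counter() - start) * 1000.0))
            continue
        return name, reply, failed
    raise ProviderError("all providers failed")


def flaky(prompt: str) -> str:
    """Stand-in provider that always fails."""
    raise ProviderError("simulated outage")


def stable(prompt: str) -> str:
    """Stand-in provider that always succeeds."""
    return f"echo: {prompt}"
```

Calling `call_with_fallback([("primary", flaky), ("backup", stable)], "hi")` returns the reply from `backup` along with a record of the time lost on `primary`.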
Best Practices for Optimal Performance
Maintain a Healthy Credit Balance: A recommended minimum balance of $50–$100 helps ensure smooth operation without increased latency from extra billing checks.
Use Provider Preferences: If you have specific latency requirements, such as time to first token or time to final token, Infron offers routing controls that let you prioritize providers based on performance and cost considerations.
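The trade-off between performance and cost can be sketched as a ranking over candidate providers. This is an illustration of the idea only and does not show Infron's actual routing controls or request parameters; `ProviderStats`, its fields, and `rank_providers` are hypothetical, and the numbers you feed in would come from your own measurements.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ProviderStats:
    name: str
    ttft_ms: float                  # observed time to first token
    usd_per_million_tokens: float   # observed or published price


def rank_providers(
    candidates: Sequence[ProviderStats],
    max_ttft_ms: Optional[float] = None,
) -> List[ProviderStats]:
    """Drop providers slower than `max_ttft_ms`, then order by cost, then by speed."""
    eligible = [
        p for p in candidates
        if max_ttft_ms is None or p.ttft_ms <= max_ttft_ms
    ]
    return sorted(eligible, key=lambda p: (p.usd_per_million_tokens, p.ttft_ms))
```

Treating the latency requirement as a hard filter and cost as the sort key is one possible policy; a weighted score would be an equally valid choice.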