FAQ

Common questions about Infron AI.

Getting started

Why should I use Infron AI?

Infron AI provides a unified API to access all the major LLM models on the market. It also allows users to aggregate their billing in one place and keep track of all of their usage using our analytics.

Infron AI passes through the pricing of the underlying providers, while pooling their uptime, so you get the same pricing you’d get from the provider directly, with a unified API and fallbacks so that you get much better uptime.

What makes Infron AI unique?

Infron AI stands out as a unified routing layer that connects multiple AI model providers through a single, consistent API. Instead of integrating separately with different LLM or embedding services, developers can use Infron AI to simplify model management, request routing, and version control.

Infron AI offers flexible configuration options—such as automatic provider selection, fallback routing, and performance optimization—which help ensure reliability and cost-efficiency. In short, Infron AI makes it easier to build and scale AI applications by abstracting away provider complexity while maintaining full transparency and control.

What's the story behind Infron AI?

Infron AI was created to solve a growing pain in the AI development world: managing multiple model providers efficiently. As the ecosystem of large language models and embeddings expanded, developers often found themselves juggling different APIs, authentication methods, and data formats for each provider. This added unnecessary friction and slowed down innovation.

Seeing this challenge, the creators of Infron AI envisioned a single, unified routing layer that could abstract away these complexities—allowing developers to focus on what matters most: building great products powered by AI. The idea was to give teams the flexibility to mix and match providers, experiment seamlessly, and improve reliability through smart routing and fallbacks.

From that vision, Infron AI emerged as an infrastructure solution designed to make multi‑provider AI development as simple, scalable, and transparent as possible. It reflects the broader effort to move from fragmented model integrations toward a cohesive, provider‑agnostic AI ecosystem.

Why should a person choose Infron AI over its competitors?

Infron AI offers a flexible and developer‑friendly way to manage multiple AI model providers through one unified API. Unlike tools that tie you to a single vendor, Infron AI lets you easily switch or combine models from different sources without changing your application code.

Infron AI provides built‑in routing logic, fallback mechanisms, and usage tracking so you can optimize cost, latency, and reliability automatically. In addition, its configuration‑based approach and detailed observability tools simplify scaling and debugging. In short, Infron AI helps teams focus on building AI‑powered features rather than maintaining complex provider integrations.

Who is the primary audience of Infron AI?

The primary audience of Infron AI includes developers, product teams, and organizations building applications that rely on AI models or large language models (LLMs).

Infron AI is designed for engineers who need to integrate, manage, and optimize access to multiple AI providers without maintaining separate APIs. Startups, enterprise AI teams, and platform builders can all benefit from its unified routing system—especially those seeking flexibility, scalability, and cost control in multi‑provider environments. In essence, Infron AI serves anyone who wants to simplify AI infrastructure while maintaining high performance and reliability.

How do I get started with Infron AI?

To get started, create an account and add credits on the Credits page. Credits are simply deposits on Infron AI that you use for LLM inference. When you use the API or chat interface, we deduct the request cost from your credits. Each model and provider has a different price per million tokens.

Once you have credits, you can create API keys and start using the API. You can read our quickstart guide for code samples and more.
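As a minimal sketch of what a first request can look like (the base URL, model id, and environment variable name below are placeholders, not confirmed values; the quickstart guide has the exact ones):

```typescript
// Minimal chat completion request (Node.js 18+, built-in fetch).
// NOTE: base URL and model id are placeholders; see the quickstart guide.
const response = await fetch("https://api.infron.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.INFRON_API_KEY}`, // your API key
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "openai/gpt-4o", // any model listed on the Models tab
    messages: [{ role: "user", content: "Hello, Infron AI!" }],
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```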

How do I get support?

The best way to get support is to submit an issue.

How do I get billed for my usage on Infron AI?

For each model, pricing is displayed per million tokens. There is usually a different price for prompt and completion tokens. Some models also charge per request, per image, or for reasoning tokens. All of these details are visible on the Logs tab.

When you make a request to Infron AI, we receive the total number of tokens processed by the provider. We then calculate the corresponding cost and deduct it from your credits. You can review your complete usage history in the Activities tab.

You can also add the usage: {include: true} parameter to your chat request to get the usage information in the response.
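For example (a sketch; the base URL and model id are placeholders):

```typescript
// Ask Infron AI to include usage accounting in the response body.
const response = await fetch("https://api.infron.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.INFRON_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "openai/gpt-4o", // placeholder model id
    messages: [{ role: "user", content: "Ping" }],
    usage: { include: true }, // request usage details in the response
  }),
});

const data = await response.json();
console.log(data.usage); // token counts for the request
```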

We offer discounts ranging from 20% to 80%, depending on the pricing of the underlying providers.

Pricing

What are the prices for using Infron AI?

Infron AI charges a $0.35 + 5% fee when you purchase credits. We pass through the pricing of the underlying model providers without any markup, so you pay the same rate as you would directly with the provider.
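For example, a $100 credit purchase incurs a fee of $0.35 + (5% × $100) = $5.35.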

For more details on model pricing, please see our Models tab.

For details about the cost of each request, please see our Logs tab.

How is the billing calculated when Prompt Cache is enabled?

Billing follows the Prompt Cache rate defined in our pricing documentation, regardless of whether a cached result is served or a new prompt is processed. This applies to every request, since Prompt Cache is always active in Infron AI.

Models and Providers

What LLM models does Infron AI support?

Infron AI provides access to a wide variety of LLM models, including frontier models from major AI labs.

For a complete list of models, you can visit the Models tab or fetch the list through the models API.

How frequently are new models added?

We work to add new models as quickly as we can. We often have partnerships with the labs releasing them, which lets us list new models as soon as they are available.

If there is a model missing that you'd like Infron AI to support, feel free to open an issue.

I am an inference provider, how can I get listed on Infron AI?

The best way to reach us is by email.

How does model fallback work if a provider is unavailable?

If a provider returns an error, Infron AI automatically falls back to the next provider. This happens transparently to the user and makes production apps much more resilient.

API Technical Specifications

What authentication methods are supported?

Infron AI authenticates requests with API keys, passed as Bearer tokens, for the completions API and other core endpoints.
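A minimal sketch of how the key is attached (standard Bearer scheme; the environment variable name is just an example):

```typescript
// Attach your API key as a Bearer token on every request.
const headers = {
  Authorization: `Bearer ${process.env.INFRON_API_KEY}`, // example env var name
  "Content-Type": "application/json",
};
```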

What API endpoints are available?

Infron AI implements the OpenAI API specification for /completions and /chat/completions endpoints, allowing you to use any model with the same request/response format.

Additional endpoints like /api/v1/models are also available. See our API documentation for detailed specifications.
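As a sketch, assuming the response follows the OpenAI-style list shape (the host below is a placeholder; see the API documentation for the exact value):

```typescript
// List available models.
const res = await fetch("https://api.infron.ai/api/v1/models", {
  headers: { Authorization: `Bearer ${process.env.INFRON_API_KEY}` },
});

// Assumes an OpenAI-style response body: { data: [{ id, ... }, ...] }
const { data } = await res.json();
for (const model of data) {
  console.log(model.id);
}
```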

What are the primary technologies used to build Infron AI?

Infron AI is built using modern, cloud‑native web technologies optimized for performance, scalability, and integration with AI services. At its core, Infron AI relies on:

  1. TypeScript and Node.js – for the main API logic, routing, and configuration management. These enable a robust developer experience and compatibility with diverse model providers.

  2. Cloud infrastructure (e.g., AWS, GCP, or similar) – to support distributed routing, load balancing, and secure service deployment across regions.

  3. Database and caching systems – using PostgreSQL or similar for persistent data, and Redis or in‑memory stores for high‑speed routing decisions.

  4. API and network layer technologies – including REST and WebSocket interfaces, authentication systems, and observability tooling to track provider usage and latency.

  5. Integration SDKs and AI provider APIs – connectors built for leading LLM and AI platforms (such as OpenAI, Anthropic, Google, etc.) to enable seamless model switching.

Together, these technologies provide a flexible foundation that allows Infron AI to route, monitor, and optimize traffic across multiple AI services effectively.

What are the supported formats?

The API supports text and images. Images can be passed as URLs or base64-encoded. PDF and other file types are coming soon.

How does streaming work?

Streaming uses server-sent events (SSE) for real-time token delivery.

Set stream: true in your request to enable streaming responses.
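A simplified sketch of consuming the SSE stream with plain fetch (the base URL and model id are placeholders; a production parser should also buffer partial lines across chunks):

```typescript
// Stream a chat completion over server-sent events.
const response = await fetch("https://api.infron.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.INFRON_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "openai/gpt-4o", // placeholder model id
    messages: [{ role: "user", content: "Write a haiku." }],
    stream: true, // enable SSE streaming
  }),
});

// Each event line looks like `data: {...}`; the stream ends with `data: [DONE]`.
// Simplified: assumes each chunk contains whole lines.
const reader = response.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  for (const line of decoder.decode(value, { stream: true }).split("\n")) {
    if (!line.startsWith("data: ") || line.includes("[DONE]")) continue;
    const chunk = JSON.parse(line.slice("data: ".length));
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}
```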

What SDK support is available?

Infron AI is a drop-in replacement for OpenAI. Therefore, any SDKs that support OpenAI by default also support Infron AI. Check out our docs for more details.
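For example, with the official openai package for Node.js, pointing the SDK at Infron AI is a small configuration change (the base URL below is a placeholder; check the docs for the exact value):

```typescript
import OpenAI from "openai"; // the standard OpenAI SDK, unmodified

// Point the SDK at Infron AI instead of OpenAI.
const client = new OpenAI({
  baseURL: "https://api.infron.ai/v1", // placeholder base URL
  apiKey: process.env.INFRON_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-4o", // placeholder model id
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.choices[0].message.content);
```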

Can I mix different modalities in one request?

Yes! You can send text and images in the same request, and the model will process all inputs together. Support for mixing in PDFs and audio is coming soon.
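A sketch of a mixed text-and-image message, using the OpenAI-style content parts the API implements (base URL, model id, and image URL are placeholders):

```typescript
// One user message mixing a text part and an image part (URL form).
const res = await fetch("https://api.infron.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.INFRON_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "openai/gpt-4o", // use a vision-capable model
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "What is shown in this image?" },
          { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } },
        ],
      },
    ],
  }),
});

console.log((await res.json()).choices[0].message.content);
```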

Does Infron AI use Prompt Cache by default?

Yes. Prompt Cache is enabled by default in Infron AI for all API calls. This means that whenever you send a request, Infron AI will attempt to use the cached prompt/response if applicable.

Will using Prompt Cache change my token usage or latency?

  • Token usage: When a cached response is served, actual model inference may be skipped, which can reduce token consumption.

  • Latency: Cached responses are generally faster to return compared to generating new responses from the model.

  • Billing: The cost per request is based on the Prompt Cache price tier, regardless of cache hit or miss.

Can I disable Prompt Cache?

At this time, Prompt Cache is permanently enabled in Infron AI and cannot be turned off. The design ensures consistent performance optimization and uniform billing.

Privacy and Data Logging

Please see our Terms of Service and Privacy Policy.

What data is logged during API use?

We log basic request metadata (timestamps, model used, token counts). We do not log your prompts or completions by default, even if an error occurs.

What third-party sharing occurs?

Infron AI is a proxy that forwards your requests to the model provider for completion. Where possible, we work with providers to ensure that prompts and completions are not logged or used for training. Providers that do log, or whose policy we have been unable to confirm, will not be routed to unless the model training toggle is switched on in your privacy settings.

Credit and Billing Systems

What purchase options exist?

Infron AI uses a credit system where the base currency is US dollars.

All of the pricing on our site and API is denoted in dollars. Users can top up their balance manually.

Do credits expire?

Per our terms, we reserve the right to expire unused credits one year after purchase.

My credits haven't shown up in my account

If you paid using Stripe, there is occasionally an issue with the Stripe integration and credits can be delayed in showing up on your account. Please allow up to one hour. If your credits still have not appeared after an hour, contact us by email and we will look into it.

How do I monitor credit usage?

The Activity page allows users to view their historic usage and filter it by model, provider, and API key.

We also provide a Logs page with live information about the balance and remaining credits for the account.

How do volume discounts work?

Infron AI does not currently offer volume discounts, but you can reach out to us by email if you think you have an exceptional use case.

What payment methods are accepted?

We accept all major credit cards, Alipay, PayPal, and WeChat Pay. If there are any payment methods that you would like us to support, please open an issue.

Account Management

What analytics are available?

Our activity dashboard provides real-time usage metrics. If you would like any specific reports or metrics, please contact us.

How can I contact support?

The best way to reach us is to submit a new issue or email us.

Input Format Support

What about video support?

Video modality support is coming soon! We’re working on adding video processing capabilities to expand our multimodal offerings.
