Master Your AI Spend: A Guide to Infron Usage Accounting
Infron Usage Accounting
By Andrew Zheng •
Dec 12, 2025
The Infron API provides built-in Usage Accounting that allows you to track AI model usage without making additional API calls. This feature provides detailed information about token counts, costs, and caching status directly in your API responses.
When enabled, the API will return detailed usage information including:
Prompt and completion token counts using the model's native tokenizer
Cost in credits
Reasoning token counts (if applicable)
Cached token counts (if available)
This information is included in the last SSE message for streaming responses, or in the complete response for non-streaming requests.
You can enable usage accounting in your requests by including the usage parameter:
{ "model": "your-model", "usage": { "include": true } }
When usage accounting is enabled, the response includes a usage object with detailed token counts, a cost field with the total charge, and a cost_details object that breaks that charge down by component:
{
  "id": "c4942c8a-39d8-d39e-7eb0-395c4e4dbf68",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "**Paris** is the capital of France. It's the largest city in the country, serving as the political, cultural, and economic center, with a population of about 2.1 million in the city proper and over 12 million in the greater metropolitan area. This has been the case since the 10th century, when Hugh Capet established it as the seat of the Capetian dynasty.",
        "refusal": null,
        "role": "assistant",
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1763949831,
  "model": "grok-4-1-fast-non-reasoning",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "fp_80e0751284",
  "usage": {
    "completion_tokens": 80,
    "prompt_tokens": 175,
    "total_tokens": 255,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 161,
      "image_tokens": 0,
      "text_tokens": 175
    },
    "num_sources_used": 0
  },
  "cost": 0.000051,
  "cost_details": {
    "audio_cost": 0,
    "cache_prompt_cost": 8.05e-6,
    "cache_write_cost": 0,
    "generation_cost": 0,
    "image_cost": 0,
    "input_prompt_cost": 2.8e-6,
    "output_prompt_cost": 0.00004,
    "tools_cost": 0,
    "video_cost": 0
  },
  "request_id": "e7d2ff652d84410f903aef33d7f6471e"
}
cost is the total amount charged to your account.
cost_details is the breakdown of the total cost.
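As a quick sanity check on the sample response, the non-zero cost_details entries should add up to roughly the reported cost. The sketch below hard-codes the values from that sample. The helper name summarize_cost is hypothetical, and the idea that cost is the breakdown total rounded to six decimal places is an assumption inferred from this one sample, not documented behavior.

```python
import math

# Values copied from the sample response shown earlier in this article.
sample = {
    "cost": 0.000051,
    "cost_details": {
        "audio_cost": 0,
        "cache_prompt_cost": 8.05e-6,
        "cache_write_cost": 0,
        "generation_cost": 0,
        "image_cost": 0,
        "input_prompt_cost": 2.8e-6,
        "output_prompt_cost": 0.00004,
        "tools_cost": 0,
        "video_cost": 0,
    },
}


def summarize_cost(response: dict) -> dict:
    """Return the non-zero cost components and their sum (hypothetical helper)."""
    details = response.get("cost_details") or {}
    nonzero = {name: value for name, value in details.items() if value}
    return {"components": nonzero, "components_total": sum(nonzero.values())}


summary = summarize_cost(sample)

# Assumption: the reported cost is the breakdown total rounded to 6 decimals,
# so the two values are compared with a tolerance of half a unit in that place.
matches = math.isclose(summary["components_total"], sample["cost"], abs_tol=5e-7)

print(summary["components"])
print(f"breakdown total = {summary['components_total']:.8f}, matches cost: {matches}")
```

In this sample the three non-zero components are the cached prompt tokens, the uncached prompt tokens, and the completion tokens, which is a useful reminder that cached input is billed separately from fresh input.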
Enabling usage accounting will add a few hundred milliseconds to the last response as the API calculates token counts and costs. This only affects the final message and does not impact overall streaming performance.
Efficiency: Get usage information without making separate API calls
Accuracy: Token counts are calculated using the model's native tokenizer
Transparency: Track costs and cached token usage in real-time
Detailed Breakdown: Separate counts for prompt, completion, reasoning, and cached tokens
Enable usage tracking when you need to monitor token consumption or costs
Account for the slight delay in the final response when usage accounting is enabled
Consider implementing usage tracking in development to optimize token usage before production
Use the cached token information to optimize your application's performance
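One way to act on the practices above is to keep running totals across requests and watch how much of each prompt is served from cache. The following is a minimal sketch, not part of the Infron API: the UsageTotals class is hypothetical, it operates on responses already converted to plain dictionaries, and it assumes cached_tokens is counted as a subset of prompt_tokens (which is what the sample response suggests).

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running totals across responses (hypothetical helper, not part of the API)."""

    prompt_tokens: int = 0
    cached_tokens: int = 0
    completion_tokens: int = 0
    cost: float = 0.0

    def add(self, response: dict) -> None:
        """Fold one response's usage and cost into the totals; missing fields count as zero."""
        usage = response.get("usage") or {}
        prompt_details = usage.get("prompt_tokens_details") or {}
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.cached_tokens += prompt_details.get("cached_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.cost += response.get("cost", 0.0)

    @property
    def cache_hit_ratio(self) -> float:
        """Share of prompt tokens served from cache; 0.0 if no prompt tokens were seen."""
        return self.cached_tokens / self.prompt_tokens if self.prompt_tokens else 0.0


totals = UsageTotals()
# Token counts and cost taken from the sample response in this article.
totals.add({
    "usage": {
        "prompt_tokens": 175,
        "completion_tokens": 80,
        "prompt_tokens_details": {"cached_tokens": 161},
    },
    "cost": 0.000051,
})
print(f"cache hit ratio: {totals.cache_hit_ratio:.0%}, cost so far: {totals.cost}")
```

A consistently low ratio on prompts that share a long prefix may indicate that the shared portion is not being placed first, or that requests are spaced too far apart for the cache to help.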
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="{{API_KEY_REF}}",
)

response = client.chat.completions.create(
    model="{{MODEL}}",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    # "usage" is not a standard SDK argument, so it is sent via extra_body,
    # which the SDK merges into the JSON request body.
    extra_body={
        "usage": {
            "include": True
        }
    },
)

print("Response:", response.choices[0].message.content)
# getattr guards against responses that come back without a usage object.
print("Usage Stats:", getattr(response, "usage", None))
According to the OpenAI specification, to request token usage information in a streaming response, you would include the following parameters in your request:
{
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": "hi"
    }
  ],
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}
This configuration tells the API to:
Use the Gemini 2.5 Flash model
Stream the response incrementally
Include token usage statistics in the stream response
The stream_options.include_usage parameter specifically requests that token usage information be returned as part of the streaming response.
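On the client side, the usage object then has to be picked out of the stream. In the OpenAI streaming format it arrives in a final chunk, just before the data: [DONE] sentinel, whose choices array is empty. The sketch below parses raw SSE lines offline to show the idea. The function name usage_from_sse is hypothetical, and the sample stream and its token counts are made up for illustration rather than captured from the Infron API.

```python
import json
from typing import Iterable, Optional


def usage_from_sse(lines: Iterable[str]) -> Optional[dict]:
    """Return the last `usage` object seen in a stream of SSE lines, or None."""
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Hypothetical stream: content chunks, then a usage-only chunk, then [DONE].
sample_stream = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2}}',
    "",
    "data: [DONE]",
]

print(usage_from_sse(sample_stream))
```

Because the usage-only chunk carries no choices, code that reads chunk["choices"][0] unconditionally can fail on it, so checking that the list is non-empty first is a sensible precaution.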
Now that you've mastered the implementation of Usage Accounting, take a step back to understand the strategic value and engineering capability behind this feature. We recommend these deep-dive articles:
The Future of AI API Cost Management: Discover how usage transparency drives better business decisions.
Real-Time Cost Tracking: The Technical Foundation: A look at the infrastructure powering Infron AI's accounting engine.