Usage Accounting

Track AI Model Token Usage

The Infron AI API provides built-in Usage Accounting that allows you to track AI model usage without making additional API calls. This feature provides detailed information about token counts, costs, and caching status directly in your API responses.

Usage Information

When enabled, the API will return detailed usage information including:

  1. Prompt and completion token counts using the model's native tokenizer

  2. Cost in credits

  3. Reasoning token counts (if applicable)

  4. Cached token counts (if available)

This information is included in the last SSE message for streaming responses, or in the complete response for non-streaming requests.

Enabling Usage Accounting

You can enable usage accounting in your requests by including the usage parameter:

import requests
import json

response = requests.post(
  url="https://llm.onerouter.pro/v1/chat/completions",
  headers={
    "Authorization": "Bearer <<API Keys>>",
    "Content-Type": "application/json"
  },
  data=json.dumps({
    "model": "google-ai-studio/gemini-2.5-flash-preview-09-2025", 
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ],
    "usage": {
      "include": True
    }
  })
)
print(response.json())

Response Format

When usage accounting is enabled, the response will include a usage object with detailed token counts, plus a cost field and a cost_details object with the cost breakdown:

cost is the total amount charged to your account.

cost_details is the breakdown of the total cost.

Enabling usage accounting adds a few hundred milliseconds to the final response while the API calculates token counts and costs. This affects only the final message and does not impact overall streaming performance.
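As a concrete sketch, the usage object can be read straight out of response.json(). The key names below follow the OpenAI-style usage schema plus the cost and cost_details fields described above; the exact keys and all values are illustrative, not guaranteed by this API.

```python
# Illustrative only: keys follow the OpenAI-style usage schema plus the
# cost / cost_details fields described above; all values are made up.
sample_response = {
    "usage": {
        "prompt_tokens": 14,
        "completion_tokens": 128,
        "total_tokens": 142,
        "cost": 0.00021,
        "cost_details": {"upstream_inference_cost": 0.00021},
    }
}

usage = sample_response.get("usage", {})
print(usage["prompt_tokens"], usage["completion_tokens"])  # token counts
print(usage["cost"], usage["cost_details"])                # total cost and breakdown
```

In a real application you would read these fields from the response of the request shown above rather than from a hard-coded dict.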

Benefits

  1. Efficiency: Get usage information without making separate API calls

  2. Accuracy: Token counts are calculated using the model's native tokenizer

  3. Transparency: Track costs and cached token usage in real-time

  4. Detailed Breakdown: Separate counts for prompt, completion, reasoning, and cached tokens

Best Practices

  1. Enable usage tracking when you need to monitor token consumption or costs

  2. Account for the slight delay in the final response when usage accounting is enabled

  3. Consider implementing usage tracking in development to optimize token usage before production

  4. Use the cached token information to optimize your application's performance

Examples

Basic Usage with Token Tracking

Streaming with Token Tracking

According to the OpenAI specification, to request token usage information in a streaming response, you would include the following parameters in your request:
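A sketch of that request payload, assuming the OpenAI-compatible stream_options field (the model and prompt match the earlier example):

```python
import json

# Streaming request payload with usage accounting enabled via the
# OpenAI-style stream_options.include_usage parameter.
payload = {
    "model": "google-ai-studio/gemini-2.5-flash-preview-09-2025",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}],
    "stream": True,
    "stream_options": {"include_usage": True},
}
print(json.dumps(payload, indent=2))
```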

This configuration tells the API to:

  1. Use the google-ai-studio/gemini-2.5-flash-preview-09-2025 model

  2. Stream the response incrementally

  3. Include token usage statistics in the stream response

The stream_options.include_usage parameter specifically requests that token usage information be returned as part of the streaming response.

In the streaming response, the cost and usage information is included in the last chat.completion.chunk.
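A sketch of pulling that final usage object out of the SSE stream by scanning each data: line for a chunk that carries usage. The helper name and all chunk values below are illustrative; real chunks carry more fields.

```python
import json

def extract_usage(sse_lines):
    """Return the usage object from the last chunk that carries one."""
    usage = None
    for line in sse_lines:
        if not line.startswith("data: ") or line.strip() == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage

# Illustrative stream (hypothetical values):
lines = [
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "42"}}]}',
    'data: {"object": "chat.completion.chunk", "choices": [], '
    '"usage": {"prompt_tokens": 14, "completion_tokens": 128, "cost": 0.00021}}',
    "data: [DONE]",
]
print(extract_usage(lines))
```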
