# Usage Accounting

Infron provides built-in **Usage Accounting** that lets you track AI model usage and cost breakdowns. This feature returns detailed information about token counts, costs, and caching status directly in your API responses.

### Benefits

1. **Efficiency**: Get usage information without making separate API calls
2. **Accuracy**: Token counts are calculated using the model's native tokenizer
3. **Transparency**: Track costs and cached token usage in real-time
4. **Detailed Breakdown**: Separate counts for prompt, completion, reasoning, and cached tokens

### Usage Information

When enabled, the API will return detailed usage information including:

1. Prompt and completion token counts using the model's native tokenizer
2. Cost in credits
3. Reasoning token counts (if applicable)
4. Cached token counts (if available)

This information is included in the last SSE message for streaming responses, or in the complete response for non-streaming requests.

### Enabling Usage Accounting

You can enable usage accounting in your requests by including the `usage` parameter:

{% tabs %}
{% tab title="Python" %}

```python
import requests
import json

response = requests.post(
  url="https://llm.onerouter.pro/v1/chat/completions",
  headers={
    "Authorization": "Bearer <<API Keys>>",
    "Content-Type": "application/json"
  },
  data=json.dumps({
    "model": "x-ai/grok-4.1-fast-non-reasoning", 
    "messages": [
      {
        "role": "user",
        "content": "What is the meaning of life?"
      }
    ],
    "usage": {
      "include": True
    }
  })
)
print(response.json())
```

{% endtab %}
{% endtabs %}

**Response Format**

When usage accounting is enabled, the response includes a `usage` object with detailed token counts, a top-level `cost` field with the total charge, and a `cost_details` object that breaks the cost down by category:

```python
{
  'choices': [{
    'finish_reason': 'stop',
    'index': 0,
    'logprobs': None,
    'message': {
      'content': '42.\n\n### The Hitchhiker\'s Guide Explanation\nIn Douglas Adams\' *The Hitchhiker\'s Guide to the Galaxy*, a supercomputer named Deep Thought calculates the "Answer to the Ultimate Question of Life, the Universe, and Everything" over 7.5 million years, arriving at **42**. The catch? No one knew the actual question. This satirical take highlights the absurdity of seeking a single, universal meaning—life\'s "purpose" might be whatever question you pose to it.\n\n### Philosophical Perspectives\n- **Existentialism** (e.g., Sartre, Camus): There is no inherent meaning; *you* create it through choices, relationships, and rebellion against absurdity. Evidence: Camus\' *The Myth of Sisyphus* argues we must imagine Sisyphus happy, finding purpose in the struggle itself.\n- **Nihilism** (Nietzsche): Life has no objective meaning, but that\'s liberating—overcome it by affirming life (*amor fati*, love of fate). Nietzsche\'s *Thus Spoke Zarathustra* urges becoming an "Übermensch" to self-define value.\n- **Religious Views**: Many traditions posit divine purpose—e.g., Christianity (glorify God, per Westminster Catechism); Buddhism (end suffering via enlightenment, Four Noble Truths); Islam (worship Allah, Quran 51:56).\n- **Biological/Evolutionary**: Richard Dawkins\' *The Selfish Gene* frames life as gene propagation; meaning emerges from survival, reproduction, and cooperation (evidenced by eusocial insects like ants).\n\n### Scientific & Modern Takes\n- **Physics/Cosmology**: Life defies entropy (2nd Law of Thermodynamics), creating local order in a vast, indifferent universe. Carl Sagan: "We are a way for the cosmos to know itself."\n- **Positive Psychology** (Viktor Frankl\'s *Man\'s Search for Meaning*): Meaning from love, work, and attitude toward suffering—backed by logotherapy studies showing purpose reduces depression.\n- **Data-Driven**: Harvard Grant Study (80+ years) finds the strongest predictor of happiness/longevity is quality relationships, not wealth or fame.\n\nUltimately, "meaning" is subjective—pursue what fulfills you, whether curiosity, connection, creation, or just enjoying the ride. What\'s yours?',
      'role': 'assistant'
    }
  }],
  'cost': 0.000245,
  'cost_details': {
    'audio_cost': 0,
    'byok_cost': 0,
    'completion_cost': 0.0002345,
    'discount_rate': 1,
    'image_cost': 0,
    'is_byok': False,
    'native_web_search_cost': 0,
    'plugin_web_search_cost': 0,
    'prompt_cache_read_cost': 8.05e-06,
    'prompt_cache_write_1_h': 0,
    'prompt_cache_write_5_min': 0,
    'prompt_cache_write_cost': 0,
    'prompt_cost': 2.8e-06,
    'reasoning_cost': 0,
    'tools_cost': 0,
    'video_cost': 0
  },
  'created': 1773366496,
  'id': '4128650a-58a7-91d3-8bc4-d55cf8df0026',
  'model': 'x-ai/grok-4.1-fast-non-reasoning',
  'object': 'chat.completion',
  'request_id': '2ba9fb31a8204925a9a3aa16909c2e18',
  'usage': {
    'completion_tokens': 469,
    'completion_tokens_details': {},
    'input_tokens': 0,
    'output_tokens': 0,
    'prompt_tokens': 175,
    'prompt_tokens_details': {
      'cached_tokens': 161,
      'text_tokens': 175
    },
    'total_tokens': 644,
    'ttft': 0
  }
}
```

* `cost` is the total amount charged to your account balance.
* `cost_details` breaks the total cost down by category.
* `usage` breaks down the token counts.
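As a minimal sketch, these fields can be read directly from the parsed response dictionary (field names are taken from the example above; exact availability may vary by model):

```python
# Sketch: summarize cost and token usage from a parsed non-streaming response.
# Assumes the response shape shown above; missing fields default to zero.

def summarize_usage(response: dict) -> str:
    usage = response.get("usage", {})
    details = usage.get("prompt_tokens_details", {})
    return (
        f"cost={response.get('cost', 0):.6f} credits, "
        f"prompt={usage.get('prompt_tokens', 0)} "
        f"(cached={details.get('cached_tokens', 0)}), "
        f"completion={usage.get('completion_tokens', 0)}, "
        f"total={usage.get('total_tokens', 0)}"
    )

# Values from the example response above.
example = {
    "cost": 0.000245,
    "usage": {
        "prompt_tokens": 175,
        "completion_tokens": 469,
        "total_tokens": 644,
        "prompt_tokens_details": {"cached_tokens": 161},
    },
}
print(summarize_usage(example))
```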

{% hint style="info" %}
Enabling usage accounting adds roughly 100–200 ms to the final response while the API calculates token counts and costs. This affects only the last message and does not impact overall streaming performance.
{% endhint %}

## Examples

#### Basic Usage with Token Tracking

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="{{API_KEY_REF}}",
)

response = client.chat.completions.create(
    model="{{MODEL}}",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    extra_body={
        "usage": {
            "include": True
        }
    }
)

print("Response:", response.choices[0].message.content)
print("Usage Stats:", getattr(response, "usage", None))
```

{% endtab %}
{% endtabs %}

#### Streaming with Token Tracking

According to the [OpenAI specification](https://platform.openai.com/docs/api-reference/completions/create#completions_create-stream_options), request token usage information in a streaming response by including the following parameters:

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm.onerouter.pro/v1",
    api_key="<<API Keys>>",
)

stream = client.chat.completions.create(
    model="google-ai-studio/gemini-2.5-flash-preview-09-2025",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    stream=True,
    stream_options={
        "include_usage": True
    },
    extra_body={
        "usage": {
            "include": True
        }
    }
)

for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage:
        print("\nUsage Stats:", chunk.usage)
```

{% endtab %}

{% tab title="cURL" %}

```shellscript
curl https://llm.onerouter.pro/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <<API Keys>>" \
  -d '{
  "model": "google-ai-studio/gemini-2.5-flash-preview-09-2025",
  "messages": [
    {
      "role": "user",
      "content": "What is the meaning of life?"
    }
  ],
  "usage": {
      "include": true
  },
  "stream": true,
  "stream_options": {
      "include_usage": true
  }
}'
```

{% endtab %}
{% endtabs %}

This configuration tells the API to:

1. Use the `google-ai-studio/gemini-2.5-flash-preview-09-2025` model
2. Stream the response incrementally
3. Include token usage statistics in the stream response

The `stream_options.include_usage` parameter specifically requests that token usage information be returned as part of the streaming response.

An example of the streamed response is shown below:

```json
data: {"id":"chatcmpl-20251218030558845163004nROxtMvj","object":"chat.completion.chunk","created":1766027159,"model":"gemini-2.5-flash-preview-09-2025","system_fingerprint":null,"choices":[{"delta":{"content":"The capital of France","role":"assistant"},"logprobs":null,"finish_reason":null,"index":0}],"usage":null}

data: {"id":"chatcmpl-20251218030558845163004nROxtMvj","object":"chat.completion.chunk","created":1766027159,"model":"gemini-2.5-flash-preview-09-2025","system_fingerprint":null,"choices":[{"delta":{"content":" is **Paris**.","role":"assistant"},"logprobs":null,"finish_reason":"stop","index":0}],"usage":null}

data: {"id":"chatcmpl-20251218030558845163004nROxtMvj","object":"chat.completion.chunk","created":1766027159,"model":"gemini-2.5-flash-preview-09-2025","system_fingerprint":null,"request_id":"aca8184689134db7be37e41fc0e91486","choices":[{"delta":{},"logprobs":null,"finish_reason":"stop","index":0}],"usage":null}

data: {"choices":[],"cost":0.000022,"cost_details":{"audio_cost":0,"cache_prompt_cost":0,"cache_write_cost":0,"generation_cost":0,"image_cost":0,"input_prompt_cost":0.0000024,"output_prompt_cost":0.000020002,"tools_cost":0,"video_cost":0},"created":1766027159,"discounted":"1","id":"chatcmpl-20251218030558845163004nROxtMvj","model":"gemini-2.5-flash-preview-09-2025","object":"chat.completion.chunk","request_id":"aca8184689134db7be37e41fc0e91486","system_fingerprint":null,"usage":{"completion_tokens":8,"input_tokens":0,"output_tokens":0,"prompt_tokens":8,"prompt_tokens_details":{"text_tokens":8},"server_tool_use":{"web_search_requests":""},"total_tokens":16,"ttft":377876649}}

data: [DONE]
```

The cost and usage information arrives in the last `chat.completion.chunk`:

```json
{
  "choices": [],
  "cost": 0.000022,
  "cost_details": {
    "audio_cost": 0,
    "cache_prompt_cost": 0,
    "cache_write_cost": 0,
    "generation_cost": 0,
    "image_cost": 0,
    "input_prompt_cost": 0.0000024,
    "output_prompt_cost": 0.000020002,
    "tools_cost": 0,
    "video_cost": 0
  },
  "created": 1766027159,
  "discounted": "1",
  "id": "chatcmpl-20251218030558845163004nROxtMvj",
  "model": "gemini-2.5-flash-preview-09-2025",
  "object": "chat.completion.chunk",
  "request_id": "aca8184689134db7be37e41fc0e91486",
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 8,
    "input_tokens": 0,
    "output_tokens": 0,
    "prompt_tokens": 8,
    "prompt_tokens_details": {
      "text_tokens": 8
    },
    "server_tool_use": {
      "web_search_requests": ""
    },
    "total_tokens": 16,
    "ttft": 377876649
  }
}
```
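When consuming the SSE stream without an SDK, the usage-bearing chunk can be picked out by parsing each `data:` line and keeping the last chunk whose `usage` field is non-null. A minimal sketch (the two stream lines below are illustrative, shaped like the example above):

```python
import json

def extract_stream_usage(sse_lines):
    """Return (full_text, usage) from raw SSE 'data:' lines.

    Keeps the last chunk whose "usage" field is non-null, which is where
    cost and token counts arrive when stream_options.include_usage is set.
    """
    text_parts, usage = [], None
    for line in sse_lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return "".join(text_parts), usage

# Illustrative stream, shaped like the example response above.
lines = [
    'data: {"choices":[{"delta":{"content":"Paris"},"index":0}],"usage":null}',
    'data: {"choices":[],"usage":{"prompt_tokens":8,"completion_tokens":8,"total_tokens":16}}',
    "data: [DONE]",
]
text, usage = extract_stream_usage(lines)
print(text, usage["total_tokens"])
```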

### Best Practices

1. Enable usage tracking when you need to monitor token consumption or costs
2. Account for the slight delay in the final response when usage accounting is enabled
3. Consider implementing usage tracking in development to optimize token usage before production
4. Use the cached token information to optimize your application's performance
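As a sketch of point 4, a cache hit rate can be derived from the usage breakdown (assuming the `prompt_tokens_details.cached_tokens` field shown in the earlier response example):

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens served from cache (0.0 when unknown)."""
    prompt = usage.get("prompt_tokens", 0)
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    return cached / prompt if prompt else 0.0

# Values from the example response above; a high rate suggests effective prompt reuse.
usage = {"prompt_tokens": 175, "prompt_tokens_details": {"cached_tokens": 161}}
print(f"{cache_hit_rate(usage):.0%}")
```

Tracking this ratio over time is a simple way to verify that stable prompt prefixes (system messages, few-shot examples) are actually being cached.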
