# Offline batch inference

Offline batch inference is the process of running a model over a large, static dataset to generate predictions in bulk, rather than one request at a time in real time (online inference). It’s called "offline" because it doesn’t happen interactively; instead, it runs as a bulk processing job.

By contrast, in online inference the model makes predictions only on demand, for example when a client sends a request.

Key benefits of offline batch inference:

* Precomputing predictions reduces the load on real-time systems.
* It gives you more flexibility to use complex models that would be too slow for real-time inference.
* It supports post-processing and validation of predictions before they are used in production.

You may want to use offline batch inference in the following cases:

* Your data doesn’t change often, so you don’t need real-time predictions.
* You have a large dataset to process, and the predictions can be stored and reused later.
* Your model is too big or slow for real-time predictions but works fine if run in advance.
* You want to validate or review predictions before serving them to users (e.g., for quality or compliance checks).
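The workflow described above (process a static dataset in batches, validate the outputs, then store them for later reuse) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `fake_model`, `is_valid`, and `run_batch_job` are hypothetical names, the "model" is a stand-in that upper-cases its input, and the prediction store is a plain in-memory dictionary rather than a real database or file.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(items: Iterable[str], batch_size: int) -> Iterator[list[str]]:
    """Yield successive lists of up to batch_size items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


def fake_model(inputs: list[str]) -> list[str]:
    """Hypothetical stand-in for a real model call: upper-cases each input."""
    return [text.upper() for text in inputs]


def is_valid(prediction: str) -> bool:
    """Hypothetical validation rule: reject empty or whitespace-only outputs."""
    return bool(prediction.strip())


def run_batch_job(
    dataset: Iterable[str],
    model: Callable[[list[str]], list[str]],
    batch_size: int = 2,
) -> tuple[dict[str, str], list[str]]:
    """Run the model over the whole dataset and keep only validated predictions."""
    store: dict[str, str] = {}   # precomputed predictions, keyed by input
    rejected: list[str] = []     # inputs whose predictions failed validation
    for batch in batched(dataset, batch_size):
        for item, prediction in zip(batch, model(batch)):
            if is_valid(prediction):
                store[item] = prediction
            else:
                rejected.append(item)
    return store, rejected


dataset = ["hello", "world", " ", "batch"]
predictions, rejected = run_batch_job(dataset, fake_model)

# At serving time, the real-time system only performs a lookup.
served = predictions.get("hello")
```

In this sketch the real-time path never calls the model; it only reads from `predictions`. Rejected inputs are collected separately so they can be reviewed before anything reaches production.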

