# LLM inference basics

- [What is LLM inference?](/docs/llm-inference-handbook/llm-inference-basics/quickstart.md)
- [How does LLM inference work?](/docs/llm-inference-handbook/llm-inference-basics/how-does-llm-inference-work.md): During inference, an LLM generates text one token at a time, using its attention mechanisms to condition each new token on the preceding context.
- [Where is LLM inference run?](/docs/llm-inference-handbook/llm-inference-basics/where-is-llm-inference-run.md)
- [Training vs. Inference](/docs/llm-inference-handbook/llm-inference-basics/training-vs.-inference.md): Training and inference are two distinct phases in a model's lifecycle.
- [What is Serverless inference?](/docs/llm-inference-handbook/llm-inference-basics/what-is-serverless-inference.md)
- [What is Server-based inference?](/docs/llm-inference-handbook/llm-inference-basics/what-is-server-based-inference.md)
- [Serverless vs. Self-hosted LLM inference](/docs/llm-inference-handbook/llm-inference-basics/serverless-vs.-self-hosted-llm-inference.md)
- [Serverless vs. Server-based LLM inference](/docs/llm-inference-handbook/llm-inference-basics/serverless-vs.-server-based-llm-inference.md)
- [What is distributed inference?](/docs/llm-inference-handbook/llm-inference-basics/what-is-distributed-inference.md)
