Context Window
Also known as: Context Length, Max Tokens, Context Size
Definition
The maximum amount of text, measured in tokens, that an LLM can process in a single inference call, covering both input (prompt + context) and output (response). The context window is a hard constraint: text beyond it is invisible to the model. Context windows range from roughly 4K tokens in older models to 200K+ tokens (Claude) and 1M+ tokens (Gemini 1.5). Memory systems can extend a model's effective reach, but they work around the window rather than enlarging it.
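A minimal sketch of the budget arithmetic, assuming the open-source `tiktoken` tokenizer; the window and output figures are illustrative, and exact token counts vary by model:

```python
# Minimal sketch: the prompt plus the reserved output budget must fit
# inside the context window. Figures are illustrative, not model specs.
import tiktoken

CONTEXT_WINDOW = 128_000  # e.g., a 128K-token model
MAX_OUTPUT = 4_096        # tokens reserved for the response

enc = tiktoken.get_encoding("cl100k_base")

def fits_in_context(prompt: str) -> bool:
    prompt_tokens = len(enc.encode(prompt))
    return prompt_tokens + MAX_OUTPUT <= CONTEXT_WINDOW
```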
What this is NOT
- Not the same as the model's knowledge (context is input; knowledge is in weights)
- Not output length alone (context window includes input + output)
- Not unlimited (even 'long context' models have limits)
Alternative Interpretations
Different communities use this term differently:
llm-practitioners
A model specification indicating how much text a single API call can handle, input and generated output combined. Larger context windows enable more retrieved content, longer conversations, and processing of larger documents.
Sources: OpenAI model documentation, Anthropic model documentation, Model context length comparison charts
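One way this plays out in practice: treat the window as a budget and subtract fixed overhead to see how much retrieved content fits. A sketch under assumed, illustrative context sizes:

```python
# Sketch: how many retrieved chunks fit once fixed overhead (system
# prompt, conversation, reserved output) is subtracted. The context
# sizes and model names below are illustrative, not authoritative.
MODEL_CONTEXT = {
    "gpt-4o": 128_000,
    "claude-3-5-sonnet": 200_000,
}

def chunks_that_fit(model: str, overhead_tokens: int,
                    chunk_tokens: int = 512) -> int:
    available = MODEL_CONTEXT[model] - overhead_tokens
    return max(available // chunk_tokens, 0)
```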
transformer-research
The sequence length limit determined by positional encoding and attention mechanisms. Self-attention is O(n²) in sequence length, making very long contexts computationally expensive without architectural innovations.
Sources: Attention mechanism literature, Long-context model papers (Longformer, BigBird, etc.)
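A back-of-envelope sketch of that quadratic cost: the attention score matrix is n x n per head, so naively materializing it grows with the square of the sequence length. Head count and dtype width below are assumptions for illustration:

```python
# Back-of-envelope: bytes in the n x n attention score matrix across
# heads, per layer, if materialized naively. Head count and dtype
# width are assumed values for illustration.
def attention_scores_bytes(n: int, heads: int = 32, dtype_bytes: int = 2) -> int:
    return n * n * heads * dtype_bytes

for n in (4_096, 32_768, 131_072):
    gib = attention_scores_bytes(n) / 2**30
    print(f"n={n:>7,}: ~{gib:,.0f} GiB of scores per layer")
```

Kernels such as FlashAttention avoid materializing this matrix, but compute still scales quadratically, which is why long-context models rely on architectural or kernel-level innovations.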
Examples
- GPT-4o with a 128K-token context window
- Claude 3.5 Sonnet with a 200K-token context window
- Fitting 50 retrieved chunks into a 16K context budget
- Truncating conversation history to fit context limits (see the sketch after this list)
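The truncation example can be sketched directly. Message token counts are assumed to be precomputed; in practice they come from the model's tokenizer:

```python
# Sketch: keep the most recent messages that fit a token budget.
# Each message dict is assumed to carry a precomputed "tokens" count.
def truncate_history(messages: list[dict], budget: int) -> list[dict]:
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        if used + msg["tokens"] > budget:
            break
        kept.append(msg)
        used += msg["tokens"]
    return kept[::-1]  # restore chronological order
```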
Counterexamples
Things that might seem like Context Window but are not:
- The model's total parameter count (that's model size, not context)
- The training data size
- The output length limit alone
Relations
- overlapsWith retrieval-augmented-generation (RAG exists partly to work around context limits)
- overlapsWith agent-memory (Memory systems extend beyond the context window; see the sketch after this list)
- overlapsWith chunking (Chunking helps fit content within context)
- inTensionWith long-term-memory (Long-term memory persists beyond context limits)
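A minimal sketch of the agent-memory relation, with all names illustrative: older turns are archived outside the window, and only recent turns plus a stand-in summary line go into the prompt:

```python
# Sketch: persist older turns beyond the context window; only recent
# turns plus a stand-in summary line are sent to the model.
def build_prompt(history: list[str], archive: list[str],
                 keep_recent: int = 10) -> str:
    if len(history) > keep_recent:
        archive.extend(history[:-keep_recent])  # lives beyond the window
        del history[:-keep_recent]
    summary = f"[{len(archive)} earlier messages archived]"
    return "\n".join([summary, *history])
```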
Implementations
Tools and frameworks that implement this concept:
- Anthropic (primary)
- Google Gemini (primary)