Context Window

Property · retrieval · published

Also known as: Context Length, Max Tokens, Context Size

Definition

The maximum amount of text, measured in tokens, that an LLM can process in a single inference call, including both input (prompt + context) and output (response). The context window is a hard constraint: text beyond it is simply invisible to the model. Context windows range from 4K tokens (older models) to 200K tokens (Claude) and beyond (Gemini 1.5 reaches 1M+). External memory systems can make context feel effectively unlimited across calls, but they work around the window rather than enlarge it.
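Because input and output share the same window, applications typically budget them together before making a call. A minimal sketch of that check in Python, using the real tiktoken tokenizer library (the window sizes, the `fits_in_context` helper, and the default output reserve are illustrative assumptions, not official limits):

```python
import tiktoken  # OpenAI's open-source tokenizer library

# Illustrative window sizes; check your provider's docs for actual limits.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-3-5-sonnet": 200_000,
}

def fits_in_context(prompt: str, model: str, max_output_tokens: int = 4_096) -> bool:
    """Return True if prompt + reserved output budget fits the model's window.

    Uses the cl100k_base encoding as an approximation; each model family
    has its own tokenizer, so these counts are estimates, not exact.
    """
    enc = tiktoken.get_encoding("cl100k_base")
    input_tokens = len(enc.encode(prompt))
    # Input and output share the same window, so reserve room for the response.
    return input_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]

print(fits_in_context("Summarize this document: ...", "gpt-4o"))  # True
```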

What this is NOT

  • Not the same as the model's knowledge (context is input; knowledge is in weights)
  • Not output length alone (context window includes input + output)
  • Not unlimited (even 'long-context' models have hard limits)

Alternative Interpretations

Different communities use this term differently:

llm-practitioners

A model specification indicating how much text can be provided in a single API call. Larger context windows enable more retrieved content, longer conversations, and processing of larger documents.

Sources: OpenAI model documentation, Anthropic model documentation, Model context length comparison charts
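The "more retrieved content" point is, in practice, a packing problem: a retrieval pipeline must decide which chunks fit the input budget. A minimal greedy sketch (the `pack_chunks` helper and the ~4 characters-per-token heuristic are assumptions for illustration):

```python
def pack_chunks(chunks: list[str], budget_tokens: int) -> list[str]:
    """Greedily keep retrieved chunks, highest-ranked first, until the
    token budget is spent. Assumes `chunks` is already sorted by
    relevance, as a retriever would typically return them.
    """
    packed: list[str] = []
    used = 0
    for chunk in chunks:
        cost = max(1, len(chunk) // 4)  # rough ~4 chars/token heuristic
        if used + cost > budget_tokens:
            break
        packed.append(chunk)
        used += cost
    return packed
```

Stopping at the first oversized chunk preserves the ranking; a skip-and-continue variant packs the budget more tightly but can admit lower-ranked chunks ahead of larger, higher-ranked ones.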

transformer-research

The sequence length limit determined by positional encoding and attention mechanisms. Self-attention is O(n²) in sequence length, making very long contexts computationally expensive without architectural innovations.

Sources: Attention mechanism literature, Long-context model papers (Longformer, BigBird, etc.)
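To make the O(n²) cost concrete, here is a small back-of-the-envelope calculation in Python (the sequence lengths and the fp16 assumption are illustrative; optimized kernels such as FlashAttention avoid materializing this matrix, so treat these figures as an upper bound):

```python
# Memory for one attention score matrix per head: n x n entries.
# Assumes fp16 (2 bytes per entry).
BYTES_PER_ENTRY = 2

for n in (4_096, 32_768, 128_000):
    matrix_bytes = n * n * BYTES_PER_ENTRY
    print(f"n={n:>7}: {matrix_bytes / 1e9:.1f} GB per head")

# n=   4096: 0.0 GB per head
# n=  32768: 2.1 GB per head
# n= 128000: 32.8 GB per head
```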

Examples

  • GPT-4o with 128K token context window
  • Claude 3.5 with 200K token context window
  • Fitting 50 retrieved chunks into a 16K context budget
  • Truncating conversation history to fit context limits (sketched below)
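A minimal sketch of the truncation example above (the `Message` shape and the 4-chars-per-token estimate are assumptions for illustration, not any provider's API):

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str     # "system", "user", or "assistant"
    content: str

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Swap in a real tokenizer (e.g., tiktoken) for accurate counts.
    return max(1, len(text) // 4)

def truncate_history(history: list[Message], budget: int) -> list[Message]:
    """Keep the most recent messages that fit within a token budget.

    Walks backward from the newest message so recent turns survive;
    older turns are dropped first when the budget is exceeded.
    """
    kept: list[Message] = []
    used = 0
    for msg in reversed(history):
        cost = estimate_tokens(msg.content)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```

Real chat applications usually pin the system message and often summarize dropped turns rather than discarding them outright.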

Counterexamples

Things that might seem like Context Window but are not:

  • The model's total parameter count (that's model size, not context)
  • The training data size
  • The output length limit alone

Relations

Implementations

Tools and frameworks that implement this concept: