Context Window

Property · retrieval · published

Also known as: Context Length, Max Tokens, Context Size

Definition

The maximum amount of text, measured in tokens, that an LLM can process in a single inference call, including both input (prompt + context) and output (response). The context window is a hard constraint: text beyond it is simply invisible to the model. Context windows range from 4K tokens (older models) to 200K tokens (Claude) and beyond (Gemini 1.5 reaches 1M+). External memory systems can make context feel effectively unlimited across calls, but they work around the window rather than enlarge it.
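Because input and output share the same window, applications typically budget them together before making a call. A minimal sketch of that check in Python, using the real tiktoken tokenizer library (the window sizes, the `fits_in_context` helper, and the default output reserve are illustrative assumptions, not official limits):

```python
import tiktoken  # OpenAI's open-source tokenizer library

# Illustrative window sizes; check your provider's docs for actual limits.
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-3-5-sonnet": 200_000,
}

def fits_in_context(prompt: str, model: str, max_output_tokens: int = 4_096) -> bool:
    """Return True if prompt + reserved output budget fits the model's window.

    Uses the cl100k_base encoding as an approximation; each model family
    has its own tokenizer, so these counts are estimates, not exact.
    """
    enc = tiktoken.get_encoding("cl100k_base")
    input_tokens = len(enc.encode(prompt))
    # Input and output share the same window, so reserve room for the response.
    return input_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]

print(fits_in_context("Summarize this document: ...", "gpt-4o"))  # True
```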

What this is NOT

  • Not the same as the model's knowledge (context is input; knowledge is in weights)
  • Not output length alone (context window includes input + output)
  • Not unlimited (even 'long-context' models have hard limits)

Alternative Interpretations

Different communities use this term differently:

llm-practitioners

A model specification indicating how much text can be provided in a single API call. Larger context windows enable more retrieved content, longer conversations, and processing of larger documents.

Sources: OpenAI model documentation, Anthropic model documentation, Model context length comparison charts
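The "more retrieved content" point is, in practice, a packing problem: a retrieval pipeline must decide which chunks fit the input budget. A minimal greedy sketch (the `pack_chunks` helper and the ~4 characters-per-token heuristic are assumptions for illustration):

```python
def pack_chunks(chunks: list[str], budget_tokens: int) -> list[str]:
    """Greedily keep retrieved chunks, highest-ranked first, until the
    token budget is spent. Assumes `chunks` is already sorted by
    relevance, as a retriever would typically return them.
    """
    packed: list[str] = []
    used = 0
    for chunk in chunks:
        cost = max(1, len(chunk) // 4)  # rough ~4 chars/token heuristic
        if used + cost > budget_tokens:
            break
        packed.append(chunk)
        used += cost
    return packed
```

Stopping at the first oversized chunk preserves the ranking; a skip-and-continue variant packs the budget more tightly but can admit lower-ranked chunks ahead of larger, higher-ranked ones.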

transformer-research

The sequence length limit determined by positional encoding and attention mechanisms. Self-attention is O(n²) in sequence length, making very long contexts computationally expensive without architectural innovations.

Sources: Attention mechanism literature, Long-context model papers (Longformer, BigBird, etc.)
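To make the O(n²) cost concrete, here is a small back-of-the-envelope calculation in Python (the sequence lengths and the fp16 assumption are illustrative; optimized kernels such as FlashAttention avoid materializing this matrix, so treat these figures as an upper bound):

```python
# Memory for one attention score matrix per head: n x n entries.
# Assumes fp16 (2 bytes per entry).
BYTES_PER_ENTRY = 2

for n in (4_096, 32_768, 128_000):
    matrix_bytes = n * n * BYTES_PER_ENTRY
    print(f"n={n:>7}: {matrix_bytes / 1e9:.1f} GB per head")

# n=   4096: 0.0 GB per head
# n=  32768: 2.1 GB per head
# n= 128000: 32.8 GB per head
```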

Examples

  • GPT-4o with 128K token context window
  • Claude 3.5 with 200K token context window
  • Fitting 50 retrieved chunks into a 16K context budget
  • Truncating conversation history to fit context limits (sketched below)
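A minimal sketch of the truncation example above (the `Message` shape and the 4-chars-per-token estimate are assumptions for illustration, not any provider's API):

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str     # "system", "user", or "assistant"
    content: str

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Swap in a real tokenizer (e.g., tiktoken) for accurate counts.
    return max(1, len(text) // 4)

def truncate_history(history: list[Message], budget: int) -> list[Message]:
    """Keep the most recent messages that fit within a token budget.

    Walks backward from the newest message so recent turns survive;
    older turns are dropped first when the budget is exceeded.
    """
    kept: list[Message] = []
    used = 0
    for msg in reversed(history):
        cost = estimate_tokens(msg.content)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```

Real chat applications usually pin the system message and often summarize dropped turns rather than discarding them outright.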

Counterexamples

Things that might seem like Context Window but are not:

  • The model's total parameter count (that's model size, not context)
  • The training data size
  • The output length limit alone

Relations

Implementations

Tools and frameworks that implement this concept: