Large Language Model
Also known as: LLM. Related broader terms: language model, foundation model.
Definition
A neural network trained on massive text corpora to predict and generate natural language. "Large" refers to both the model size (billions of parameters) and training data (trillions of tokens). LLMs learn statistical patterns of language that enable them to generate coherent text, answer questions, write code, and perform tasks described in natural language. They are the core technology enabling modern AI assistants and agents.
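The "predict and generate" loop in the definition can be sketched in miniature. The bigram table below is invented for illustration; a real LLM replaces it with a transformer over billions of parameters, but the autoregressive sampling loop has the same shape:

```python
import random

# Toy autoregressive generation. The bigram probability table is
# made up; a real LLM computes this distribution with a neural
# network conditioned on the whole preceding context.
BIGRAM = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def generate(max_tokens=10, seed=0):
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = BIGRAM[tokens[-1]]
        # Sample the next token from the model's predicted distribution,
        # then feed it back in as context -- that is "autoregressive".
        next_tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        if next_tok == "</s>":
            break
        tokens.append(next_tok)
    return " ".join(tokens[1:])
```

Every modern decoder-only LLM, however large, generates text by repeating this predict-sample-append cycle one token at a time.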
What this is NOT
- Not every neural network (LLMs specifically model and generate language)
- Not small models (the "large" matters for emergent capabilities)
- Not traditional NLP systems (those are rule-based and task-specific)
- Not search engines (LLMs generate text; search engines retrieve it)
Alternative Interpretations
Different communities use this term differently:
llm-practitioners
Models like GPT-4, Claude, Llama, and Gemini that can be prompted with natural language to perform a wide variety of tasks. The practical interface is usually a chat or completion API.
Sources: OpenAI GPT documentation, Anthropic Claude documentation, Meta Llama papers
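The "chat or completion API" practitioners use is typically a list of role/content messages in, an assistant message out. A minimal local stand-in for that interface (no real API call is made; the function and model name are illustrative, though the message shape matches what chat APIs commonly accept):

```python
# Sketch of the chat-API shape practitioners interact with.
# chat() is a hypothetical stand-in: a real provider client would
# send `messages` over the network and return the model's reply.
def chat(messages, model="toy-model"):
    # Echo the most recent user turn instead of calling a model.
    last_user = next(m for m in reversed(messages) if m["role"] == "user")
    return {"role": "assistant", "content": f"You said: {last_user['content']}"}

reply = chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
```

The same task can be steered entirely by changing the natural-language content of the messages, which is what makes the interface so general.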
ml-research
Autoregressive transformer models trained with next-token prediction on internet-scale text data. Characterized by emergent capabilities that appear at scale (in-context learning, reasoning, instruction following).
Sources: GPT-3 paper (Brown et al., 2020), Scaling laws papers, Emergent abilities research
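The next-token-prediction training described above minimizes the average cross-entropy (negative log-likelihood) of the correct next token under the model's predicted distribution. A self-contained sketch with invented probabilities:

```python
import math

# Next-token prediction objective: average negative log-probability
# the model assigns to the true next token at each position.
def cross_entropy_loss(predicted_dists, targets):
    total = 0.0
    for dist, target in zip(predicted_dists, targets):
        total += -math.log(dist[target])
    return total / len(targets)

# Two positions; the model assigns the true token 0.8 and 0.5.
dists = [{"cat": 0.8, "dog": 0.2}, {"sat": 0.5, "ran": 0.5}]
loss = cross_entropy_loss(dists, ["cat", "sat"])
# (-ln 0.8 - ln 0.5) / 2 ~= 0.458
```

Pretraining drives this number down over trillions of tokens; the emergent capabilities cited above appear as a side effect of doing so at scale.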
Examples
- GPT-4 and GPT-4o (OpenAI)
- Claude 3.5 Sonnet (Anthropic)
- Llama 3.1 405B (Meta)
- Gemini 1.5 Pro (Google)
Counterexamples
Things that might seem like a Large Language Model but are not:
- BERT (encoder-only, not generative)
- A 100M-parameter model (not "large" by current standards)
- An image generation model like DALL-E (not language-focused)
- Traditional NLP systems with rules and small vocabularies
Relations
- requires transformer (Modern LLMs use transformer architecture)
- overlapsWith foundation-model (LLMs are a type of foundation model)
- requires tokenizer (LLMs process tokens, not raw text)
- overlapsWith reasoning (LLMs enable reasoning capabilities)
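The "requires tokenizer" relation above means the model never sees raw text, only integer token IDs. A toy illustration (the vocabulary and the `##s` suffix convention are invented for this sketch; real LLMs use learned subword schemes such as BPE):

```python
# Why LLMs "process tokens, not raw text": text is mapped to
# integer IDs from a fixed vocabulary before the model sees it.
# This vocabulary is made up for illustration.
VOCAB = {"<unk>": 0, "large": 1, "language": 2, "model": 3, "##s": 4}

def tokenize(text):
    ids = []
    for word in text.lower().split():
        if word in VOCAB:
            ids.append(VOCAB[word])
        elif word.endswith("s") and word[:-1] in VOCAB:
            # Split an unknown plural into a known stem plus a
            # suffix token -- a crude stand-in for subword splitting.
            ids.append(VOCAB[word[:-1]])
            ids.append(VOCAB["##s"])
        else:
            ids.append(VOCAB["<unk>"])
    return ids
```

For example, `tokenize("language models")` splits "models" into two tokens (stem plus suffix), which is why token counts rarely equal word counts in real LLM APIs.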
Implementations
Models, providers, and platforms associated with this concept:
- Anthropic: Claude 3, Claude 3.5, Claude 4 (primary)
- OpenAI: GPT-4, GPT-4 Turbo, GPT-4o, o1, o3 (primary)
- Google: Gemini (primary); Google Cloud Platform (secondary)
- Meta: Llama 3 (primary)
- Mistral AI: Mistral Large (primary)
- Cohere (primary)