Large Language Model
Also known as: LLM. Related broader terms: language model, foundation model.
Definition
A neural network trained on massive text corpora to predict and generate natural language. "Large" refers to both the model size (billions of parameters) and training data (trillions of tokens). LLMs learn statistical patterns of language that enable them to generate coherent text, answer questions, write code, and perform tasks described in natural language. They are the core technology enabling modern AI assistants and agents.
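The "predict and generate" loop in the definition can be sketched in miniature. The bigram table below is invented for illustration; a real LLM replaces it with a transformer over billions of parameters, but the autoregressive sampling loop has the same shape:

```python
import random

# Toy autoregressive generation. The bigram probability table is
# made up; a real LLM computes this distribution with a neural
# network conditioned on the whole preceding context.
BIGRAM = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"</s>": 1.0},
    "ran": {"</s>": 1.0},
}

def generate(max_tokens=10, seed=0):
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = BIGRAM[tokens[-1]]
        # Sample the next token from the model's predicted distribution,
        # then feed it back in as context -- that is "autoregressive".
        next_tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        if next_tok == "</s>":
            break
        tokens.append(next_tok)
    return " ".join(tokens[1:])
```

Every modern decoder-only LLM, however large, generates text by repeating this predict-sample-append cycle one token at a time.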
What this is NOT
- Not every neural network (LLMs specifically model and generate language)
- Not small models (the "large" matters for emergent capabilities)
- Not traditional NLP systems (those are rule-based and task-specific)
- Not search engines (LLMs generate text; search engines retrieve it)
Alternative Interpretations
Different communities use this term differently:
llm-practitioners
Models like GPT-4, Claude, Llama, and Gemini that can be prompted with natural language to perform a wide variety of tasks. The practical interface is usually a chat or completion API.
Sources: OpenAI GPT documentation, Anthropic Claude documentation, Meta Llama papers
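The "chat or completion API" practitioners use is typically a list of role/content messages in, an assistant message out. A minimal local stand-in for that interface (no real API call is made; the function and model name are illustrative, though the message shape matches what chat APIs commonly accept):

```python
# Sketch of the chat-API shape practitioners interact with.
# chat() is a hypothetical stand-in: a real provider client would
# send `messages` over the network and return the model's reply.
def chat(messages, model="toy-model"):
    # Echo the most recent user turn instead of calling a model.
    last_user = next(m for m in reversed(messages) if m["role"] == "user")
    return {"role": "assistant", "content": f"You said: {last_user['content']}"}

reply = chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
```

The same task can be steered entirely by changing the natural-language content of the messages, which is what makes the interface so general.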
ml-research
Autoregressive transformer models trained with next-token prediction on internet-scale text data. Characterized by emergent capabilities that appear at scale (in-context learning, reasoning, instruction following).
Sources: GPT-3 paper (Brown et al., 2020), Scaling laws papers, Emergent abilities research
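The next-token-prediction training described above minimizes the average cross-entropy (negative log-likelihood) of the correct next token under the model's predicted distribution. A self-contained sketch with invented probabilities:

```python
import math

# Next-token prediction objective: average negative log-probability
# the model assigns to the true next token at each position.
def cross_entropy_loss(predicted_dists, targets):
    total = 0.0
    for dist, target in zip(predicted_dists, targets):
        total += -math.log(dist[target])
    return total / len(targets)

# Two positions; the model assigns the true token 0.8 and 0.5.
dists = [{"cat": 0.8, "dog": 0.2}, {"sat": 0.5, "ran": 0.5}]
loss = cross_entropy_loss(dists, ["cat", "sat"])
# (-ln 0.8 - ln 0.5) / 2 ~= 0.458
```

Pretraining drives this number down over trillions of tokens; the emergent capabilities cited above appear as a side effect of doing so at scale.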
Examples
- GPT-4 and GPT-4o (OpenAI)
- Claude 3.5 Sonnet (Anthropic)
- Llama 3.1 405B (Meta)
- Gemini 1.5 Pro (Google)
Counterexamples
Things that might seem like a Large Language Model but are not:
- BERT (encoder-only, not generative)
- A 100M-parameter model (not "large" by current standards)
- An image generation model like DALL-E (not language-focused)
- Traditional NLP systems with rules and small vocabularies
Relations
- requires transformer (Modern LLMs use transformer architecture)
- overlapsWith foundation-model (LLMs are a type of foundation model)
- requires tokenizer (LLMs process tokens, not raw text)
- overlapsWith reasoning (LLMs enable reasoning capabilities)
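The "requires tokenizer" relation above means the model never sees raw text, only integer token IDs. A toy illustration (the vocabulary and the `##s` suffix convention are invented for this sketch; real LLMs use learned subword schemes such as BPE):

```python
# Why LLMs "process tokens, not raw text": text is mapped to
# integer IDs from a fixed vocabulary before the model sees it.
# This vocabulary is made up for illustration.
VOCAB = {"<unk>": 0, "large": 1, "language": 2, "model": 3, "##s": 4}

def tokenize(text):
    ids = []
    for word in text.lower().split():
        if word in VOCAB:
            ids.append(VOCAB[word])
        elif word.endswith("s") and word[:-1] in VOCAB:
            # Split an unknown plural into a known stem plus a
            # suffix token -- a crude stand-in for subword splitting.
            ids.append(VOCAB[word[:-1]])
            ids.append(VOCAB["##s"])
        else:
            ids.append(VOCAB["<unk>"])
    return ids
```

For example, `tokenize("language models")` splits "models" into two tokens (stem plus suffix), which is why token counts rarely equal word counts in real LLM APIs.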
Implementations
Models, providers, and platforms associated with this concept:
- Anthropic: Claude 3, Claude 3.5, Claude 4 (primary)
- OpenAI: GPT-4, GPT-4 Turbo, GPT-4o, o1, o3 (primary)
- Google: Gemini (primary); Google Cloud Platform (secondary)
- Meta: Llama 3 (primary)
- Mistral AI: Mistral Large (primary)
- Cohere (primary)