Process
Things that happen over time - inference, training, retrieval
37 concepts of this type
-
Agent Loop
[agents] The iterative cycle an agent follows: perceive the current state, reason about what to do, execute an action, observe the result, and repeat until the goal is achieved or a termination condition is me...
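A minimal sketch of the loop in Python, assuming caller-supplied `call_llm` and `execute_action` placeholders rather than any specific framework:

```python
# Minimal agent loop sketch: perceive -> reason -> act -> observe, repeat until
# the model signals completion or the step budget runs out. `call_llm` and
# `execute_action` are caller-supplied placeholders, not a specific framework API.
def agent_loop(goal: str, call_llm, execute_action, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):                        # termination condition: step budget
        state = "\n".join(history) or "(no observations yet)"
        action = call_llm(f"Goal: {goal}\nObservations so far:\n{state}\nNext action:")
        if action.strip() == "DONE":                  # model declares the goal achieved
            break
        observation = execute_action(action)          # act on the environment
        history.append(f"{action} -> {observation}")  # observe and remember the result
    return history
```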
-
Agent Orchestration
[agents] The coordination of multiple agents or agent components to accomplish complex tasks. Orchestration determines which agent handles which subtask, manages communication between agents, handles failures ...
-
Annotation
[data] The process of adding labels, tags, or metadata to data to make it suitable for supervised learning or evaluation. For LLMs, annotation includes labeling text for classification, rating response quali...
-
API Integration
[tools] Connecting an LLM application to external services via their APIs, enabling the model to read data from or trigger actions in other systems. API integration is how LLMs access the broader software eco...
-
Batch Inference
[deployment] Processing many inference requests together as a batch rather than one at a time. Batch inference optimizes for throughput and cost rather than latency—it's appropriate when you have many prompts to ...
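A rough sketch of the pattern, assuming a hypothetical batched `generate_batch` call:

```python
# Batch inference sketch: group prompts and submit each group together, trading
# per-request latency for overall throughput. `generate_batch` is a hypothetical
# batched model call supplied by the caller.
def run_batched(prompts: list[str], generate_batch, batch_size: int = 32) -> list[str]:
    outputs: list[str] = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]    # one submission serves many requests
        outputs.extend(generate_batch(batch))
    return outputs
```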
-
Caching
[deployment] Storing and reusing LLM responses for identical or similar requests to reduce latency and cost. Caching is particularly valuable for LLMs because inference is expensive and deterministic enough that r...
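An illustrative exact-match cache, assuming a caller-supplied `generate` function; a real deployment would typically use an external store such as Redis:

```python
# Exact-match response caching keyed on a hash of (model, prompt).
# In-memory dict for illustration only.
import hashlib

_cache: dict[str, str] = {}

def cached_generate(model: str, prompt: str, generate) -> str:
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key in _cache:                    # cache hit: skip the expensive LLM call
        return _cache[key]
    response = generate(model, prompt)   # cache miss: call the model once
    _cache[key] = response
    return response
```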
-
Chain-of-Thought
[prompting] A prompting technique that elicits intermediate reasoning steps from an LLM before it produces a final answer. By asking the model to "think step by step" or showing examples with reasoning traces, Ch...
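One way such a prompt might be assembled; the exact wording is illustrative and varies by model and task:

```python
# Chain-of-thought prompt construction: ask for reasoning before the final answer.
def cot_prompt(question: str) -> str:
    return (
        f"Question: {question}\n"
        "Think step by step, showing your reasoning, "
        "then give the final answer on a line starting with 'Answer:'."
    )

print(cot_prompt("A train travels 60 km in 45 minutes. What is its speed in km/h?"))
```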
-
Chunking
[retrieval] The process of dividing documents into smaller pieces (chunks) for embedding and retrieval. Chunking is necessary because: (1) embedding models have input limits, (2) smaller chunks enable more precis...
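A simple fixed-size chunker with overlap, as one possible sketch; production pipelines often split on sentences or document structure instead:

```python
# Fixed-size chunking with overlap, splitting on whitespace tokens.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    words = text.split()
    step = chunk_size - overlap          # overlap keeps context across chunk boundaries
    chunks: list[str] = []
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        if piece:
            chunks.append(" ".join(piece))
        if start + chunk_size >= len(words):
            break
    return chunks
```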
-
Context Engineering
[prompting] The holistic practice of designing and managing everything that goes into an LLM's context window: system prompts, retrieved documents, conversation history, tool definitions, examples, and user input...
-
Distillation
[models] Training a smaller "student" model to mimic the behavior of a larger "teacher" model, transferring the teacher's knowledge into a more compact form. The student learns from the teacher's outputs (soft...
-
Edge Deployment
[deployment] Running AI models on devices close to the user—phones, laptops, edge servers—rather than in centralized cloud data centers. Edge deployment reduces latency, enables offline operation, and keeps data ...
-
Few-Shot Prompting
[prompting] A prompting technique where examples of the desired input-output behavior are included in the prompt to guide the model's response. Instead of just describing what you want, you show the model example...
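A small sketch of prompt assembly for a hypothetical sentiment-labeling task:

```python
# Few-shot prompt assembly: show labeled examples, then the new input to label.
def few_shot_prompt(examples: list[tuple[str, str]], new_input: str) -> str:
    lines = ["Label each review as positive or negative.", ""]
    for text, label in examples:
        lines += [f"Review: {text}", f"Label: {label}", ""]
    lines += [f"Review: {new_input}", "Label:"]
    return "\n".join(lines)

prompt = few_shot_prompt(
    [("Great battery life.", "positive"), ("Broke after a week.", "negative")],
    "The screen is stunning.",
)
```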
-
Fine-Tuning
[models] The process of further training a pre-trained model on a specific dataset to adapt it for a particular task or domain. Fine-tuning updates model weights (unlike prompting, which only changes inputs), ...
-
Grounding
[evaluation] Constraining LLM outputs to be based on and traceable to specific source material, rather than generated from the model's parametric knowledge alone. Grounding connects generated text to verifiable so...
-
Human-in-the-Loop
[agents] A system design pattern where human judgment is required at defined points in an automated process. In agent systems, human-in-the-loop typically means: (1) approval before high-stakes actions, (2) re...
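A minimal approval-gate sketch for the first case; `input()` stands in for whatever review interface the system actually provides:

```python
# Human-in-the-loop approval gate: pause before a high-stakes action
# and require an explicit human decision before executing it.
def approve(action: str, details: str) -> bool:
    print(f"Proposed action: {action}\n{details}")
    return input("Approve? [y/N] ").strip().lower() == "y"

def run_with_approval(action: str, details: str, execute) -> str:
    if not approve(action, details):
        return "rejected by reviewer"
    return execute()                      # only runs after explicit approval
```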
-
Hybrid Search
[retrieval] A retrieval approach that combines multiple search strategies—typically keyword-based (sparse/BM25) and embedding-based (dense/vector)—to get the benefits of both. Keyword search excels at exact match...
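One common way to merge the two result lists is reciprocal rank fusion; the sketch below assumes the keyword and vector retrievers already exist and return ordered lists of document ids:

```python
# Hybrid search via reciprocal rank fusion (RRF) of keyword and vector rankings.
def rrf_fuse(keyword_ranking: list[str], vector_ranking: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking):
            # Documents ranked highly by either retriever accumulate more score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```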
-
In-Context Learning
[prompting] The ability of large language models to learn and perform new tasks from examples provided in the prompt, without any parameter updates. The model "learns" the pattern from examples and applies it to ...
-
Inference
[models] Running a trained model on input data to produce output (predictions, generated text, classifications). Inference is the "using" phase of machine learning, as opposed to training. For LLMs, inference ...
-
Instruction Tuning
[models] Fine-tuning a language model on datasets of instructions and responses to improve its ability to follow natural language instructions. Instruction-tuned models understand and execute commands like "S...
-
Iterative Agent Loop
[agents] A pattern for running AI coding agents autonomously by wrapping them in an external loop that restarts execution after each completion or failure until a verifiable completion condition is met. Unli...
-
Jailbreak
[prompting] Techniques to bypass an LLM's safety guardrails and content policies, causing it to generate outputs it was trained or configured to refuse. Jailbreaks target the model itself (its RLHF training and s...
-
Load Balancing
[deployment] Distributing incoming requests across multiple servers or model replicas to optimize resource utilization, maximize throughput, and ensure high availability. For LLM serving, load balancing distribute...
-
Planning
[agents] The process by which an agent formulates a sequence of steps or strategy to achieve a goal before executing actions. Planning separates "what to do" from "doing it," allowing the agent to reason about...
-
Prompt Injection
[prompting] An attack where malicious input is crafted to override or manipulate an LLM's instructions, causing it to ignore its system prompt, reveal hidden information, or perform unintended actions. Prompt inj...
-
Quantization
[models] Reducing the numerical precision of model weights and/or activations (e.g., from 32-bit floats to 8-bit or 4-bit integers) to decrease memory usage and increase inference speed, with minimal quality l...
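A toy symmetric int8 quantizer in NumPy, for intuition only; real quantization schemes use per-channel scales, calibration data, and specialized kernels:

```python
# Symmetric int8 quantization of a weight tensor, plus the inverse mapping.
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    scale = float(np.max(np.abs(weights))) / 127.0 or 1.0   # avoid a zero scale
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale                      # store 1 byte per weight plus one scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale  # approximate reconstruction of the originals
```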
-
Rate Limiting
[deployment] Restricting the number of requests a client can make to an API within a time window. Rate limiting protects services from abuse, ensures fair resource allocation, and maintains system stability. For L...
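A token-bucket sketch, single-process and not thread-safe; real services usually enforce limits at the API gateway:

```python
# Token-bucket rate limiter: allow `rate` requests per second with bursts
# of up to `capacity` requests.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0            # spend one token per admitted request
            return True
        return False                      # over the limit: reject or queue the request
```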
-
Reasoning
[agents] The process by which an agent (or LLM) derives conclusions, makes decisions, or solves problems through intermediate steps rather than direct pattern matching. In LLM contexts, reasoning typically man...
-
Red Teaming
[evaluation] Systematically testing AI systems by attempting to make them fail, produce harmful outputs, or behave in unintended ways. Red teams act as adversaries, probing for vulnerabilities through prompt injec...
-
Reflection
[agents] The process by which an agent evaluates its own outputs, reasoning, or behavior and uses that evaluation to improve. Reflection enables self-correction without external feedback—the agent acts as its...
-
Reranking
[retrieval] A second-stage ranking process that takes initial retrieval results and reorders them using a more sophisticated (and expensive) relevance model. Rerankers typically use cross-encoder models that join...
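A sketch of the second stage, with `score_pair` standing in for a cross-encoder relevance model rather than any specific library:

```python
# Two-stage retrieval: take first-stage candidates and reorder them with a more
# expensive query-document scorer, keeping only the best few.
def rerank(query: str, candidates: list[str], score_pair, top_k: int = 5) -> list[str]:
    scored = [(score_pair(query, doc), doc) for doc in candidates]  # joint scoring
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```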
-
Retrieval-Augmented Generation
[retrieval] A pattern that enhances LLM generation by first retrieving relevant documents from an external knowledge source, then including those documents in the prompt as context. RAG addresses LLM limitations:...
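A minimal sketch of the pattern, assuming hypothetical `retrieve` and `generate` callables:

```python
# Retrieval-augmented generation: retrieve relevant chunks, stuff them into the
# prompt as numbered context, then ask the model to answer from that context.
def rag_answer(question: str, retrieve, generate, k: int = 4) -> str:
    docs = retrieve(question, k)                       # top-k relevant chunks
    context = "\n\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    prompt = (
        "Answer using only the context below. Cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```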
-
RLHF
[models] A training technique that aligns LLMs with human preferences by using human feedback to train a reward model, then optimizing the LLM against that reward. RLHF is how raw pre-trained models become hel...
-
Streaming
[deployment] Delivering LLM output incrementally as tokens are generated rather than waiting for the complete response. Streaming improves perceived latency—users see text appearing progressively instead of waitin...
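A small sketch of the consumer side, with `generate_tokens` standing in for any iterator that yields tokens as they are produced:

```python
# Streaming consumption: render each token the moment it arrives instead of
# waiting for the full response.
from typing import Iterator

def stream_response(generate_tokens: Iterator[str]) -> str:
    pieces: list[str] = []
    for token in generate_tokens:            # tokens arrive incrementally
        print(token, end="", flush=True)     # show progress immediately
        pieces.append(token)
    return "".join(pieces)                   # full text once generation finishes

stream_response(iter(["Hello", ", ", "world", "!"]))
```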
-
Structured Output
[tools] Constraining an LLM's output to conform to a specified structure, typically a JSON Schema. Unlike free-form text generation, structured output guarantees the response matches a defined format—valid JS...
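A portable, post-hoc sketch that validates generated JSON against a hypothetical expected shape; provider-side schema enforcement constrains decoding itself and is the stronger variant:

```python
# Validate a model's JSON output against a small hand-written expectation.
import json

EXPECTED_KEYS = {"name": str, "priority": int}    # hypothetical schema

def parse_structured(raw: str) -> dict:
    data = json.loads(raw)                         # raises if not valid JSON
    for key, typ in EXPECTED_KEYS.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r} missing or not {typ.__name__}")
    return data

parse_structured('{"name": "triage bug report", "priority": 2}')
```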
-
Tool Binding
[tools] The process of connecting tool definitions to their implementations and making them available to an LLM or agent. Binding involves: defining the tool schema, implementing the execution logic, register...
-
Tool Use
[tools] The capability of an LLM to invoke external functions, APIs, or services to perform actions or retrieve information beyond text generation. Tool use extends LLMs from knowledge systems to action syste...
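A sketch of the dispatch step; the registry and the shape of the model's tool call are illustrative assumptions, not a specific provider's format:

```python
# Tool-use dispatch: the model emits a tool name and arguments, the runtime
# executes the matching function and returns the result to feed back to the model.
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"                      # stand-in implementation

TOOLS = {"get_weather": get_weather}               # tool registry

def dispatch(tool_call_json: str) -> str:
    call = json.loads(tool_call_json)              # e.g. {"name": ..., "arguments": {...}}
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
```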
-
Vector Search
[retrieval] Finding items similar to a query by comparing their vector representations (embeddings) in a high-dimensional space. Unlike keyword search which matches exact terms, vector search captures semantic si...
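A brute-force cosine-similarity sketch in NumPy; vector databases replace this linear scan with approximate indexes such as HNSW or IVF:

```python
# Exhaustive vector search: cosine similarity between a query embedding and
# every row of a corpus embedding matrix.
import numpy as np

def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> list[int]:
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q                           # cosine similarity per corpus row
    return list(np.argsort(-sims)[:k])     # indices of the k most similar rows
```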