Process
Things that happen over time - inference, training, retrieval
37 concepts of this type
-
Agent Loop
[agents] The iterative cycle an agent follows: perceive the current state, reason about what to do, execute an action, observe the result, and repeat until the goal is achieved or a termination condition is me...
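A minimal sketch of the loop in Python, assuming caller-supplied `call_llm` and `execute_action` placeholders rather than any specific framework:

```python
# Minimal agent loop sketch: perceive -> reason -> act -> observe, repeat until
# the model signals completion or the step budget runs out. `call_llm` and
# `execute_action` are caller-supplied placeholders, not a specific framework API.
def agent_loop(goal: str, call_llm, execute_action, max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):                        # termination condition: step budget
        state = "\n".join(history) or "(no observations yet)"
        action = call_llm(f"Goal: {goal}\nObservations so far:\n{state}\nNext action:")
        if action.strip() == "DONE":                  # model declares the goal achieved
            break
        observation = execute_action(action)          # act on the environment
        history.append(f"{action} -> {observation}")  # observe and remember the result
    return history
```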
-
Agent Orchestration
[agents] The coordination of multiple agents or agent components to accomplish complex tasks. Orchestration determines which agent handles which subtask, manages communication between agents, handles failures ...
-
Annotation
[data] The process of adding labels, tags, or metadata to data to make it suitable for supervised learning or evaluation. For LLMs, annotation includes labeling text for classification, rating response quali...
-
API Integration
[tools] Connecting an LLM application to external services via their APIs, enabling the model to read data from or trigger actions in other systems. API integration is how LLMs access the broader software eco...
-
Batch Inference
[deployment] Processing many inference requests together as a batch rather than one at a time. Batch inference optimizes for throughput and cost rather than latency—it's appropriate when you have many prompts to ...
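A rough sketch of the pattern, assuming a hypothetical batched `generate_batch` call:

```python
# Batch inference sketch: group prompts and submit each group together, trading
# per-request latency for overall throughput. `generate_batch` is a hypothetical
# batched model call supplied by the caller.
def run_batched(prompts: list[str], generate_batch, batch_size: int = 32) -> list[str]:
    outputs: list[str] = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]    # one submission serves many requests
        outputs.extend(generate_batch(batch))
    return outputs
```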
-
Caching
[deployment] Storing and reusing LLM responses for identical or similar requests to reduce latency and cost. Caching is particularly valuable for LLMs because inference is expensive and deterministic enough that r...
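An illustrative exact-match cache, assuming a caller-supplied `generate` function; a real deployment would typically use an external store such as Redis:

```python
# Exact-match response caching keyed on a hash of (model, prompt).
# In-memory dict for illustration only.
import hashlib

_cache: dict[str, str] = {}

def cached_generate(model: str, prompt: str, generate) -> str:
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key in _cache:                    # cache hit: skip the expensive LLM call
        return _cache[key]
    response = generate(model, prompt)   # cache miss: call the model once
    _cache[key] = response
    return response
```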
-
Chain-of-Thought
[prompting] A prompting technique that elicits intermediate reasoning steps from an LLM before it produces a final answer. By asking the model to "think step by step" or showing examples with reasoning traces, Ch...
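One way such a prompt might be assembled; the exact wording is illustrative and varies by model and task:

```python
# Chain-of-thought prompt construction: ask for reasoning before the final answer.
def cot_prompt(question: str) -> str:
    return (
        f"Question: {question}\n"
        "Think step by step, showing your reasoning, "
        "then give the final answer on a line starting with 'Answer:'."
    )

print(cot_prompt("A train travels 60 km in 45 minutes. What is its speed in km/h?"))
```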
-
Chunking
[retrieval] The process of dividing documents into smaller pieces (chunks) for embedding and retrieval. Chunking is necessary because: (1) embedding models have input limits, (2) smaller chunks enable more precis...
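A simple fixed-size chunker with overlap, as one possible sketch; production pipelines often split on sentences or document structure instead:

```python
# Fixed-size chunking with overlap, splitting on whitespace tokens.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    words = text.split()
    step = chunk_size - overlap          # overlap keeps context across chunk boundaries
    chunks: list[str] = []
    for start in range(0, len(words), step):
        piece = words[start:start + chunk_size]
        if piece:
            chunks.append(" ".join(piece))
        if start + chunk_size >= len(words):
            break
    return chunks
```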
-
Context Engineering
[prompting] The holistic practice of designing and managing everything that goes into an LLM's context window: system prompts, retrieved documents, conversation history, tool definitions, examples, and user input...
-
Distillation
[models] Training a smaller "student" model to mimic the behavior of a larger "teacher" model, transferring the teacher's knowledge into a more compact form. The student learns from the teacher's outputs (soft...
-
Edge Deployment
[deployment] Running AI models on devices close to the user—phones, laptops, edge servers—rather than in centralized cloud data centers. Edge deployment reduces latency, enables offline operation, and keeps data ...
-
Few-Shot Prompting
[prompting] A prompting technique where examples of the desired input-output behavior are included in the prompt to guide the model's response. Instead of just describing what you want, you show the model example...
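A small sketch of prompt assembly for a hypothetical sentiment-labeling task:

```python
# Few-shot prompt assembly: show labeled examples, then the new input to label.
def few_shot_prompt(examples: list[tuple[str, str]], new_input: str) -> str:
    lines = ["Label each review as positive or negative.", ""]
    for text, label in examples:
        lines += [f"Review: {text}", f"Label: {label}", ""]
    lines += [f"Review: {new_input}", "Label:"]
    return "\n".join(lines)

prompt = few_shot_prompt(
    [("Great battery life.", "positive"), ("Broke after a week.", "negative")],
    "The screen is stunning.",
)
```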
-
Fine-Tuning
[models] The process of further training a pre-trained model on a specific dataset to adapt it for a particular task or domain. Fine-tuning updates model weights (unlike prompting, which only changes inputs), ...
-
Grounding
[evaluation] Constraining LLM outputs to be based on and traceable to specific source material, rather than generated from the model's parametric knowledge alone. Grounding connects generated text to verifiable so...
-
Human-in-the-Loop
[agents] A system design pattern where human judgment is required at defined points in an automated process. In agent systems, human-in-the-loop typically means: (1) approval before high-stakes actions, (2) re...
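A minimal approval-gate sketch for the first case; `input()` stands in for whatever review interface the system actually provides:

```python
# Human-in-the-loop approval gate: pause before a high-stakes action
# and require an explicit human decision before executing it.
def approve(action: str, details: str) -> bool:
    print(f"Proposed action: {action}\n{details}")
    return input("Approve? [y/N] ").strip().lower() == "y"

def run_with_approval(action: str, details: str, execute) -> str:
    if not approve(action, details):
        return "rejected by reviewer"
    return execute()                      # only runs after explicit approval
```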
-
Hybrid Search
[retrieval] A retrieval approach that combines multiple search strategies—typically keyword-based (sparse/BM25) and embedding-based (dense/vector)—to get the benefits of both. Keyword search excels at exact match...
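One common way to merge the two result lists is reciprocal rank fusion; the sketch below assumes the keyword and vector retrievers already exist and return ordered lists of document ids:

```python
# Hybrid search via reciprocal rank fusion (RRF) of keyword and vector rankings.
def rrf_fuse(keyword_ranking: list[str], vector_ranking: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (keyword_ranking, vector_ranking):
        for rank, doc_id in enumerate(ranking):
            # Documents ranked highly by either retriever accumulate more score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```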
-
In-Context Learning
[prompting] The ability of large language models to learn and perform new tasks from examples provided in the prompt, without any parameter updates. The model "learns" the pattern from examples and applies it to ...
-
Inference
[models] Running a trained model on input data to produce output (predictions, generated text, classifications). Inference is the "using" phase of machine learning, as opposed to training. For LLMs, inference ...
-
Instruction Tuning
[models] Fine-tuning a language model on datasets of instructions and responses to improve its ability to follow natural language instructions. Instruction-tuned models understand and execute commands like "S...
-
Iterative Agent Loop
[agents] A pattern for running AI coding agents autonomously by wrapping them in an external loop that restarts execution after each completion or failure until a verifiable completion condition is met. Unli...
-
Jailbreak
[prompting] Techniques to bypass an LLM's safety guardrails and content policies, causing it to generate outputs it was trained or configured to refuse. Jailbreaks target the model itself (its RLHF training and s...
-
Load Balancing
[deployment] Distributing incoming requests across multiple servers or model replicas to optimize resource utilization, maximize throughput, and ensure high availability. For LLM serving, load balancing distribute...
-
Planning
[agents] The process by which an agent formulates a sequence of steps or strategy to achieve a goal before executing actions. Planning separates "what to do" from "doing it," allowing the agent to reason about...
-
Prompt Injection
[prompting] An attack where malicious input is crafted to override or manipulate an LLM's instructions, causing it to ignore its system prompt, reveal hidden information, or perform unintended actions. Prompt inj...
-
Quantization
[models] Reducing the numerical precision of model weights and/or activations (e.g., from 32-bit floats to 8-bit or 4-bit integers) to decrease memory usage and increase inference speed, with minimal quality l...
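A toy symmetric int8 quantizer in NumPy, for intuition only; real quantization schemes use per-channel scales, calibration data, and specialized kernels:

```python
# Symmetric int8 quantization of a weight tensor, plus the inverse mapping.
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    scale = float(np.max(np.abs(weights))) / 127.0 or 1.0   # avoid a zero scale
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale                      # store 1 byte per weight plus one scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale  # approximate reconstruction of the originals
```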
-
Rate Limiting
[deployment] Restricting the number of requests a client can make to an API within a time window. Rate limiting protects services from abuse, ensures fair resource allocation, and maintains system stability. For L...
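A token-bucket sketch, single-process and not thread-safe; real services usually enforce limits at the API gateway:

```python
# Token-bucket rate limiter: allow `rate` requests per second with bursts
# of up to `capacity` requests.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0            # spend one token per admitted request
            return True
        return False                      # over the limit: reject or queue the request
```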
-
Reasoning
[agents] The process by which an agent (or LLM) derives conclusions, makes decisions, or solves problems through intermediate steps rather than direct pattern matching. In LLM contexts, reasoning typically man...
-
Red Teaming
[evaluation] Systematically testing AI systems by attempting to make them fail, produce harmful outputs, or behave in unintended ways. Red teams act as adversaries, probing for vulnerabilities through prompt injec...
-
Reflection
[agents] The process by which an agent evaluates its own outputs, reasoning, or behavior and uses that evaluation to improve. Reflection enables self-correction without external feedback—the agent acts as its...
-
Reranking
[retrieval] A second-stage ranking process that takes initial retrieval results and reorders them using a more sophisticated (and expensive) relevance model. Rerankers typically use cross-encoder models that join...
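A sketch of the second stage, with `score_pair` standing in for a cross-encoder relevance model rather than any specific library:

```python
# Two-stage retrieval: take first-stage candidates and reorder them with a more
# expensive query-document scorer, keeping only the best few.
def rerank(query: str, candidates: list[str], score_pair, top_k: int = 5) -> list[str]:
    scored = [(score_pair(query, doc), doc) for doc in candidates]  # joint scoring
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```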
-
Retrieval-Augmented Generation
[retrieval] A pattern that enhances LLM generation by first retrieving relevant documents from an external knowledge source, then including those documents in the prompt as context. RAG addresses LLM limitations:...
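A minimal sketch of the pattern, assuming hypothetical `retrieve` and `generate` callables:

```python
# Retrieval-augmented generation: retrieve relevant chunks, stuff them into the
# prompt as numbered context, then ask the model to answer from that context.
def rag_answer(question: str, retrieve, generate, k: int = 4) -> str:
    docs = retrieve(question, k)                       # top-k relevant chunks
    context = "\n\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    prompt = (
        "Answer using only the context below. Cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```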
-
RLHF
[models] A training technique that aligns LLMs with human preferences by using human feedback to train a reward model, then optimizing the LLM against that reward. RLHF is how raw pre-trained models become hel...
-
Streaming
[deployment] Delivering LLM output incrementally as tokens are generated rather than waiting for the complete response. Streaming improves perceived latency—users see text appearing progressively instead of waitin...
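A small sketch of the consumer side, with `generate_tokens` standing in for any iterator that yields tokens as they are produced:

```python
# Streaming consumption: render each token the moment it arrives instead of
# waiting for the full response.
from typing import Iterator

def stream_response(generate_tokens: Iterator[str]) -> str:
    pieces: list[str] = []
    for token in generate_tokens:            # tokens arrive incrementally
        print(token, end="", flush=True)     # show progress immediately
        pieces.append(token)
    return "".join(pieces)                   # full text once generation finishes

stream_response(iter(["Hello", ", ", "world", "!"]))
```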
-
Structured Output
[tools] Constraining an LLM's output to conform to a specified structure, typically a JSON Schema. Unlike free-form text generation, structured output guarantees the response matches a defined format—valid JS...
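A portable, post-hoc sketch that validates generated JSON against a hypothetical expected shape; provider-side schema enforcement constrains decoding itself and is the stronger variant:

```python
# Validate a model's JSON output against a small hand-written expectation.
import json

EXPECTED_KEYS = {"name": str, "priority": int}    # hypothetical schema

def parse_structured(raw: str) -> dict:
    data = json.loads(raw)                         # raises if not valid JSON
    for key, typ in EXPECTED_KEYS.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r} missing or not {typ.__name__}")
    return data

parse_structured('{"name": "triage bug report", "priority": 2}')
```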
-
Tool Binding
[tools] The process of connecting tool definitions to their implementations and making them available to an LLM or agent. Binding involves: defining the tool schema, implementing the execution logic, register...
-
Tool Use
[tools] The capability of an LLM to invoke external functions, APIs, or services to perform actions or retrieve information beyond text generation. Tool use extends LLMs from knowledge systems to action syste...
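A sketch of the dispatch step; the registry and the shape of the model's tool call are illustrative assumptions, not a specific provider's format:

```python
# Tool-use dispatch: the model emits a tool name and arguments, the runtime
# executes the matching function and returns the result to feed back to the model.
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"                      # stand-in implementation

TOOLS = {"get_weather": get_weather}               # tool registry

def dispatch(tool_call_json: str) -> str:
    call = json.loads(tool_call_json)              # e.g. {"name": ..., "arguments": {...}}
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
```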
-
Vector Search
[retrieval] Finding items similar to a query by comparing their vector representations (embeddings) in a high-dimensional space. Unlike keyword search which matches exact terms, vector search captures semantic si...
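A brute-force cosine-similarity sketch in NumPy; vector databases replace this linear scan with approximate indexes such as HNSW or IVF:

```python
# Exhaustive vector search: cosine similarity between a query embedding and
# every row of a corpus embedding matrix.
import numpy as np

def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> list[int]:
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q                           # cosine similarity per corpus row
    return list(np.argsort(-sims)[:k])     # indices of the k most similar rows
```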