API Gateway
Also known as: Gateway, LLM Gateway, AI Gateway
Definition
A server that acts as an intermediary between clients and backend services, handling cross-cutting concerns like authentication, rate limiting, logging, and routing. For LLM applications, API gateways can also provide model routing, fallback logic, cost tracking, and unified interfaces across multiple providers.
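To make the cross-cutting concerns concrete, here is a minimal sketch of a gateway request handler. It is illustrative only: the key store, rate limit, and backend table are assumptions, and a real gateway would forward HTTP requests to upstream services rather than call local functions.

```python
import time
import logging
from collections import defaultdict

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("gateway")

# Hypothetical backends; a real gateway would proxy HTTP to upstreams.
BACKENDS = {
    "/chat": lambda payload: {"echo": payload},
    "/embed": lambda payload: {"vector": [0.0, 0.0, 0.0]},
}

API_KEYS = {"secret-key-1"}       # assumed key store
RATE_LIMIT = 5                    # assumed: requests per minute per key
_request_times = defaultdict(list)  # api_key -> recent request timestamps

def handle(path: str, api_key: str, payload: dict) -> dict:
    # Authentication: reject unknown keys before touching any backend.
    if api_key not in API_KEYS:
        return {"status": 401, "error": "invalid API key"}
    # Rate limiting: a simple sliding one-minute window per key.
    now = time.time()
    window = [t for t in _request_times[api_key] if now - t < 60]
    if len(window) >= RATE_LIMIT:
        return {"status": 429, "error": "rate limit exceeded"}
    window.append(now)
    _request_times[api_key] = window
    # Routing: dispatch on path.
    backend = BACKENDS.get(path)
    if backend is None:
        return {"status": 404, "error": "no route"}
    # Logging: record cross-cutting metadata once, centrally.
    log.info("key=%s path=%s", api_key[:4] + "...", path)
    return {"status": 200, "body": backend(payload)}

print(handle("/chat", "secret-key-1", {"prompt": "hi"}))
```

The point of the sketch is that authentication, rate limiting, logging, and routing live in one place instead of being duplicated in every backend service.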
What this is NOT
- Not the model itself (the gateway is infrastructure around models)
- Not the serving system (the gateway sits in front of serving)
- Not merely a load balancer (a gateway handles more than traffic distribution)
Alternative Interpretations
Different communities use this term differently:
llm-practitioners
A proxy layer that sits between your application and LLM providers, offering provider fallback, load balancing, caching, and unified observability. Examples: LiteLLM, Portkey, Helicone.
Sources: LiteLLM documentation, Portkey documentation, AI gateway products
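A short sketch of the unified-interface idea using LiteLLM, assuming the litellm package is installed and provider API keys are set in the environment; the model identifiers are examples and may need updating:

```python
# pip install litellm  (set OPENAI_API_KEY / ANTHROPIC_API_KEY first)
from litellm import completion

messages = [{"role": "user", "content": "Say hello in one word."}]

# The same call shape works across providers; only the model string changes.
for model in ["gpt-4o-mini", "claude-3-haiku-20240307"]:
    response = completion(model=model, messages=messages)
    print(model, "->", response.choices[0].message.content)
```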
software-engineering
An API management component that handles routing, composition, and protocol translation for backend services. Common in microservices architectures (Kong, AWS API Gateway).
Sources: API gateway pattern documentation, Kong, AWS API Gateway documentation
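A toy sketch of the composition aspect of this interpretation: one client-facing gateway endpoint fans out to two backend services and merges the results, so clients need not know the service topology. The service functions are hypothetical stand-ins for HTTP calls to microservices:

```python
# Hypothetical service stubs standing in for HTTP calls to microservices.
def user_service(user_id: str) -> dict:
    return {"id": user_id, "name": "Ada"}

def order_service(user_id: str) -> dict:
    return {"orders": [{"user": user_id, "sku": "A1", "qty": 2}]}

def get_user_profile(user_id: str) -> dict:
    # API composition: one gateway call aggregates two backend responses.
    user = user_service(user_id)
    orders = order_service(user_id)
    return {**user, **orders}

print(get_user_profile("42"))
```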
Examples
- LiteLLM providing a unified API across 100+ LLM providers
- Portkey with automatic fallback from GPT-4 to Claude (fallback pattern sketched after this list)
- Helicone for logging and analytics
- AWS API Gateway fronting a SageMaker endpoint
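The fallback behavior in the Portkey example above follows a simple pattern: try providers in priority order and move on when one fails. A minimal sketch, with hypothetical provider callables standing in for real API clients:

```python
# Hypothetical provider callables; real ones would hit OpenAI/Anthropic APIs.
def call_gpt4(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")  # simulate an outage

def call_claude(prompt: str) -> str:
    return "response from fallback model"

def complete_with_fallback(prompt: str) -> str:
    # Try providers in priority order; fall through on any failure.
    for provider in (call_gpt4, call_claude):
        try:
            return provider(prompt)
        except Exception as exc:
            print(f"{provider.__name__} failed: {exc}; trying next provider")
    raise RuntimeError("all providers failed")

print(complete_with_fallback("hello"))
```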
Counterexamples
Things that might seem like an API gateway but are not:
- Direct API calls to OpenAI (no gateway)
- The LLM model itself
- A simple HTTP proxy without LLM-specific features
Relations
- overlapsWith inference-endpoint (Gateways front inference endpoints)
- overlapsWith rate-limiting (Gateways implement rate limiting)
- overlapsWith model-router (Routing is a gateway function)
- overlapsWith caching (Gateways can implement caching)
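To ground the rate-limiting and caching relations above, here is a minimal exact-match response cache of the kind a gateway might implement; the key scheme and the stand-in provider call are assumptions:

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cache_key(model: str, messages: list) -> str:
    # Deterministic key over the full request: identical requests hit the cache.
    raw = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()

def cached_completion(model: str, messages: list) -> str:
    key = cache_key(model, messages)
    if key in _cache:
        return _cache[key]  # cache hit: no provider call, no cost
    result = f"(model output for {model})"  # stand-in for a real provider call
    _cache[key] = result
    return result

msgs = [{"role": "user", "content": "hi"}]
print(cached_completion("gpt-4o-mini", msgs))
print(cached_completion("gpt-4o-mini", msgs))  # served from cache
```

Note that this caches only byte-identical requests; some gateway products also offer semantic caching over similar prompts, which this sketch does not attempt.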
Implementations
Tools and frameworks that implement this concept:
- LiteLLM
- Portkey
- Helicone
- Kong
- AWS API Gateway