API Gateway
Also known as: Gateway, LLM Gateway, AI Gateway
Definition
A server that acts as an intermediary between clients and backend services, handling cross-cutting concerns like authentication, rate limiting, logging, and routing. For LLM applications, API gateways can also provide model routing, fallback logic, cost tracking, and unified interfaces across multiple providers.
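To make the cross-cutting concerns concrete, here is a minimal sketch of a gateway request handler. It is illustrative only: the key store, rate limit, and backend table are assumptions, and a real gateway would forward HTTP requests to upstream services rather than call local functions.

```python
import time
import logging
from collections import defaultdict

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("gateway")

# Hypothetical backends; a real gateway would proxy HTTP to upstreams.
BACKENDS = {
    "/chat": lambda payload: {"echo": payload},
    "/embed": lambda payload: {"vector": [0.0, 0.0, 0.0]},
}

API_KEYS = {"secret-key-1"}       # assumed key store
RATE_LIMIT = 5                    # assumed: requests per minute per key
_request_times = defaultdict(list)  # api_key -> recent request timestamps

def handle(path: str, api_key: str, payload: dict) -> dict:
    # Authentication: reject unknown keys before touching any backend.
    if api_key not in API_KEYS:
        return {"status": 401, "error": "invalid API key"}
    # Rate limiting: a simple sliding one-minute window per key.
    now = time.time()
    window = [t for t in _request_times[api_key] if now - t < 60]
    if len(window) >= RATE_LIMIT:
        return {"status": 429, "error": "rate limit exceeded"}
    window.append(now)
    _request_times[api_key] = window
    # Routing: dispatch on path.
    backend = BACKENDS.get(path)
    if backend is None:
        return {"status": 404, "error": "no route"}
    # Logging: record cross-cutting metadata once, centrally.
    log.info("key=%s path=%s", api_key[:4] + "...", path)
    return {"status": 200, "body": backend(payload)}

print(handle("/chat", "secret-key-1", {"prompt": "hi"}))
```

The point of the sketch is that authentication, rate limiting, logging, and routing live in one place instead of being duplicated in every backend service.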
What this is NOT
- Not the model itself (the gateway is infrastructure around models)
- Not the serving system (the gateway sits in front of serving)
- Not merely a load balancer (a gateway handles more than traffic distribution)
Alternative Interpretations
Different communities use this term differently:
llm-practitioners
A proxy layer that sits between your application and LLM providers, offering provider fallback, load balancing, caching, and unified observability. Examples: LiteLLM, Portkey, Helicone.
Sources: LiteLLM documentation, Portkey documentation, AI gateway products
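A short sketch of the unified-interface idea using LiteLLM, assuming the litellm package is installed and provider API keys are set in the environment; the model identifiers are examples and may need updating:

```python
# pip install litellm  (set OPENAI_API_KEY / ANTHROPIC_API_KEY first)
from litellm import completion

messages = [{"role": "user", "content": "Say hello in one word."}]

# The same call shape works across providers; only the model string changes.
for model in ["gpt-4o-mini", "claude-3-haiku-20240307"]:
    response = completion(model=model, messages=messages)
    print(model, "->", response.choices[0].message.content)
```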
software-engineering
An API management component that handles routing, composition, and protocol translation for backend services. Common in microservices architectures (Kong, AWS API Gateway).
Sources: API gateway pattern documentation, Kong, AWS API Gateway documentation
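A toy sketch of the composition aspect of this interpretation: one client-facing gateway endpoint fans out to two backend services and merges the results, so clients need not know the service topology. The service functions are hypothetical stand-ins for HTTP calls to microservices:

```python
# Hypothetical service stubs standing in for HTTP calls to microservices.
def user_service(user_id: str) -> dict:
    return {"id": user_id, "name": "Ada"}

def order_service(user_id: str) -> dict:
    return {"orders": [{"user": user_id, "sku": "A1", "qty": 2}]}

def get_user_profile(user_id: str) -> dict:
    # API composition: one gateway call aggregates two backend responses.
    user = user_service(user_id)
    orders = order_service(user_id)
    return {**user, **orders}

print(get_user_profile("42"))
```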
Examples
- LiteLLM providing a unified API across 100+ LLM providers
- Portkey with automatic fallback from GPT-4 to Claude (fallback pattern sketched after this list)
- Helicone for logging and analytics
- AWS API Gateway fronting a SageMaker endpoint
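The fallback behavior in the Portkey example above follows a simple pattern: try providers in priority order and move on when one fails. A minimal sketch, with hypothetical provider callables standing in for real API clients:

```python
# Hypothetical provider callables; real ones would hit OpenAI/Anthropic APIs.
def call_gpt4(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")  # simulate an outage

def call_claude(prompt: str) -> str:
    return "response from fallback model"

def complete_with_fallback(prompt: str) -> str:
    # Try providers in priority order; fall through on any failure.
    for provider in (call_gpt4, call_claude):
        try:
            return provider(prompt)
        except Exception as exc:
            print(f"{provider.__name__} failed: {exc}; trying next provider")
    raise RuntimeError("all providers failed")

print(complete_with_fallback("hello"))
```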
Counterexamples
Things that might seem like an API gateway but are not:
- Direct API calls to OpenAI (no gateway)
- The LLM model itself
- A simple HTTP proxy without LLM-specific features
Relations
- overlapsWith inference-endpoint (Gateways front inference endpoints)
- overlapsWith rate-limiting (Gateways implement rate limiting)
- overlapsWith model-router (Routing is a gateway function)
- overlapsWith caching (Gateways can implement caching)
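To ground the rate-limiting and caching relations above, here is a minimal exact-match response cache of the kind a gateway might implement; the key scheme and the stand-in provider call are assumptions:

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cache_key(model: str, messages: list) -> str:
    # Deterministic key over the full request: identical requests hit the cache.
    raw = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return hashlib.sha256(raw.encode()).hexdigest()

def cached_completion(model: str, messages: list) -> str:
    key = cache_key(model, messages)
    if key in _cache:
        return _cache[key]  # cache hit: no provider call, no cost
    result = f"(model output for {model})"  # stand-in for a real provider call
    _cache[key] = result
    return result

msgs = [{"role": "user", "content": "hi"}]
print(cached_completion("gpt-4o-mini", msgs))
print(cached_completion("gpt-4o-mini", msgs))  # served from cache
```

Note that this caches only byte-identical requests; some gateway products also offer semantic caching over similar prompts, which this sketch does not attempt.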
Implementations
Tools and frameworks that implement this concept:
- LiteLLM
- Portkey
- Helicone
- Kong
- AWS API Gateway