Model Router

Category: System deployment · Status: published

Also known as: LLM Router, Intelligent Routing, Model Selection

Definition

A system that dynamically selects which model to use for a given request based on criteria like cost, latency, capability, or query complexity. Routers enable cost optimization (use cheaper models for simple queries) and capability matching (use specialized models for specific tasks) without changing application code.
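As a concrete illustration, here is a minimal sketch of a rule-based router. The model names, keywords, and thresholds are illustrative assumptions, not a real API; production routers would use richer signals (token counts, classifiers, learned predictors).

```python
# Minimal rule-based model router (illustrative sketch).
# Model names ("cheap-model", "code-model", "frontier-model") and the
# routing heuristics below are assumptions for demonstration only.

def route(query: str) -> str:
    """Pick a model name based on simple query characteristics."""
    words = query.split()
    # Short factual questions go to a cheap model.
    if len(words) < 10 and query.rstrip().endswith("?"):
        return "cheap-model"
    # Code-related queries go to a code-specialized model.
    if any(kw in query.lower() for kw in ("def ", "class ", "bug", "code")):
        return "code-model"
    # Everything else falls through to the most capable (and costly) model.
    return "frontier-model"
```

The application calls `route(query)` and sends the request to whichever backend the returned name maps to, so routing logic stays out of application code.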

What this is NOT

  • Not load balancing (routing chooses models; load balancing distributes load)
  • Not a gateway (routing is one function a gateway might perform)
  • Not the models themselves

Alternative Interpretations

Different communities use this term differently:

llm-practitioners

Logic that decides whether to route a request to GPT-4, GPT-3.5, Claude, a local model, or a specialized model based on the query characteristics. Can be rule-based or ML-based.

Sources: Martian router, RouteLLM paper, LLM routing patterns

Examples

  • Route 'What is 2+2?' to GPT-3.5, 'Write a novel' to GPT-4
  • Martian router using ML to predict optimal model
  • Cascade: try Llama-7B first; if its confidence is below 0.8, fall back to GPT-4
  • Route coding questions to Claude, math to GPT-4
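The cascade pattern above can be sketched as follows. The model calls are stubbed as plain functions returning an (answer, confidence) pair; the default threshold of 0.8 mirrors the example, and real deployments would wrap actual inference APIs.

```python
# Sketch of a cascade router: try a small model first and escalate to a
# larger one when its self-reported confidence is low. Both "models"
# here are stand-in callables, not real inference clients.

from typing import Callable, Tuple

Model = Callable[[str], Tuple[str, float]]  # returns (answer, confidence)

def cascade(query: str, small: Model, large: Model,
            threshold: float = 0.8) -> str:
    answer, confidence = small(query)
    if confidence >= threshold:
        return answer          # small model is confident enough; stop here
    answer, _ = large(query)   # escalate to the more capable model
    return answer
```

The cost saving comes from the common case: most queries never reach the large model, which is only invoked on low-confidence answers.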

Counterexamples

Things that might seem like Model Router but are not:

  • Load balancing across identical model replicas
  • Always using the same model
  • Manual model selection by the user

Relations

Implementations

Tools and frameworks that implement this concept: