Model Router
Also known as: LLM Router, Intelligent Routing, Model Selection
Definition
A system that dynamically selects which model to use for a given request based on criteria like cost, latency, capability, or query complexity. Routers enable cost optimization (use cheaper models for simple queries) and capability matching (use specialized models for specific tasks) without changing application code.
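The selection logic described above can be as simple as a rule-based function. The sketch below is illustrative only; the model names, keyword rules, and word-count threshold are assumptions, not drawn from any specific library.

```python
# Minimal rule-based model router (illustrative sketch; model names
# and thresholds are assumptions, not a real product's defaults).

def route(query: str) -> str:
    """Pick a model name based on simple query characteristics."""
    # Short questions go to a cheaper model.
    if len(query.split()) < 20 and "?" in query:
        return "gpt-3.5-turbo"
    # Code-related queries go to a model assumed stronger at coding.
    if any(kw in query.lower() for kw in ("def ", "function", "bug", "code")):
        return "claude-3-sonnet"
    # Everything else defaults to the most capable (and costly) model.
    return "gpt-4"
```

Because the application only calls `route()`, the rules can evolve (or be replaced by a learned model) without changing application code, which is the point made in the definition.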
What this is NOT
- Not load balancing (routing chooses models; load balancing distributes load)
- Not a gateway (routing is one function a gateway might perform)
- Not the models themselves
Alternative Interpretations
Different communities use this term differently:
llm-practitioners
Logic that decides whether to route a request to GPT-4, GPT-3.5, Claude, a local model, or a specialized model based on query characteristics. Can be rule-based or ML-based.
Sources: Martian router, RouteLLM paper, LLM routing patterns
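An ML-based router typically trains a classifier to predict whether the cheap model will answer acceptably (this is the idea behind RouteLLM-style routers). In this sketch, `predict_win_rate` is a hypothetical stand-in for such a trained model; the heuristic inside it and the threshold are placeholders, not real training output.

```python
# Hypothetical ML-based router. predict_win_rate stands in for a trained
# classifier estimating the probability that the cheaper model answers
# a given query acceptably (RouteLLM-style routing).

def predict_win_rate(query: str) -> float:
    """Placeholder for a learned scorer; here, longer queries
    are treated as harder."""
    return max(0.0, 1.0 - len(query) / 500)

def ml_route(query: str, threshold: float = 0.7) -> str:
    """Send the query to the cheap model only if the predicted
    win rate clears the threshold."""
    if predict_win_rate(query) >= threshold:
        return "cheap-model"
    return "strong-model"
```

The threshold is the cost/quality dial: raising it sends more traffic to the strong model, lowering it saves cost at some quality risk.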
Examples
- Route 'What is 2+2?' to GPT-3.5, 'Write a novel' to GPT-4
- Martian router using ML to predict optimal model
- Cascade: try Llama-7B, if confidence < 0.8 try GPT-4
- Route coding questions to Claude, math to GPT-4
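The cascade example above (try a small model, escalate on low confidence) can be sketched as follows. The `call_model` function is a hypothetical placeholder returning a canned (answer, confidence) pair; a real implementation would call an inference API and derive confidence from, e.g., token log-probabilities.

```python
# Sketch of the cascade pattern: try a small model first, escalate to a
# larger one when confidence is below a threshold. call_model is a
# hypothetical placeholder, not a real API.

def call_model(model: str, query: str) -> tuple[str, float]:
    """Placeholder: return (answer, confidence) for a model call."""
    if model == "llama-7b":
        return ("draft answer", 0.6)  # pretend the small model is unsure
    return ("final answer", 0.95)

def cascade(query: str, threshold: float = 0.8) -> str:
    """Answer with the small model when it is confident enough;
    otherwise escalate to the stronger, more expensive model."""
    answer, confidence = call_model("llama-7b", query)
    if confidence >= threshold:
        return answer
    # Escalate: pay for the stronger model only on hard queries.
    answer, _ = call_model("gpt-4", query)
    return answer
```

Cascades trade latency on hard queries (two calls instead of one) for lower average cost, since most traffic never reaches the expensive model.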
Counterexamples
Things that might seem like a Model Router but are not:
- Load balancing across identical model replicas
- Always using the same model
- Manual model selection by the user
Relations
- overlapsWith api-gateway (Routing is often a gateway function)
- overlapsWith load-balancing (Related but different concerns)
- overlapsWith inference-endpoint (Routing selects between endpoints)
Implementations
Tools and frameworks that implement this concept: