Model Router

Category: System deployment · Status: published

Also known as: LLM Router, Intelligent Routing, Model Selection

Definition

A system that dynamically selects which model to use for a given request based on criteria like cost, latency, capability, or query complexity. Routers enable cost optimization (use cheaper models for simple queries) and capability matching (use specialized models for specific tasks) without changing application code.
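As a concrete illustration, here is a minimal sketch of a rule-based router. The model names, keywords, and thresholds are illustrative assumptions, not a real API; production routers would use richer signals (token counts, classifiers, learned predictors).

```python
# Minimal rule-based model router (illustrative sketch).
# Model names ("cheap-model", "code-model", "frontier-model") and the
# routing heuristics below are assumptions for demonstration only.

def route(query: str) -> str:
    """Pick a model name based on simple query characteristics."""
    words = query.split()
    # Short factual questions go to a cheap model.
    if len(words) < 10 and query.rstrip().endswith("?"):
        return "cheap-model"
    # Code-related queries go to a code-specialized model.
    if any(kw in query.lower() for kw in ("def ", "class ", "bug", "code")):
        return "code-model"
    # Everything else falls through to the most capable (and costly) model.
    return "frontier-model"
```

The application calls `route(query)` and sends the request to whichever backend the returned name maps to, so routing logic stays out of application code.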

What this is NOT

  • Not load balancing (routing chooses models; load balancing distributes load)
  • Not a gateway (routing is one function a gateway might perform)
  • Not the models themselves

Alternative Interpretations

Different communities use this term differently:

llm-practitioners

Logic that decides whether to route a request to GPT-4, GPT-3.5, Claude, a local model, or a specialized model based on the query characteristics. Can be rule-based or ML-based.

Sources: Martian router, RouteLLM paper, LLM routing patterns

Examples

  • Route 'What is 2+2?' to GPT-3.5, 'Write a novel' to GPT-4
  • Martian router using ML to predict optimal model
  • Cascade: try Llama-7B first; if its confidence is below 0.8, fall back to GPT-4
  • Route coding questions to Claude, math to GPT-4
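The cascade pattern above can be sketched as follows. The model calls are stubbed as plain functions returning an (answer, confidence) pair; the default threshold of 0.8 mirrors the example, and real deployments would wrap actual inference APIs.

```python
# Sketch of a cascade router: try a small model first and escalate to a
# larger one when its self-reported confidence is low. Both "models"
# here are stand-in callables, not real inference clients.

from typing import Callable, Tuple

Model = Callable[[str], Tuple[str, float]]  # returns (answer, confidence)

def cascade(query: str, small: Model, large: Model,
            threshold: float = 0.8) -> str:
    answer, confidence = small(query)
    if confidence >= threshold:
        return answer          # small model is confident enough; stop here
    answer, _ = large(query)   # escalate to the more capable model
    return answer
```

The cost saving comes from the common case: most queries never reach the large model, which is only invoked on low-confidence answers.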

Counterexamples

Things that might seem like Model Router but are not:

  • Load balancing across identical model replicas
  • Always using the same model
  • Manual model selection by the user

Relations

Implementations

Tools and frameworks that implement this concept: