Insights · 6 min read

Multi-Model Routing: Matching Query Complexity to the Right Model

How intelligent model routing reduces AI API costs by matching query complexity to the right model—from classification tasks on Haiku to complex reasoning on Opus.


Blaise Albuquerque

Founder, Bear Lumen

#cost-optimization #multi-model-routing #ai-infrastructure #unit-economics #llm-costs

Summary: Claude Opus costs 187x more than Gemini Flash per output token. A weather lookup doesn't need Claude Sonnet at $15/million output tokens when Haiku handles it at $5/million with equivalent quality. Intelligent model routing—matching query complexity to the cheapest capable model—can meaningfully reduce costs when properly implemented and measured.


The 187x Price Spread Most Teams Ignore

Gemini 2.0 Flash: $0.40 per million output tokens. Claude Opus 4: $75 per million. That's a 187x difference—and most AI products route every request through expensive models regardless of task complexity.

A common pattern: an AI startup launches with a single model powering everything. GPT-4o for customer support tickets. Claude Sonnet for intent classification. Frontier models processing "what's my account balance?"

The reasoning is understandable. One model means simpler architecture, fewer edge cases, and consistent behavior. But it also means paying $3-15 per million tokens for tasks that $0.10-1 models handle with equivalent quality.

This compounds the usage variance we've analyzed in Understanding Per-Customer Cost Distribution. High-usage customers already show compressed margins. Routing all requests through expensive models adds unnecessary cost per request.


Current Model Pricing

Here's what you're choosing between (November 2025):

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | High-volume, simple tasks |
| GPT-4o-mini | $0.15 | $0.60 | Cost-efficient classification |
| Claude Haiku 3.5 | $1.00 | $5.00 | Fast coding, structured output |
| GPT-4o | $2.50 | $10.00 | General multimodal tasks |
| Claude Sonnet 3.5 | $3.00 | $15.00 | Complex coding, agents |
| Claude Opus 4 | $15.00 | $75.00 | Frontier reasoning, architecture |

The spread is significant. Even within the Anthropic family, Haiku is 3x cheaper than Sonnet and 15x cheaper than Opus. Claude Haiku 3.5 achieves 73.3% on SWE-bench Verified—performance that would have been state-of-the-art in early 2024—at one-third the cost of Sonnet.

Example: An AI assistant handling 1 million requests monthly (500 input tokens, 200 output tokens average):

| Strategy | Monthly Cost | Savings vs. All-Sonnet |
|---|---|---|
| All Claude Sonnet 3.5 | $4,500 | baseline |
| All Claude Haiku 3.5 | $1,500 | 67% |
| 70% Haiku / 30% Sonnet | $2,400 | 47% |

Your actual results depend on traffic composition and routing accuracy.
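The blended figures above follow from simple arithmetic. A quick sketch to reproduce them, using the prices and per-request token counts from the tables above:

```python
# Monthly cost model for the example above:
# 1M requests/month, 500 input + 200 output tokens per request.
PRICES = {  # USD per 1M tokens (input, output), November 2025 rates cited above
    "haiku": (1.00, 5.00),
    "sonnet": (3.00, 15.00),
}

REQUESTS = 1_000_000
IN_TOK, OUT_TOK = 500, 200

def monthly_cost(mix):
    """mix maps model name to its share of traffic (shares sum to 1.0)."""
    total = 0.0
    for model, share in mix.items():
        inp, out = PRICES[model]
        # Tokens for this slice of traffic, priced per million tokens.
        total += share * REQUESTS * (IN_TOK * inp + OUT_TOK * out) / 1_000_000
    return total

print(monthly_cost({"sonnet": 1.0}))                # 4500.0
print(monthly_cost({"haiku": 1.0}))                 # 1500.0
print(monthly_cost({"haiku": 0.7, "sonnet": 0.3}))  # 2400.0
```

Swapping in your own traffic mix and token averages gives a first-order estimate before you build anything.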


Which Tasks Need Which Models

The key insight: model capability exists on a spectrum, and most tasks don't require frontier intelligence.

| Task Type | Cheap Model Works | Premium Model Needed |
|---|---|---|
| Classification & intent detection | ✓ Haiku, GPT-4o-mini handle 95%+ accuracy | |
| Data extraction (names, dates, amounts) | ✓ Pattern matching, not reasoning | |
| Format transformation (JSON, text) | ✓ Doesn't need frontier capability | |
| Simple Q&A and FAQ | ✓ Straightforward retrieval | |
| Standard summarization | ✓ Comparable performance | |
| Complex reasoning chains | | ✓ Multi-step inference, synthesis |
| Nuanced writing (legal, marketing) | | ✓ Tone, persuasion, style |
| Architectural decisions | | ✓ System design, complex debugging |
| Ambiguous or novel situations | | ✓ Genuine judgment required |
| Safety-critical outputs | | ✓ Medical, legal, financial |

The goal isn't minimizing spend at all costs—it's matching capability to requirement. Underspending on complex tasks creates errors that cost more than the savings.


Three Routing Approaches

| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Rule-based | Route by feature type, user tier, input length | Fast, predictable, easy to debug | Misses complex queries that look simple |
| Classifier-based | Small model classifies query complexity, then routes | Adapts to content, catches edge cases | Adds 50-100ms latency, classifier can be wrong |
| Embedding similarity | Compare query to known simple/complex examples | No per-request classification cost | Requires curated examples, may miss novel queries |

Classifier economics: At Haiku pricing, classifying 1 million requests costs ~$350. If correct routing saves $2,000+/month, the ROI is clear—but measure to confirm.

When routing isn't worth it: If traffic is uniformly complex (all architecture reviews, all legal analysis), routing overhead won't pay off. At low volume (under 10,000 requests/month), engineering investment may exceed savings.
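For teams starting with the rule-based approach, the router can be a handful of conditionals. A minimal sketch; the feature names, length threshold, and model labels here are illustrative assumptions, not a reference implementation:

```python
# Illustrative rule-based router: route by feature type, input length,
# and user tier. All names and thresholds below are hypothetical.
CHEAP, MID, PREMIUM = "claude-haiku", "claude-sonnet", "claude-opus"

SIMPLE_FEATURES = {"intent_classification", "data_extraction", "faq"}
COMPLEX_FEATURES = {"architecture_review", "legal_analysis"}

def route(feature: str, prompt: str, user_tier: str = "standard") -> str:
    if feature in COMPLEX_FEATURES:
        return PREMIUM
    if feature in SIMPLE_FEATURES and len(prompt) < 2_000:
        return CHEAP
    # Long prompts and unknown features fall through to the mid-tier
    # default; enterprise users are biased upward as a safety margin.
    if user_tier == "enterprise":
        return PREMIUM
    return MID

print(route("faq", "What are your hours?"))        # claude-haiku
print(route("architecture_review", "Design..."))   # claude-opus
```

The point of the fall-through default is the "misses complex queries that look simple" weakness from the table: when in doubt, the router errs toward capability rather than cost.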


The Missing Piece: Measurement

Here's what most implementations miss: they don't measure whether routing is actually working.

You implement routing logic, deploy to production, and check your bill next month. Maybe it went down. Maybe it went up because your classifier is miscategorizing queries and triggering expensive fallbacks. Without granular tracking, you're guessing.

| Metric | What It Tells You |
|---|---|
| Cost per request by model | Whether routing decisions are saving money |
| Fallback rate | How often your router misjudges complexity |
| Quality scores by route | Whether cheap models degrade user experience |
| Cost per customer | Which customers show negative margin even with routing |
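Even before adopting a dedicated tool, the first two metrics can be captured with a thin wrapper around each API call. A minimal sketch, where the price table, model labels, and field names are assumptions:

```python
# Minimal per-request cost attribution: log model, tokens, and cost per
# customer so routing decisions can be audited. Prices are illustrative.
from collections import defaultdict

PRICE_PER_MTOK = {  # (input, output) USD per 1M tokens
    "claude-haiku": (1.00, 5.00),
    "claude-sonnet": (3.00, 15.00),
}

totals = defaultdict(lambda: {"requests": 0, "cost": 0.0, "fallbacks": 0})

def record(model, in_tokens, out_tokens, customer_id, fell_back=False):
    """Call after each completion; returns the request's cost in USD."""
    inp, out = PRICE_PER_MTOK[model]
    cost = (in_tokens * inp + out_tokens * out) / 1_000_000
    bucket = totals[(customer_id, model)]
    bucket["requests"] += 1
    bucket["cost"] += cost
    bucket["fallbacks"] += int(fell_back)  # router escalated to a pricier model
    return cost

record("claude-haiku", 500, 200, "acme")
record("claude-sonnet", 500, 200, "acme", fell_back=True)
for (customer, model), b in totals.items():
    print(customer, model, round(b["cost"], 4),
          "fallback rate:", b["fallbacks"] / b["requests"])
```

In production this would write to your metrics pipeline rather than an in-memory dict, but the shape of the data is the same: (customer, model) keys with request counts, cost, and fallback counts.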

This is what Bear Lumen tracks automatically—cost per request by model, per-customer attribution, fallback rates, and margin impact. Without this visibility, routing is guesswork. With it, you can continuously optimize based on real data.


Key Takeaways

  • The pricing spread is significant. Up to 187x between cheapest and most expensive models. Using frontier models for simple tasks is avoidable cost.

  • Many tasks don't need frontier models. Classification, extraction, simple Q&A often work well on cheaper models—validate for your use case.

  • Start with rule-based routing. Route by feature and input length before investing in classifiers.

  • You can't optimize what you can't measure. Track cost per model and per customer before and after implementing routing.


Track your model costs and routing effectiveness with Bear Lumen. See exactly where your AI spend goes—and whether your optimization strategies actually work.

Join the early access program →


Related: Usage Variance in AI Products | Real Cost of AI Products in 2025 | AI API Costs 2025
