Summary: Claude Opus costs 187x more than Gemini Flash per output token. A weather lookup doesn't need Claude Sonnet at $15/million output tokens when Haiku handles it at $5/million with equivalent quality. Intelligent model routing—matching query complexity to the cheapest capable model—can meaningfully reduce costs when properly implemented and measured.
The 187x Price Spread Most Teams Ignore
Gemini 2.0 Flash: $0.40 per million output tokens. Claude Opus 4: $75 per million. That's a 187x difference—and most AI products route every request through expensive models regardless of task complexity.
A common pattern: an AI startup launches with a single model powering everything. GPT-4o for customer support tickets. Claude Sonnet for intent classification. Frontier models processing "what's my account balance?"
The reasoning is understandable. One model means simpler architecture, fewer edge cases, and consistent behavior. But it also means paying $3-15 per million tokens for tasks that $0.10-1 models handle with equivalent quality.
This compounds the usage variance we've analyzed in Understanding Per-Customer Cost Distribution. High-usage customers already show compressed margins. Routing all requests through expensive models adds unnecessary cost per request.
Current Model Pricing
Here's what you're choosing between (November 2025):
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | High-volume, simple tasks |
| GPT-4o-mini | $0.15 | $0.60 | Cost-efficient classification |
| Claude Haiku 3.5 | $1.00 | $5.00 | Fast coding, structured output |
| GPT-4o | $2.50 | $10.00 | General multimodal tasks |
| Claude Sonnet 3.5 | $3.00 | $15.00 | Complex coding, agents |
| Claude Opus 4 | $15.00 | $75.00 | Frontier reasoning, architecture |
The spread is significant. Even within the Anthropic family, Haiku is 3x cheaper than Sonnet and 15x cheaper than Opus on output tokens. And the capability gap keeps narrowing: Anthropic reports the newer Claude Haiku 4.5 scoring 73.3% on SWE-bench Verified—performance that would have been state-of-the-art in early 2024—at one-third the cost of Sonnet.
Example: An AI assistant handling 1 million requests monthly (500 input tokens, 200 output tokens average):
| Strategy | Monthly Cost | Savings vs. All-Sonnet |
|---|---|---|
| All Claude Sonnet 3.5 | $4,500 | — |
| All Claude Haiku 3.5 | $1,500 | 67% |
| 70% Haiku / 30% Sonnet | $2,400 | 47% |
Your actual results depend on traffic composition and routing accuracy.
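The table above can be reproduced with a small cost model. This is a sketch: the prices and traffic numbers come from the example, while the function and variable names are arbitrary.

```python
# Pricing in USD per 1M tokens, from the example above.
PRICES = {
    "haiku":  {"input": 1.00, "output": 5.00},
    "sonnet": {"input": 3.00, "output": 15.00},
}

REQUESTS = 1_000_000       # monthly request volume
IN_TOKENS, OUT_TOKENS = 500, 200  # average tokens per request

def monthly_cost(mix: dict) -> float:
    """Blended monthly cost for a routing mix, e.g. {"haiku": 0.7, "sonnet": 0.3}."""
    total = 0.0
    for model, share in mix.items():
        p = PRICES[model]
        per_million = IN_TOKENS * p["input"] + OUT_TOKENS * p["output"]
        total += share * REQUESTS * per_million / 1_000_000
    return total

print(monthly_cost({"sonnet": 1.0}))                # 4500.0
print(monthly_cost({"haiku": 1.0}))                 # 1500.0
print(monthly_cost({"haiku": 0.7, "sonnet": 0.3}))  # ~2400.0
```

Swapping in your own traffic shape (tokens per request, request volume, candidate mix) turns this into a quick what-if calculator before you build any routing.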
Which Tasks Need Which Models
The key insight: model capability exists on a spectrum, and most tasks don't require frontier intelligence.
| Task Type | Cheap Model Works | Premium Model Needed |
|---|---|---|
| Classification & intent detection | ✓ Haiku and GPT-4o-mini achieve 95%+ accuracy | — |
| Data extraction (names, dates, amounts) | ✓ Pattern matching, not reasoning | — |
| Format transformation (JSON, text) | ✓ Doesn't need frontier capability | — |
| Simple Q&A and FAQ | ✓ Straightforward retrieval | — |
| Standard summarization | ✓ Comparable performance | — |
| Complex reasoning chains | — | ✓ Multi-step inference, synthesis |
| Nuanced writing (legal, marketing) | — | ✓ Tone, persuasion, style |
| Architectural decisions | — | ✓ System design, complex debugging |
| Ambiguous or novel situations | — | ✓ Genuine judgment required |
| Safety-critical outputs | — | ✓ Medical, legal, financial |
The goal isn't minimizing spend at all costs—it's matching capability to requirement. Underspending on complex tasks creates errors that cost more than the savings.
Three Routing Approaches
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Rule-based | Route by feature type, user tier, input length | Fast, predictable, easy to debug | Misses complex queries that look simple |
| Classifier-based | Small model classifies query complexity, then routes | Adapts to content, catches edge cases | Adds 50-100ms latency, classifier can be wrong |
| Embedding similarity | Compare query to known simple/complex examples | No per-request classification cost | Requires curated examples, may miss novel queries |
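As a sketch, the rule-based row above might look like the following. The feature names, thresholds, and model labels are illustrative assumptions, not from any real product:

```python
# Illustrative rule-based router: route by feature type, input length, and tier.
CHEAP_FEATURES = {"intent_classification", "data_extraction", "faq"}
PREMIUM_FEATURES = {"architecture_review", "legal_draft"}

def route(feature: str, input_tokens: int, user_tier: str = "free") -> str:
    """Return a model name for a request, defaulting to the cheap model."""
    if feature in PREMIUM_FEATURES:
        return "claude-sonnet"    # known-complex work always gets the premium model
    if feature in CHEAP_FEATURES and input_tokens < 2_000:
        return "claude-haiku"     # short, well-understood tasks stay cheap
    if user_tier == "enterprise":
        return "claude-sonnet"    # uncategorized enterprise traffic gets the safer default
    return "claude-haiku"

print(route("faq", 300))                          # claude-haiku
print(route("legal_draft", 300))                  # claude-sonnet
print(route("summarize", 5_000, "enterprise"))    # claude-sonnet
```

The weakness the table notes is visible here: a deceptively short query in a "cheap" feature still routes cheap, which is exactly the case a classifier-based router exists to catch.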
Classifier economics: At Haiku pricing, classifying 1 million requests costs roughly $350 (assuming a few hundred input tokens and a short label as output per classification). If correct routing saves $2,000+/month, the ROI is clear—but measure to confirm.
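A back-of-the-envelope version of that check, assuming roughly 300 input and 10 output tokens per classification at Haiku's $1/$5 per-million rates (the savings figure is the placeholder from the text, to be replaced with your own measurement):

```python
# Classifier ROI sketch: token counts per classification are assumptions.
def classifier_cost(requests: int, in_tok: int = 300, out_tok: int = 10) -> float:
    """Monthly cost of classifying every request at Haiku pricing ($1/$5 per 1M)."""
    return requests * (in_tok * 1.00 + out_tok * 5.00) / 1_000_000

cost = classifier_cost(1_000_000)   # 350.0 -> matches the ~$350 figure above
savings = 2_000.0                   # measured monthly savings from correct routing
print(f"net benefit: ${savings - cost:,.0f}/month")  # net benefit: $1,650/month
```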
When routing isn't worth it: If traffic is uniformly complex (all architecture reviews, all legal analysis), routing overhead won't pay off. At low volume (under 10,000 requests/month), engineering investment may exceed savings.
The Missing Piece: Measurement
Here's what most implementations miss: they don't measure whether routing is actually working.
You implement routing logic, deploy to production, and check your bill next month. Maybe it went down. Maybe it went up because your classifier is miscategorizing queries and triggering expensive fallbacks. Without granular tracking, you're guessing.
| Metric | What It Tells You |
|---|---|
| Cost per request by model | Whether routing decisions are saving money |
| Fallback rate | How often your router misjudges complexity |
| Quality scores by route | Whether cheap models degrade user experience |
| Cost per customer | Which customers show negative margin even with routing |
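A minimal sketch of how those metrics could be computed from per-request logs. The record fields and numbers are made up for illustration; in practice these would come from your request pipeline.

```python
from collections import defaultdict

# Hypothetical per-request log records (cost in USD).
requests = [
    {"customer": "acme", "model": "haiku",  "cost": 0.0015, "fallback": False},
    {"customer": "acme", "model": "sonnet", "cost": 0.0045, "fallback": True},
    {"customer": "beta", "model": "haiku",  "cost": 0.0015, "fallback": False},
]

cost_by_model = defaultdict(float)     # "cost per request by model"
cost_by_customer = defaultdict(float)  # "cost per customer"
fallbacks = 0
for r in requests:
    cost_by_model[r["model"]] += r["cost"]
    cost_by_customer[r["customer"]] += r["cost"]
    fallbacks += r["fallback"]

fallback_rate = fallbacks / len(requests)  # "how often the router misjudges"
print(dict(cost_by_model), dict(cost_by_customer), fallback_rate)
```

Quality scores by route would join onto the same records from whatever evaluation signal you have (thumbs up/down, regrade sampling), which is why per-request attribution matters: every metric in the table is a different aggregation of the same log.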
This is what Bear Lumen tracks automatically—cost per request by model, per-customer attribution, fallback rates, and margin impact. Without this visibility, routing is guesswork. With it, you can continuously optimize based on real data.
Key Takeaways
- The pricing spread is significant. Up to 187x between the cheapest and most expensive models. Using frontier models for simple tasks is avoidable cost.
- Many tasks don't need frontier models. Classification, extraction, and simple Q&A often work well on cheaper models—validate for your use case.
- Start with rule-based routing. Route by feature and input length before investing in classifiers.
- You can't optimize what you can't measure. Track cost per model and per customer before and after implementing routing.
Track your model costs and routing effectiveness with Bear Lumen. See exactly where your AI spend goes—and whether your optimization strategies actually work.
Join the early access program →
Related: Usage Variance in AI Products | Real Cost of AI Products in 2025 | AI API Costs 2025