A single power user generating 1M tokens per month can compress your margin from 97% to 45%—even on GPT-4 Turbo. This guide shows how to calculate whether your pricing model can handle that variance.
Quick Comparison Table
| Model | Input Cost | Output Cost | Context Window | Best For |
|---|---|---|---|---|
| GPT-4 Turbo | $10/1M tokens | $30/1M tokens | 128K | General purpose, reasoning |
| GPT-4o | $2.50/1M tokens | $10/1M tokens | 128K | Cost-optimized GPT-4 |
| Claude 3.5 Sonnet | $3/1M tokens | $15/1M tokens | 200K | Long documents, code |
| Gemini 1.5 Pro | $1.25/1M tokens | $5/1M tokens | 1M | Cost-sensitive, large context |
| Claude 3 Haiku | $0.25/1M tokens | $1.25/1M tokens | 200K | High-volume, simple tasks |
Prices as of January 2025. A token is roughly ¾ of a word.
What is Cost-to-Serve?
Cost-to-serve is the total infrastructure cost required to deliver your product to one customer, including API costs, compute, storage, and overhead.
Formula:
Cost-to-Serve = (AI API Costs + Infrastructure + Overhead) / Active Users
For AI-powered SaaS products, AI API costs typically represent 60-80% of total cost-to-serve.
Real-World Example: AI Writing Assistant
Let's calculate the economics for a hypothetical AI writing assistant:
Usage Assumptions
- Average user: 50,000 tokens/month (input + output combined)
- Token split: 70% input, 30% output
- Price: $29/month subscription
Cost Breakdown by Model
| Model | Input Cost | Output Cost | Total AI Cost | Margin | Margin % |
|---|---|---|---|---|---|
| GPT-4 Turbo | $0.35 | $0.45 | $0.80 | $28.20 | 97.2% |
| GPT-4o | $0.09 | $0.15 | $0.24 | $28.76 | 99.2% |
| Claude 3.5 Sonnet | $0.11 | $0.23 | $0.34 | $28.66 | 98.8% |
| Claude 3 Haiku | $0.009 | $0.019 | $0.03 | $28.97 | 99.9% |
At average usage, every model yields healthy margins. The problem emerges with variance.
High-Usage Customers: Where Margins Compress
Not all users are average. Consider a power user generating 1,000,000 tokens/month (20x average) while paying the same $29/month:
GPT-4 Turbo Costs for Power User:
| Component | Calculation | Cost |
|---|---|---|
| Input | 700,000 × $10/1M | $7.00 |
| Output | 300,000 × $30/1M | $9.00 |
| Total | $16.00 |
Margin: $29 - $16 = $13 (44.8%)
Still profitable, but margin compressed from 97% to 45%. Bear Lumen shows this variance by customer, so you see it before margins compress across your base.
For detailed analysis of usage variance affecting GitHub Copilot, Cursor, and ChatGPT Pro, see: Usage Variance in AI Products.
With Usage-Based Tiers:
If you implement overage pricing:
- First 50k tokens: Included in $29 base
- Next 950k tokens: $0.015/1k tokens extra
Power user revenue: $29 + ($0.015 × 950) = $43.25
Margin: $43.25 - $16 = $27.25 (63%)
Better alignment between value delivered and cost incurred.
GitHub Copilot: A Real-World Example
GitHub Copilot operates at negative margins on most users:
- Price: $10/month (or $19 for business)
- Average cost: $20-$30/month per user
- Result: Negative margins on average users
Why? Unlimited usage model combined with token-intensive code generation. Users generate millions of tokens monthly, and pricing hasn't adjusted since launch.
Pattern: Usage-based pricing aligns revenue with cost-to-serve across customer segments.
When to Switch Models
| Use Case | Recommended Model | Why |
|---|---|---|
| Quality-critical, premium customers | GPT-4 Turbo | Best reasoning capability |
| High-volume, GPT-4 quality needed | GPT-4o | 75% cost reduction vs Turbo |
| Large documents, code analysis | Claude Sonnet | 200K context window |
| Simple tasks, classification, routing | Haiku or Gemini | Lowest cost per token |
The right model depends on the task. Many teams run multiple models, routing requests based on complexity.
Tracking Cost-to-Serve
Effective cost tracking requires three components:
- Instrument API calls — Record tokens consumed per request with user/customer attribution
- Aggregate by customer — Sum costs monthly, calculate cost per active user
- Compare to revenue — Identify which customers are margin-positive and which are compressing margins
Most teams discover 10-20% of customers generate 60-80% of API costs. Without per-customer visibility, this variance remains invisible until aggregate margins decline.
External Resources
Want to see cost-to-serve by customer? Join our early access program for margin visibility at the customer level.