The True Cost of Running AI APIs: Complete 2025 Guide

A single power user generating 1M tokens per month can compress your margin from 97% to 45%—even on GPT-4 Turbo. This guide shows how to calculate whether your pricing model can handle that variance.

Quick Comparison Table

Model	Input Cost	Output Cost	Context Window	Best For
GPT-4 Turbo	$10/1M tokens	$30/1M tokens	128K	General purpose, reasoning
GPT-4o	$2.50/1M tokens	$10/1M tokens	128K	Cost-optimized GPT-4
Claude 3.5 Sonnet	$3/1M tokens	$15/1M tokens	200K	Long documents, code
Gemini 1.5 Pro	$1.25/1M tokens	$5/1M tokens	1M	Cost-sensitive, large context
Claude 3 Haiku	$0.25/1M tokens	$1.25/1M tokens	200K	High-volume, simple tasks

Prices as of January 2025. A token is roughly ¾ of a word.

What is Cost-to-Serve?

Cost-to-serve is the total infrastructure cost required to deliver your product to one customer, including API costs, compute, storage, and overhead.

Formula:

Cost-to-Serve = (AI API Costs + Infrastructure + Overhead) / Active Users

For AI-powered SaaS products, AI API costs typically represent 60-80% of total cost-to-serve.

Real-World Example: AI Writing Assistant

Let's calculate the economics for a hypothetical AI writing assistant:

Usage Assumptions

Average user: 50,000 tokens/month (input + output combined)
Token split: 70% input, 30% output
Price: $29/month subscription

Cost Breakdown by Model

Model	Input Cost	Output Cost	Total AI Cost	Margin	Margin %
GPT-4 Turbo	$0.35	$0.45	$0.80	$28.20	97.2%
GPT-4o	$0.09	$0.15	$0.24	$28.76	99.2%
Claude 3.5 Sonnet	$0.11	$0.23	$0.34	$28.66	98.8%
Claude 3 Haiku	$0.009	$0.019	$0.03	$28.97	99.9%

At average usage, every model yields healthy margins. The problem emerges with variance.

High-Usage Customers: Where Margins Compress

Not all users are average. Consider a power user generating 1,000,000 tokens/month (20x average) while paying the same $29/month:

GPT-4 Turbo Costs for Power User:

Component	Calculation	Cost
Input	700,000 × $10/1M	$7.00
Output	300,000 × $30/1M	$9.00
Total		$16.00

Margin: $29 - $16 = $13 (44.8%)

Still profitable, but margin compressed from 97% to 45%. Bear Lumen shows this variance by customer, so you see it before margins compress across your base.

For detailed analysis of usage variance affecting GitHub Copilot, Cursor, and ChatGPT Pro, see: Usage Variance in AI Products.

With Usage-Based Tiers:

If you implement overage pricing:

First 50k tokens: Included in $29 base
Next 950k tokens: $0.015/1k tokens extra

Power user revenue: $29 + ($0.015 × 950) = $43.25

Margin: $43.25 - $16 = $27.25 (63%)

Better alignment between value delivered and cost incurred.

GitHub Copilot: A Real-World Example

GitHub Copilot operates at negative margins on most users:

Price: $10/month (or $19 for business)
Average cost: $20-$30/month per user
Result: Negative margins on average users

Why? Unlimited usage model combined with token-intensive code generation. Users generate millions of tokens monthly, and pricing hasn't adjusted since launch.

Pattern: Usage-based pricing aligns revenue with cost-to-serve across customer segments.

Related: From Flat-Rate to Usage-Based Pricing Migration

When to Switch Models

Use Case	Recommended Model	Why
Quality-critical, premium customers	GPT-4 Turbo	Best reasoning capability
High-volume, GPT-4 quality needed	GPT-4o	75% cost reduction vs Turbo
Large documents, code analysis	Claude Sonnet	200K context window
Simple tasks, classification, routing	Haiku or Gemini	Lowest cost per token

The right model depends on the task. Many teams run multiple models, routing requests based on complexity.

Tracking Cost-to-Serve

Effective cost tracking requires three components:

Instrument API calls — Record tokens consumed per request with user/customer attribution
Aggregate by customer — Sum costs monthly, calculate cost per active user
Compare to revenue — Identify which customers are margin-positive and which are compressing margins

Most teams discover 10-20% of customers generate 60-80% of API costs. Without per-customer visibility, this variance remains invisible until aggregate margins decline.

External Resources

Want to see cost-to-serve by customer? Join our early access program for margin visibility at the customer level.