Token pricing varies dramatically across providers and models. Choosing the right model for your workload can mean the difference between spending $10/month and $1,000/month. Here's a practical comparison of current pricing and how to pick the most cost-effective option.

Current Pricing Overview (Per Million Tokens)

Prices are listed as input / output per million tokens:

OpenAI

  • GPT-4o: $2.50 input / $10.00 output
  • GPT-4o mini: $0.15 input / $0.60 output
  • GPT-4.1: $2.00 input / $8.00 output
  • GPT-4.1 mini: $0.40 input / $1.60 output
  • GPT-4.1 nano: $0.10 input / $0.40 output
  • o3: $2.00 input / $8.00 output
  • o4-mini: $1.10 input / $4.40 output

Anthropic

  • Claude 3.5 Sonnet: $3.00 input / $15.00 output
  • Claude 3.5 Haiku: $0.80 input / $4.00 output
  • Claude 3 Opus: $15.00 input / $75.00 output
  • Claude 4 Sonnet: $3.00 input / $15.00 output

Google

  • Gemini 2.5 Pro: $1.25 input / $10.00 output (for prompts up to 200K tokens)
  • Gemini 2.5 Flash: $0.15 input / $0.60 output
  • Gemini 2.0 Flash: $0.10 input / $0.40 output
  • Gemini 1.5 Pro: $1.25 input / $5.00 output (for prompts up to 128K tokens)
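
The arithmetic behind all of these comparisons is the same: token count divided by one million, times the per-million rate, summed for input and output. A minimal sketch of a cost calculator, with a few rates hardcoded from the tables above (always refresh these from the providers' pricing pages before relying on them):

```python
# Per-million-token prices as (input_usd, output_usd), taken from the tables above.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-haiku": (0.80, 4.00),
    "gemini-2.0-flash": (0.10, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the listed per-million rates."""
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Example: one 500-token-in / 300-token-out chat turn on GPT-4o mini
print(round(request_cost("gpt-4o-mini", 500, 300), 6))  # 0.000255
```

The same function works for any model: add a row to PRICES and you can compare providers on your actual traffic shape rather than on the abstract per-million numbers.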

Real-World Cost Scenarios

Abstract per-million pricing is hard to reason about. Here's what common workloads actually cost:

Scenario 1: Customer Support Chatbot

Average conversation: 500 input tokens, 300 output tokens. 10,000 conversations/day.

  • GPT-4o: $0.00125 + $0.003 = $0.00425/conversation → $42.50/day
  • GPT-4o mini: $0.000075 + $0.00018 = $0.000255/conversation → $2.55/day
  • Claude 3.5 Haiku: $0.0004 + $0.0012 = $0.0016/conversation → $16.00/day
  • Gemini 2.0 Flash: $0.00005 + $0.00012 = $0.00017/conversation → $1.70/day

Scenario 2: Document Summarization

Average document: 10,000 input tokens, 500 output tokens. 1,000 documents/day.

  • GPT-4o: $0.025 + $0.005 = $0.03/doc → $30/day
  • Claude 3.5 Sonnet: $0.03 + $0.0075 = $0.0375/doc → $37.50/day
  • Gemini 2.5 Flash: $0.0015 + $0.0003 = $0.0018/doc → $1.80/day
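
Both scenarios follow the same formula: per-request cost times daily volume. A quick sketch that reproduces the daily totals, assuming the per-million rates from the pricing tables above:

```python
def daily_cost(in_rate: float, out_rate: float,
               in_tokens: int, out_tokens: int,
               requests_per_day: int) -> float:
    """Daily spend in dollars: per-request cost times request volume.
    Rates are USD per million tokens, as listed in the pricing tables."""
    per_request = in_tokens / 1e6 * in_rate + out_tokens / 1e6 * out_rate
    return per_request * requests_per_day

# Scenario 1: chatbot on Gemini 2.0 Flash, 500 in / 300 out, 10,000 conversations/day
print(round(daily_cost(0.10, 0.40, 500, 300, 10_000), 2))   # 1.7

# Scenario 2: summarization on GPT-4o, 10,000 in / 500 out, 1,000 docs/day
print(round(daily_cost(2.50, 10.00, 10_000, 500, 1_000), 2))  # 30.0
```

Plugging in your own token counts and volume is usually more informative than any published benchmark of "typical" costs, since prompt and completion lengths vary widely by application.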

When to Use Which Model

Use Mini/Flash Models When:

  • Tasks are straightforward (classification, extraction, simple Q&A)
  • Volume is high and cost sensitivity is critical
  • Latency matters more than maximum quality
  • You're building prototypes or development environments

Use Flagship Models When:

  • Tasks require complex reasoning or nuanced understanding
  • Output quality directly impacts revenue or user experience
  • You're processing code, legal documents, or technical content
  • The cost per request is justified by the value of each response

Cost Optimization Tips

  • Route by complexity: Use a cheap model to classify requests, then route complex ones to a flagship model. This can cut costs by 60–80%.
  • Cache repeated prompts: Both OpenAI and Anthropic offer prompt caching that reduces input costs by 50–90% for repeated prefixes.
  • Batch when possible: OpenAI's Batch API offers 50% off for non-time-sensitive requests.
  • Optimize prompts: Reducing prompt length by 40% saves 40% on input costs for every single request.
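
The first tip, routing by complexity, is straightforward to prototype. The sketch below uses hypothetical model names and a stand-in keyword heuristic as the classifier; in practice the classification step would itself be a cheap-model call or a trained classifier, and the markers and threshold here are illustrative assumptions, not recommendations:

```python
CHEAP_MODEL = "gpt-4o-mini"   # hypothetical routing targets; substitute
FLAGSHIP_MODEL = "gpt-4o"     # whichever models you actually deploy

def classify_complexity(prompt: str) -> str:
    """Stand-in heuristic: long prompts, or prompts mentioning code/legal work,
    are treated as complex. A real router would use a cheap LLM call here."""
    hard_markers = ("refactor", "prove", "contract", "stack trace")
    if len(prompt) > 2000 or any(m in prompt.lower() for m in hard_markers):
        return "complex"
    return "simple"

def route(prompt: str) -> str:
    """Return the model a request should be sent to."""
    if classify_complexity(prompt) == "complex":
        return FLAGSHIP_MODEL
    return CHEAP_MODEL

print(route("What are your opening hours?"))                      # gpt-4o-mini
print(route("Refactor this module to remove the global state."))  # gpt-4o
```

If most traffic is simple, the blended cost lands close to the cheap model's rate while hard requests still get flagship quality, which is where the large savings from routing come from.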

Pricing changes frequently. Always check the provider's official pricing page before making architecture decisions. The numbers above reflect pricing as of early 2025.