Claude API pricing is quoted per million tokens, which sounds cheap until you realize how fast tokens add up in real workflows. The headline rates are easy to find; what is hard is translating them into "what will this actually cost me per month?" This guide does that translation with real use-case math, then shows the two levers that cut the bill the most.
Current Claude API rates (2026)
- Claude Opus 4.8 (flagship): $5.00 input / $25.00 output per million tokens.
- Claude Sonnet 4.6 (balanced): $3.00 input / $15.00 output per million tokens.
- Claude Haiku 4.5 (fast/cheap): $1.00 input / $5.00 output per million tokens.
- You pay separately for input tokens (your prompts plus context) and output tokens (Claude's responses).
Note how far flagship pricing has fallen: Opus 4.8 costs $5/$25 per million tokens, versus $30/$150 in the Opus 4.7 era. That single change reshaped the math for anyone running Opus at volume.
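To turn per-million rates into per-request dollars, divide each token count by one million and multiply it by the matching rate. The following is a minimal Python sketch that assumes the rates listed above. The dictionary keys such as `"sonnet-4.6"` are illustrative labels, not official API model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens, billed separately for input and output."""
    input_per_mtok: float
    output_per_mtok: float


# Rates from the list above; the keys are illustrative labels, not API model IDs.
RATES = {
    "opus-4.8": Rates(input_per_mtok=5.00, output_per_mtok=25.00),
    "sonnet-4.6": Rates(input_per_mtok=3.00, output_per_mtok=15.00),
    "haiku-4.5": Rates(input_per_mtok=1.00, output_per_mtok=5.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request (input and output priced separately)."""
    r = RATES[model]
    return (input_tokens * r.input_per_mtok + output_tokens * r.output_per_mtok) / 1_000_000


# 2,000 input + 500 output tokens on Sonnet: $0.006 + $0.0075
print(f"${request_cost('sonnet-4.6', 2_000, 500):.4f}")  # prints $0.0135
```

For the same request, `"haiku-4.5"` comes out to $0.0045 and `"opus-4.8"` to $0.0225.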
Real monthly cost by use case
These are rough but realistic monthly estimates assuming typical token volumes. Your actual numbers depend on prompt size, output length, and how much context you re-send.
- Solo developer using Claude for code review and debugging (light Sonnet usage): roughly $15–$40/month.
- Power user running an agentic coding loop most days (heavy Sonnet, occasional Opus): roughly $80–$250/month.
- Small team running a customer-facing feature on the API (mixed Haiku + Sonnet at scale): roughly $300–$1,200/month.
- High-volume production app (Haiku-first with caching): no single monthly figure fits here. Cost per request matters far more than the headline rate, and this is where caching and batching pay off.
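Ranges like these come from multiplying request volume by per-request token counts. The sketch below runs that arithmetic for the solo-developer case. The request count, working days, and token sizes are hypothetical assumptions chosen for illustration, not measured usage.

```python
# Sonnet rates from the list above (USD per million tokens).
SONNET_INPUT_PER_MTOK = 3.00
SONNET_OUTPUT_PER_MTOK = 15.00


def monthly_cost(requests_per_day, days_per_month, input_tokens, output_tokens,
                 input_rate, output_rate):
    """Estimate a month's bill from per-request token counts and per-million-token rates."""
    requests = requests_per_day * days_per_month
    input_cost = requests * input_tokens * input_rate / 1_000_000
    output_cost = requests * output_tokens * output_rate / 1_000_000
    return input_cost + output_cost


# Hypothetical solo developer: 30 Sonnet requests per working day, 22 working days,
# ~6,000 input tokens (code plus question) and ~800 output tokens per request.
cost = monthly_cost(30, 22, 6_000, 800, SONNET_INPUT_PER_MTOK, SONNET_OUTPUT_PER_MTOK)
print(f"${cost:.2f}")  # prints $19.80
```

If each request carries 12,000 input tokens instead, the estimate rises to $31.68. Re-sent context often dominates the bill.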
The two levers that cut your bill most
1. Prompt caching (up to 90% off cached input)
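If you re-send the same system prompt, instructions, or document context across many requests, prompt caching cuts the cost of that cached input by up to 90%. The discount applies to cache reads. Writing a prefix into the cache the first time costs somewhat more than normal input, so caching pays off when the same prefix is reused. For RAG apps, coding agents, and anything with a large fixed prompt, this is the single biggest lever, often bigger than switching models.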
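The sketch below shows how the savings work out across a run of requests. It assumes a cache write costs 1.25x the base input rate and a cache hit costs 0.1x, which matches the "up to 90% off" figure; verify both multipliers against current pricing. The workload is hypothetical, and the sketch also assumes the cache stays warm for the whole run.

```python
# Assumed multipliers (verify against current docs):
CACHE_WRITE_MULTIPLIER = 1.25  # first write of the prefix costs more than normal input
CACHE_READ_MULTIPLIER = 0.10   # each cache hit costs 10% of normal input ("up to 90% off")


def input_cost_uncached(requests, prefix_tokens, fresh_tokens, rate_per_mtok):
    """Every request re-sends the full prefix at the normal input rate."""
    return requests * (prefix_tokens + fresh_tokens) * rate_per_mtok / 1_000_000


def input_cost_cached(requests, prefix_tokens, fresh_tokens, rate_per_mtok):
    """First request writes the prefix to cache; later requests read it.
    Assumes the cache never expires between requests."""
    write = prefix_tokens * rate_per_mtok * CACHE_WRITE_MULTIPLIER
    reads = (requests - 1) * prefix_tokens * rate_per_mtok * CACHE_READ_MULTIPLIER
    fresh = requests * fresh_tokens * rate_per_mtok  # non-cached part of each prompt
    return (write + reads + fresh) / 1_000_000


# Hypothetical workload: 1,000 Sonnet requests ($3/M input) sharing a 10,000-token
# system prompt, each adding 500 new input tokens. Output cost is unaffected by caching.
before = input_cost_uncached(1_000, 10_000, 500, 3.00)
after = input_cost_cached(1_000, 10_000, 500, 3.00)
print(f"uncached ${before:.2f}, cached ${after:.2f}, saved {1 - after / before:.0%}")
# prints uncached $31.50, cached $4.53, saved 86%
```

Total input savings come in a bit under 90%. Only the cached prefix gets the discount, and the first write carries a premium.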
2. Batch processing (50% off)
For workloads that do not need a real-time response, such as overnight processing, bulk classification, or data enrichment, the Batch API is 50% cheaper across all models. If latency does not matter, batching halves the cost with no loss in quality. The same models process the requests; results simply come back asynchronously instead of immediately.
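Here is the batch discount applied to a bulk job. This is a minimal sketch that assumes the 50% discount applies to both input and output tokens. The job size and per-record token counts are hypothetical.

```python
BATCH_DISCOUNT = 0.50  # assumed to apply to both input and output tokens


def cost(input_tokens, output_tokens, input_rate, output_rate, batch=False):
    """USD cost at per-million-token rates, optionally with the batch discount."""
    base = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return base * (1 - BATCH_DISCOUNT) if batch else base


# Hypothetical nightly job: classify 50,000 records on Haiku ($1/$5 per million),
# ~400 input and ~20 output tokens per record.
n = 50_000
realtime = cost(n * 400, n * 20, 1.00, 5.00)
batched = cost(n * 400, n * 20, 1.00, 5.00, batch=True)
print(f"real-time ${realtime:.2f}, batch ${batched:.2f}")  # prints real-time $25.00, batch $12.50
```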
Model choice is a cost decision
The most common waste is running Opus on tasks Haiku or Sonnet would handle fine. Opus output costs 5x Haiku output ($25 vs $5 per million tokens). Reserve the flagship for genuinely hard reasoning and route everything else to a cheaper model. A simple "default to Sonnet, escalate to Opus only when needed" rule cuts most bills noticeably.
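Below is a sketch of the "default to Sonnet, escalate to Opus" rule, with its effect on output cost. The task categories, the `needs_deep_reasoning` flag, the task mix, and the 1,000 output tokens per task are all hypothetical. For brevity it compares output cost only.

```python
# Output rates per million tokens, from the list above.
OUTPUT_RATE = {"haiku": 5.00, "sonnet": 15.00, "opus": 25.00}


def choose_model(task_kind: str, needs_deep_reasoning: bool) -> str:
    """Hypothetical routing rule: escalate to Opus only for hard reasoning,
    drop to Haiku for simple high-volume tasks, otherwise default to Sonnet."""
    if needs_deep_reasoning:
        return "opus"
    if task_kind in {"classify", "extract", "summarize_short"}:
        return "haiku"
    return "sonnet"


# Hypothetical mix of 1,000 tasks: (task kind, needs deep reasoning?)
tasks = [("classify", False)] * 600 + [("code_review", False)] * 350 + [("architecture", True)] * 50
OUTPUT_TOKENS = 1_000  # assumed output size per task

all_opus = len(tasks) * OUTPUT_TOKENS * OUTPUT_RATE["opus"] / 1_000_000
routed = sum(OUTPUT_TOKENS * OUTPUT_RATE[choose_model(kind, hard)] for kind, hard in tasks) / 1_000_000
print(f"all-Opus ${all_opus:.2f}, routed ${routed:.2f}")  # prints all-Opus $25.00, routed $9.50
```

In practice, deciding `needs_deep_reasoning` is the hard part. A fixed task-type table like this one is only a starting point.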
FAQ
How much does the Claude API cost per million tokens?
As of 2026: Opus 4.8 is $5 input / $25 output, Sonnet 4.6 is $3 / $15, and Haiku 4.5 is $1 / $5 per million tokens. Input and output are billed separately.
Is the Claude API cheaper than a Claude subscription?
It depends on volume. Light, occasional use is often cheaper on the API because you pay only for what you use; heavy daily use is usually cheaper on a flat-rate Pro or Max subscription. Where the break-even falls depends on your token volume, so tracking your real usage is the only way to know which side you are on.
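A rough way to locate the break-even: divide the subscription price by your average cost per request. This sketch assumes a $20/month plan price (check current pricing) and a typical Sonnet request of 2,000 input and 500 output tokens. It ignores plan usage limits as well as caching and batch discounts.

```python
def breakeven_requests(subscription_usd: float, cost_per_request_usd: float) -> float:
    """Requests per month at which pay-per-token API spend equals a flat subscription."""
    return subscription_usd / cost_per_request_usd


# Assumptions: $20/month plan; Sonnet request of 2,000 input + 500 output tokens
# at $3/$15 per million tokens.
per_request = (2_000 * 3.00 + 500 * 15.00) / 1_000_000  # $0.0135
n = breakeven_requests(20.00, per_request)
print(f"{n:.0f} requests/month (~{n / 30:.0f}/day)")  # prints 1481 requests/month (~49/day)
```

Below that volume the API is cheaper. Above it, the subscription wins on price, as long as its usage limits cover your workload.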
How do I reduce my Claude API bill?
In order of impact: enable prompt caching for repeated context (up to 90% off cached input), use the Batch API for non-urgent work (50% off), and route simple tasks to Haiku or Sonnet instead of Opus.
Comparing API cost against a flat subscription?
Use the AI plan comparator to weigh Claude API usage against Pro and Max subscriptions side-by-side.
