Extended thinking is hiding your real Claude bill
Anthropic's extended thinking and OpenAI's reasoning models charge thinking tokens at the output rate. Most accounting dashboards never show them. Here's how to actually see your bill — and what to do about it.
Extended thinking — Anthropic's on-by-default reasoning mode, and OpenAI's o1 / o3 family — is a genuine step forward in answer quality. It's also quietly the biggest line item in many production AI bills, and most teams have no idea because their accounting tools don't surface it.
The thing nobody told you about thinking tokens
When Claude (or any reasoning model) "thinks," it generates a hidden chain-of-thought before producing the visible answer. Those thinking tokens are billed at the output rate, same as the final answer. Anthropic and OpenAI both surface this in their docs, but the per-request UI usually shows you only the visible response length.
Practical effect: a request that looks like "input 800 tokens, output 200 tokens" might actually be input 800, thinking 12,000, output 200. You're paying for 12,200 output tokens, not 200.
The math on a typical workload
Claude Sonnet 4.5 charges $3/M input and $15/M output. A complex coding task with extended thinking might break down like this:
- Input context: 4k tokens → $0.012
- Thinking: 30k tokens → $0.45
- Visible output: 2k tokens → $0.03
- Total: ~$0.49
If you're only counting visible output, you think the call cost $0.04. The real number is 12× that. Extrapolate to 50,000 calls/month and the gap is the difference between a $2k bill and a $25k bill.
Why your dashboard doesn't show it
Two reasons. First, the OpenAI and Anthropic dashboards do show thinking tokens, but they bucket them in with output, which obscures the breakdown. Second, every third-party "LLM observability" tool I've checked at time of writing reads from the response object and countsresponse.usage.output_tokens — which on a reasoning model excludes thinking. The thinking tokens are exposed separately on the response as response.usage.cache_creation_input_tokens or a similar field depending on the SDK version, and most dashboards just don't fetch them.
This is the same dynamic we covered in why input-only savings dashboards are lying to you — dashboards that count one side of the bill and ignore the other will always understate the real cost. Reasoning tokens are the 2026 version of that problem.
How to actually see your real bill
Three places to check, in order of how directly they tell you the truth:
- The provider dashboard. Anthropic's usage page now shows "reasoning tokens" as a separate line item. OpenAI's shows it under the model name. This is the ground truth — every other tool should reconcile to it.
- The full response object. Log
response.usageon every call. The relevant fields differ by SDK version — read the docs, not the README. - Your savings dashboard (if any). Verify it shows reasoning separately. If it shows one "output tokens" number and that number matches the visible response length, the tool is lying to you. EcoToken splits visible output and reasoning into separate columns; not all tools do.
What to do about it
Reasoning is genuinely valuable for hard tasks. The fix isn't to turn it off — it's to use it deliberately.
- Route by difficulty. Classification, extraction, formatting — no reasoning needed. Send those to Haiku without thinking. Reserve reasoning models for genuinely hard multi-step problems.
- Cap the budget. Anthropic exposes
budget_tokensfor the thinking step. Setting it explicitly (say, 4096 instead of unbounded) cuts the worst-case bill in half on tasks that don't need to think for that long. - Cache the system prompt. Reasoning tokens aren't cached, but everything that precedes them (system prompt, few-shot examples) can be. Wrap them in
cache_controland watch ~80% of your input cost vanish.
The default settings on every reasoning model assume you're fine paying for unbounded thinking. You probably aren't. Bound it.
Bottom line
Extended thinking is a great feature billed in a confusing way. Surface the thinking tokens in your own accounting before you compare any "optimizer" tool, because half the cost reduction claimed by dashboards in 2026 is just "we turned thinking off" — not an optimization at all.