Why "input-only" savings dashboards are lying to you
Most prompt-optimizer tools claim huge savings by counting input tokens only. That understates the real number by 3-5× and ignores the actual value prop. Here's what honest math looks like.
If you've tried any prompt-optimization tool, you've probably seen a dashboard claim something like "you saved 14,000 tokens this month!" And if you've done the arithmetic, you've probably noticed the dollar number attached to it doesn't quite add up. Here's why.
The trick: counting inputs only
Most tools compute savings as (original_input_tokens − optimized_input_tokens) × iteration_multiplier. They ignore output tokens entirely, and the iteration multiplier is usually hardcoded at 2 or 2.5.
That's bad math for two reasons:
- Output tokens are the bigger cost on most models. Claude Sonnet charges $3/M input but $15/M output — a 5× spread. GPT-4o is $2.50 / $10 — 4×. Ignoring output means ignoring most of the bill.
- The iteration multiplier isn't universal. A clean, well-framed prompt might take 1.2 iterations. A vague one might take 4. Assuming 2.5 for everyone means individual estimates can be off by ~50% in either direction, depending on how you actually write prompts.
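To make the problem concrete, here's a sketch of the input-only formula described above. The function name, argument names, and the hardcoded 2.5 are illustrative, not any specific tool's code:

```python
def naive_savings_usd(original_input_tokens: int,
                      optimized_input_tokens: int,
                      input_price_per_million: float,
                      iteration_multiplier: float = 2.5) -> float:
    """The 'input-only' math: count trimmed input tokens, multiply by a
    hardcoded iteration guess, price at the input rate. Output tokens
    never appear anywhere in the calculation."""
    saved_tokens = (original_input_tokens - optimized_input_tokens) * iteration_multiplier
    return saved_tokens / 1_000_000 * input_price_per_million

# A 2,000-token trim at Claude Sonnet's $3/M input rate:
print(naive_savings_usd(6000, 4000, 3.00))  # 0.015 -- and not a cent of output counted
```

Notice that `input_price_per_million` is the only price in sight: on a model where output costs 5× input, most of the bill is simply invisible to this formula.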
What honest math looks like
Real savings accounting needs three inputs:
- Input tokens saved — directly measurable.
- Output tokens saved — requires knowing your typical output-to-input ratio (which varies by workflow).
- Iteration multiplier — how many rounds your unoptimized prompts would have needed.
None of those are guessable from the optimization itself. You have to measure them. Per workflow. Against your actual model pricing. With real runs, not synthetic benchmarks.
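Putting the three measured inputs together, honest accounting looks roughly like the sketch below. All names are illustrative, and the sample numbers (a 1.8 output ratio, 1.4 iterations) are hypothetical measurements, not defaults anyone should assume:

```python
def honest_savings_usd(input_tokens_saved: int,
                       output_to_input_ratio: float,   # measured per workflow
                       iteration_multiplier: float,    # measured, not assumed
                       input_price_per_m: float,
                       output_price_per_m: float) -> float:
    """Price both sides of the exchange, then scale by the measured
    number of iterations the unoptimized prompt would have needed."""
    output_tokens_saved = input_tokens_saved * output_to_input_ratio
    per_run = (input_tokens_saved * input_price_per_m +
               output_tokens_saved * output_price_per_m) / 1_000_000
    return per_run * iteration_multiplier

# The same 2,000-token trim at Sonnet pricing ($3/M in, $15/M out),
# with a measured 1.8x output ratio and 1.4 measured iterations:
print(honest_savings_usd(2000, 1.8, 1.4, 3.00, 15.00))  # 0.084
```

With these (hypothetical) measurements, the honest figure is roughly 5× the $0.015 the input-only formula reports for the same trim, which is exactly the 3-5× understatement this article is complaining about.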
Per-project calibration, briefly
The mechanism we use: when you set up a project, you run 2-3 short prompts through your actual AI tool and paste the replies back. We tokenize the replies in your browser (the text never hits our server — only the resulting integer count does) and fit a log-linear curve across the (input tokens, output tokens) samples.
That curve gives us an accurate output ratio at any prompt size. Your savings numbers on the dashboard then use the real curve, not a hardcoded default. Every record is priced against the model you actually used — Sonnet, Haiku, GPT-4o, whatever. No blended rates.
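The calibration fit described above can be sketched like this. The exact functional form isn't specified beyond "log-linear," so this sketch assumes ln(output) = a + b·ln(input), fit by ordinary least squares over the 2-3 calibration samples; the sample data is made up:

```python
import math

def fit_output_curve(samples: list[tuple[int, int]]):
    """samples: (input_tokens, output_tokens) pairs from calibration runs.
    Fits ln(output) = a + b*ln(input) by least squares and returns a
    predictor for expected output tokens at any prompt size."""
    xs = [math.log(i) for i, _ in samples]
    ys = [math.log(o) for _, o in samples]
    n = len(samples)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) /
         sum((x - x_mean) ** 2 for x in xs))
    a = y_mean - b * x_mean
    return lambda input_tokens: math.exp(a + b * math.log(input_tokens))

# Three hypothetical calibration samples pasted back by the user:
predict = fit_output_curve([(500, 900), (1200, 1700), (3000, 3400)])
print(round(predict(2000)))  # expected output tokens for a 2,000-token prompt
```

The point of fitting a curve rather than averaging a single ratio is that the output-to-input ratio usually shrinks as prompts get longer, and a log-linear fit captures that from just a few samples.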
How to check whether a savings dashboard is lying to you
Hover over the numbers. Every honest dashboard explains its math. If there's no tooltip, no formula, no way to see what "$X saved" actually means — assume it's marketing copy, not accounting.
A quick gut-check: if the claimed savings per run is greater than ~70% of your actual per-run cost to the provider, something's wrong. Most prompts can't legitimately save more than that.
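That gut-check is a one-liner. The ~70% threshold is this article's heuristic, not a hard rule, and the names here are illustrative:

```python
def savings_claim_plausible(claimed_saving_usd: float,
                            actual_run_cost_usd: float,
                            threshold: float = 0.70) -> bool:
    """Flag claims where the dashboard says you 'saved' more than ~70%
    of what the run actually cost you at the provider."""
    return claimed_saving_usd <= actual_run_cost_usd * threshold

print(savings_claim_plausible(0.08, 0.15))  # True: 0.08 is within 70% of the cost
print(savings_claim_plausible(0.14, 0.15))  # False: "saved" ~93% of the bill
```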