Twelve months ago the most capable model you could rent was Claude Opus 4, at $15 in and $75 out per million tokens. Today the flagship Opus costs a third of that: $5 in, $25 out. So the price of frontier AI collapsed over the year, right?
Not even slightly. It went up. You just have to know where to look, and who is paying.
The headline per-token rates are a fog. They moved in every direction at once over the last twelve months, and most of them are now actively misleading. Here is what actually happened to the cost of intelligence, using OpenRouter’s published figures: why a model Google shipped two weeks ago is the most interesting thing on the page, and why the best-value seat in the house right now is the one most people already have.
A note on currency before the numbers. Everything here is in US dollars, the currency these labs bill in, which keeps the comparisons clean. For Australians that is a quiet second tax: a weaker Aussie dollar means every one of these rises lands harder on our side of the Pacific than the sticker suggests, and a climbing sticker only compounds the hit.
The sticker prices went everywhere
Line up the flagship and workhorse tiers from the two labs everyone watches and the picture is chaos.
| Tier | A year ago | Now (Jun 2026) | Sticker move |
|---|---|---|---|
| Anthropic flagship | Opus 4 - $15 / $75 | Opus 4.8 - $5 / $25 | down ~67% |
| Anthropic workhorse | Sonnet 4 - $3 / $15 | Sonnet 4.6 - $3 / $15 | flat |
| OpenAI flagship | GPT-4.1 - $2 / $8 | GPT-5.5 - $5 / $30 | up 150% / 275% |
| OpenAI workhorse | GPT-4o - $2.50 / $10 | GPT-5.4 - $2.50 / $15 | flat / up 50% |
Prices are OpenRouter list rates, US dollars per million tokens, input / output.
Anthropic took its flagship down hard, cutting Opus by roughly two thirds, and left Sonnet untouched. OpenAI went the other way at the top. GPT-5.5 shipped at $5 / $30, a literal doubling of GPT-5.4’s $2.50 / $15. And a tier appeared that simply did not exist a year ago: GPT-5.4 Pro, OpenAI’s heavy reasoning model, lists at $30 / $180. For reference, o3, the dedicated reasoning model people were excited about in mid-2025, launched at $10 / $40 before being cut to $2 / $8. The top of the OpenAI range is now an order of magnitude above that.
Two labs, opposite directions. If you only read the sticker, you would conclude nothing coherent happened.
The number that actually went up
The thing that rose, unambiguously, is what it costs to finish a job.
Per-token price is a unit cost. What you pay is unit cost times tokens consumed, and the new models are ravenous. They think before they answer, and thinking is billed.
OpenRouter ran the cleanest test I have seen on this. They took the cohort of users who switched from GPT-5.4 to GPT-5.5 and measured the same workloads before and after. GPT-5.5’s sticker had doubled, OpenAI insisted the model was less verbose, and the real bill still rose between 49% and 92% depending on prompt size. Less verbose, more expensive.
Anthropic is running the same play more quietly. Opus 4.7 carried the exact same $5 / $25 sticker as 4.6, yet independent token counts put the effective increase at 30% to 40%, because the newer model simply burns more tokens to reach an answer.
So the sticker can fall while your invoice climbs. Raw dollars-per-token has become close to useless as a standalone metric. The real question is dollars per finished task, and on that measure the cost of frontier intelligence went up this year, not down.
Enter Gemini 3.5 Flash
If you pay per token, there is an antidote, and Google shipped it on 19 May.
Gemini 3.5 Flash scores 55 on the Artificial Analysis Intelligence Index. The only two models clearly above it are Claude Opus 4.8 at 61 and GPT-5.5 at 60, and both cost three to four times as much per token. Flash lands within six points of the frontier at a fraction of the price. It does this at over 280 output tokens per second, the fastest output of any model anywhere near its intelligence. It reads text, images, video, audio and PDFs natively, where Opus 4.8, GPT-5.5 and Grok all take image input only. And it gives you a thinking-effort dial - minimal, low, medium, high - so you choose how much reasoning you are willing to pay for.
Put capability against cost and the ranking is blunt.
| Rank | Model | Provider | In $/M | Out $/M | Blended $/M | Intelligence Index | Capability per $ |
|---|---|---|---|---|---|---|---|
| 1 | Gemini 3.5 Flash | $1.50 | $9 | $3.38 | 55 | 16.3 | |
| 2 | Claude Haiku 4.5 | Anthropic | $1 | $5 | $2.00 | 31 | 15.5 |
| 3 | GPT-5.4 | OpenAI | $2.50 | $15 | $5.63 | 57 | 10.1 |
| 4 | Claude Sonnet 4.6 | Anthropic | $3 | $15 | $6.00 | 52 | 8.7 |
| 5 | Claude Opus 4.8 | Anthropic | $5 | $25 | $10.00 | 61 | 6.1 |
| 6 | GPT-5.5 | OpenAI | $5 | $30 | $11.25 | 60 | 5.3 |
Blended price weights input and output three to one, OpenRouter’s own convention. Capability per dollar is the Intelligence Index divided by the blended price - a heuristic, not a benchmark, but it makes the trade visible. Index scores are Artificial Analysis figures from their May 2026 snapshot.
Gemini 3.5 Flash tops the table. The two newest flagships sit at the bottom. GPT-5.5, the most recent and most expensive model on the list, returns the least intelligence per dollar of anything here. You are paying frontier prices for a single-point capability edge over a model that costs a quarter as much. Haiku 4.5 is nearly as efficient on paper, but at an index of 31 it is playing a different sport. Flash is the only model that pairs near-frontier capability with best-in-class value.
The catch, and it is a real one
This is where most takes stop. They should not, because Flash will bite you if you use it carelessly.
First, it is far pricier than the Flash you remember. Token prices tripled over the previous generation, from $0.50 / $3 to $1.50 / $9. Google’s budget tier is drifting upmarket along with everyone else. The antidote is to frontier pricing, not a return to 2025 bargain rates.
Second, it is token-hungry. On Artificial Analysis’s agent benchmark it needed an average of 49 interaction steps per task, more than any model they tested. Opus 4.7 took 45, GPT-5.4 took 40, Google’s own Gemini 3.1 Pro took 23. Every extra step is billed input. On heavy agent workloads the total cost can land 75% above Gemini 3.1 Pro, a model with a higher sticker. The per-token discount evaporates exactly when you lean on it hardest.
Third, coding is its weak spot. On the AA Coding Index it scores 45, behind GPT-5.5 at 59, GPT-5.4 at 57, Claude Opus 4.7 at 53 and even its own Pro sibling at 55. For the one job where a fast cheap model would be most welcome, an autonomous coding loop, it is the wrong tool.
So the API antidote is real but targeted. Point Flash at high-volume, latency-sensitive, multimodal work - classification, summarisation, drafting, chat, document and image understanding - and nothing on the market touches it on value. Point it at a long autonomous coding agent and it will quietly out-spend the premium models it was meant to undercut.
Where the bill is actually exploding
All of that matters if you buy tokens. The bigger story this year is who got moved onto the meter in the first place.
For most of the last two years, businesses bought AI the way they buy any software: a flat fee per seat per month, usage bundled in. Predictable, easy for procurement, friendly to finance. That model is being dismantled, and Anthropic is leading the dismantling.
Through late 2025 and into April 2026, Anthropic rebuilt Claude Enterprise. The old deal was a fixed seat, roughly $40 a month for Standard and up to $200 for Premium, with a generous pool of tokens baked in and a 10 to 15% volume discount on top. The new deal strips the tokens out. You pay around $20 a seat for platform access, then every token bills at standard API rates with the volume discounts gone. The Register put it plainly: Anthropic has ejected bundled tokens from the enterprise seat. Seats used to be the bill. Now they are a cover charge, and the meter does the rest.
This lands at the worst possible moment, because agents are token furnaces. Goldman Sachs reckons agentic AI could push token demand up twenty-four-fold by 2030. A chatbot answers a question. An agent reads the codebase, plans, calls tools, checks its work and loops, burning input tokens at every turn.
Uber is the cautionary tale. It rolled Claude Code out to engineers in December 2025, usage doubled by February, and 95% of its engineers were soon using AI tools monthly with 70% of committed code coming from them. The result: Uber burned through its entire 2026 AI budget in four months. Individual engineers were running between $500 and $2,000 a month in tokens. The company has now capped staff at $1,500 a month per coding tool, and its own COO Andrew Macdonald admitted the link between all that spend and anything a customer would notice is “not there yet”. Microsoft is reportedly wrestling with the same maths.
OpenAI has not, so far, forced its heavy users onto the meter the way Anthropic has. ChatGPT Business is still a flat $20 a seat with usage included. Teams still works the old way, and the API volume discounts remain. OpenAI did add an optional usage-based Codex seat in April 2026 for teams that want pure token billing, but it sits alongside the flat seat rather than replacing it. Anthropic made the meter the default and removed the discounts. OpenAI made it a choice. For now, that is a real difference, and it is the kind of difference that moves enterprise contracts.
The best-value seat is the one you already have
Now stand where a consumer or a small business stands, and the same forces flip in your favour.
The flat subscription tiers did not change. Claude Pro is still $20 a month, or $17 on annual billing. Claude Max is $100 a month at the 5x tier and $200 at the 20x tier. ChatGPT mirrors it almost exactly: Plus at $20, Pro at $200. Google AI Ultra tops out at $250. And the crucial part: Claude Code is bundled into Pro and Max, Codex is bundled into ChatGPT, at no extra per-token charge. The Max 20x plan is explicitly built for developers who run Claude Code all day.
That is the same agentic, token-devouring workload that just vaporised Uber’s budget. On a subscription, within generous fair-use limits, it is absorbed into a fixed monthly fee.
Run the numbers. The break-even between Claude Pro and buying the equivalent tokens on the API sits at roughly 3.7 million tokens a month. Below that the API is cheaper. Above it the subscription wins, and a heavy Claude Code user clears 3.7 million tokens in days, not months. A developer paying $200 for Max 20x is doing work that cost Uber’s engineers $500 to $2,000 on the meter. Same tool, same models, a quarter to a tenth of the price.
The irony is that Anthropic offers both deals at once. It will happily sell an individual flat-rate Claude Code for $200 a month while moving a 5,000-seat enterprise onto per-token billing, because at enterprise scale the flat buffet loses the lab money and at individual scale it is a loss leader worth running. The subscription is not generous by accident. It is the cheapest customer-acquisition channel they have.
Which means the advice writes itself. Use the subscription, and use it hard. The value is in the heavy use, not the light use. If you are dabbling for an hour a week you are overpaying at $20; if you are running agents, drafting, coding and analysing every day, the flat fee is the best deal in technology right now, and it is sitting unredeemed in millions of accounts.
What this means for your bill
The lesson of the last twelve months is not that models got cheaper, and it is not that they got dearer. It is that the sticker stopped telling you anything, and the market quietly split in two.
If you buy tokens, the cost that matters is dollars per finished job, set by two dials the vendors would rather you did not watch together: the per-token rate and the tokens burned reaching the answer. Route accordingly. Send the routine bulk to Gemini 3.5 Flash, and hold Opus 4.8 or GPT-5.5 in reserve for the hard agentic and coding work that justifies the premium.
If you are an individual or a small business, you are on the right side of the split, for now. The flat subscription shields you from the exact token explosion that is detonating enterprise budgets. Ride it, lean on the bundled agentic tools, and get your value out before the buffet closes, because the enterprise meter shows you exactly where this is heading.
Frontier AI did get more expensive this year. It just decided to send the bill to someone else.
Sources:
- Gemini 3.5 Flash - OpenRouter pricing
- GPT-5.5 price increase: what it actually costs - OpenRouter
- Google’s Gemini 3.5 Flash makes newer models significantly pricier - The Decoder
- Artificial Analysis Intelligence Index
- Anthropic ejects bundled tokens from enterprise seat deal - The Register
- Uber caps staff use of AI coding tools after blowing its budget - LA Times
- Claude Code pricing in 2026: Pro vs Max vs API costs explained - Tygart Media