Cost Creep 2026: Gemini Flash Gets Worse While GPT-5.x and Claude Mostly Hold the Line

The phrase cost creep gets used too loosely. A vendor raising list prices is not enough on its own. The only definition that matters is whether the cheapest way to reach a useful quality level got worse. If the answer is yes, that is real cost creep. If the old value point still survives through lower reasoning effort, a mini variant, or a cheaper sibling, that is not cost creep. That is just a larger menu.

That distinction matters because model families are no longer one model and one price. They are menus: effort controls, mini and nano variants, lite variants, and premium modes layered on top. Looking only at the newest flagship, or only at per-token sheet pricing, will give you the wrong answer. The right question is narrower and more useful: what is the cheapest configuration today that hits a target quality band, and did that point get pulled upward over time?
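To make that question concrete, here is a minimal sketch of the comparison I mean. Every model name, score, and run cost in it is a made-up placeholder, not benchmark data; what matters is the shape of the check.

```python
# A family-frontier check: for a target quality band, find the cheapest
# configuration that reaches it, then compare that point across time.
# All entries below are hypothetical placeholders, not real benchmark data.

points_2025 = [
    # (configuration, benchmark score, full run cost in dollars)
    ("lite-model / no thinking", 30, 45),
    ("mid-model / low effort", 36, 120),
]
points_2026 = [
    ("lite-model-next / no thinking", 32, 140),
    ("mid-model-next / low effort", 37, 110),
]

def cheapest_at_or_above(points, target_score):
    """Cheapest configuration that reaches the target quality band."""
    qualifying = [p for p in points if p[1] >= target_score]
    return min(qualifying, key=lambda p: p[2]) if qualifying else None

then = cheapest_at_or_above(points_2025, 30)
now = cheapest_at_or_above(points_2026, 30)
print(then, "->", now)  # real cost creep if now's cost exceeds then's
```

Note what the placeholder data shows: the cheapest way to hit the band can move to a different model entirely, and the list price of any single model tells you nothing about whether that minimum went up.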

Under that standard, Google Gemini is the strongest case of real cost creep in this group. OpenAI mostly is not. Anthropic mostly avoided it in Sonnet, showed a partial warning sign in Haiku, and did the opposite in Opus.

Gemini Flash and Flash-Lite are the clearest cost-creep case

Google is where the low end got dragged upward in a way that is hard to excuse.

The cheap Gemini lane used to be cheap in a very literal sense. Flash-Lite pricing, quoted here as input and output dollars per million tokens, moved from Gemini 1.5 Flash-8B at $0.0375 and $0.15, to Gemini 2.0 Flash-Lite at $0.075 and $0.30, to Gemini 2.5 Flash-Lite at $0.10 and $0.40, and then to Gemini 3.1 Flash-Lite Preview at $0.25 and $1.50. The regular Flash lane also moved up hard, from Gemini 1.5 Flash at $0.075 and $0.30 to Gemini 2.5 Flash at $0.30 and $2.50, then to Gemini 3 Flash Preview at $0.50 and $3.00.
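Compounded across versions, those sheet prices multiply fast. A quick back-of-envelope check on the Flash-Lite lane, using only the list prices quoted above:

```python
# Endpoint-to-endpoint multipliers for the Gemini Flash-Lite lane,
# using the list prices quoted above (input, output, $ per million tokens).
flash_lite = {
    "1.5 Flash-8B": (0.0375, 0.15),
    "2.0 Flash-Lite": (0.075, 0.30),
    "2.5 Flash-Lite": (0.10, 0.40),
    "3.1 Flash-Lite Preview": (0.25, 1.50),
}

first_in, first_out = flash_lite["1.5 Flash-8B"]
last_in, last_out = flash_lite["3.1 Flash-Lite Preview"]
print(f"input:  {last_in / first_in:.1f}x")    # ~6.7x
print(f"output: {last_out / first_out:.1f}x")  # 10.0x
```

A roughly 7x input and 10x output increase at the bottom of the lineup, before token usage even enters the picture.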

That by itself already weakens the value story, but list prices alone do not settle the question. The worse part is that the benchmark economics degraded too. The newer low-cost Gemini models appear more token-hungry in practice, so you are not just paying a higher price sheet. You are often paying for more tokens to get through the same sort of task.
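The mechanism is simple: what you actually pay per task is token usage times price, so a token-hungrier model compounds a higher sheet. A minimal sketch with made-up token counts (the prices are the Flash-Lite list prices above; the token counts are illustrative assumptions, not measurements):

```python
# Effective cost of one task: tokens consumed times sheet price.
def run_cost(input_tokens, output_tokens, price_in, price_out):
    """Dollar cost of one run, prices in dollars per million tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Same hypothetical task: the newer model is pricier AND chattier.
old = run_cost(input_tokens=8_000, output_tokens=1_000, price_in=0.10, price_out=0.40)
new = run_cost(input_tokens=8_000, output_tokens=2_500, price_in=0.25, price_out=1.50)
print(f"old: ${old:.4f}  new: ${new:.4f}  ratio: {new / old:.1f}x")  # ~4.8x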

That is why Gemini 3.1 Flash-Lite is such a bad signal. In the benchmark snapshots behind this analysis, Gemini 3 Flash gets slightly better performance than 3.1 Flash-Lite for materially less money. That is the kind of comparison that matters. If the model with Lite in the name is no longer the clean low-cost value point inside its own family, the family frontier got distorted.

The point is not that Google added a premium option. Vendors do that all the time. The point is that the cheap useful point got worse, and one of the newer Lite points does not even hold its role cleanly anymore. That is real cost creep.

I wrote about the narrower Gemini side of this already here: Gemini 3.1 Flash-Lite: Cost, Speed, and Intelligence. This post is the wider family comparison.

GPT-5.x got pricier at the top, but that is mostly not true cost creep

OpenAI raised list prices on the larger GPT-5.x models. GPT-5 and GPT-5.1 sat at $1.25 input and $10 output, GPT-5.2 moved to $1.75 and $14, and GPT-5.4 moved to $2.50 and $15. If you only stare at the flagship sheet, that looks like a clean creep story.

But it breaks once you compare the family the right way. OpenAI kept the low end alive with GPT-5 mini at $0.25 and $2 and GPT-5 nano at $0.05 and $0.40. So the cheapest useful entry point in the family did not disappear. That alone matters a lot.

The second reason is reasoning control. OpenAI gives you a full effort knob, and the cheaper settings are not hidden. If medium or low effort gets you to the same practical quality band, then the expensive high-effort run is optional. It is a premium mode, not a forced replacement.

The benchmark frontier in the data behind this analysis reflects that. GPT-5 mini at medium effort lands around a 39 score for about $62, far better than nearby older or larger points. In the mid band, GPT-5.2 at medium effort beats older high-effort points on cost. Higher up the curve, GPT-5.3 Codex at xhigh also looks like a strong efficiency point relative to neighboring models. So while the top end got more expensive, several of the cheapest ways to hit practical quality bands did not get worse, and some improved.
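To make "beats on cost" precise: one configuration dominates another if it scores at least as well for no more money. A minimal sketch of that check, pairing the GPT-5 mini point above with a hypothetical older neighbor:

```python
# Pareto dominance between (score, cost) points: a dominates b if it is
# at least as good on both axes and strictly better on at least one.
def dominates(a, b):
    score_a, cost_a = a
    score_b, cost_b = b
    return (score_a >= score_b and cost_a <= cost_b
            and (score_a > score_b or cost_a < cost_b))

gpt5_mini_medium = (39, 62)   # point cited above
older_neighbor = (38, 180)    # hypothetical older high-effort point
print(dominates(gpt5_mini_medium, older_neighbor))  # True
```

Dominated points are the ones a larger menu quietly retires. Cost creep is the opposite move: the new points fail this check against the old ones.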

There is still a local warning sign in one plain non-reasoning flagship slice where costs rose faster than token efficiency improved. I would not hide that. But that is a local issue. It is not a family-wide upward pull of the curve.

This matches a broader point I have made before. The best part of a release is often the cheaper or more efficient mode, not the most expensive one. That is the same logic behind GPT-5.4 Fast Mode Is the Best Part of the Release.

Claude Sonnet is the clean non-creep case

Sonnet is the easiest family to score. Anthropic kept Sonnet pricing flat across versions at $3 input and $15 output. That means there is no sheet-price creep in the line.

The family structure also helps. Anthropic supports turning thinking off or using lower effort. For Sonnet, that means a newer version can preserve or improve the old practical economics instead of forcing everybody into a heavier mode.

The benchmark points support that read. Claude 3.7 Sonnet sits around 31 for $437. Claude 4 Sonnet gets around 37 for $473. Claude 4.5 Sonnet is roughly 37 for $685. Claude Sonnet 4.6 in non-reasoning or low effort looks closer to 42.5 for $554. That is not a story where the same useful quality band got more expensive. If anything, Sonnet 4.6 looks like a better point on the curve than 4.5.
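A crude way to see why 4.6 looks like the better point on the curve is dollars per score point across those four snapshots. It is a blunt linear metric, but the ordering is telling:

```python
# Dollars per benchmark score point, using the Sonnet points quoted above.
sonnet_points = {
    "3.7 Sonnet": (31, 437),
    "4 Sonnet": (37, 473),
    "4.5 Sonnet": (37, 685),
    "4.6 Sonnet (low/no reasoning)": (42.5, 554),
}
for name, (score, cost) in sonnet_points.items():
    print(f"{name}: ${cost / score:.1f} per point")
# 3.7: $14.1, 4: $12.8, 4.5: $18.5, 4.6: $13.0
```

4.6 lands back near the best cost-per-point in the line while scoring highest, which is the opposite of a creeping frontier.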

So Sonnet is not a cost-creep story. It looks like a vendor keeping the price sheet steady while giving users enough control to preserve older economics.

Claude Haiku shows partial cost creep, but it is weaker than Gemini

Haiku is the mixed case. On raw sheet pricing, yes, the jump is obvious. Claude 3 Haiku was $0.25 input and $1.25 output. Claude 3.5 Haiku moved to $0.80 and $4. Claude 4.5 Haiku reached $1 and $5.

But the move from Haiku 3 to 3.5 was not a clean creep case because the benchmark economics only worsened slightly while quality moved up a lot and token usage dropped. Paying somewhat more for a much better model is not what I would call true cost creep.

The warning sign appears more clearly in the 3.5 to 4.5 jump. Claude 3.5 Haiku was around 19 for $47. Claude 4.5 Haiku non-reasoning lands around 31 for $205, and the higher-effort point climbs much further. So the real run cost did jump hard.
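Putting numbers on "jumped hard," using the two points above:

```python
# Haiku 3.5 -> 4.5 jump, using the benchmark points quoted above.
score_35, cost_35 = 19, 47
score_45, cost_45 = 31, 205
print(f"score: {score_45 / score_35:.1f}x")  # ~1.6x
print(f"cost:  {cost_45 / cost_35:.1f}x")    # ~4.4x
```

Cost rising roughly three times faster than score is the pattern that earns the warning sign.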

Even there, I would still stop short of calling it Gemini-style creep. Claude 4.5 Haiku is not just the same point made more expensive. It reaches a higher quality band and starts occupying a stronger mid-tier role. The stronger creep argument would land if Anthropic removed the cheaper Haiku points and forced everyone onto 4.5 economics. As long as the older cheaper point still exists, this is better described as partial creep and family repositioning, not a full collapse of the low end.

Claude Opus is the reverse case

Opus is easy to classify. Anthropic cut pricing hard, from older Opus pricing around $15 input and $75 output to newer Opus pricing around $5 and $25. That is the opposite of cost creep. It is a reminder that family economics can move downward too, not only upward.

The right ranking

If I rank the families by real cost creep, Gemini Flash and Flash-Lite are the strongest case. Claude Haiku is partial and mixed. GPT-5.x and Claude Sonnet are mostly not cost creep. Claude Opus is reverse cost creep.

The main reason this topic matters is that a lot of AI pricing discussion is still stuck at the sheet-price level. That misses the thing buyers care about. Nobody buys a model because the input price column looks nice in isolation. They buy the cheapest setup that can do the work well enough. That is the comparison that decides whether costs are creeping or whether a vendor simply added more expensive options on top.

If you only keep one line from this whole piece, keep this one: the right question is not whether the newest model costs more. The right question is whether the cheapest way to get useful quality got worse.

Under that standard, Google is the clearest offender here. OpenAI mostly avoided it by keeping low-end options and giving users effort control. Anthropic kept Sonnet stable, made Haiku more complicated but not fully broken, and cut Opus enough to count as the opposite story.

If you want the wider release context around where these models fit in the current model churn, I keep a running index here: Every AI Model Released in February 2026.
