
Cheap AI Tokens, Expensive Tasks: Why Agentic Workflows Changed Everything

Token prices have fallen off a cliff. Google's Gemini 2.0 Flash hit $0.10 per million input tokens in February 2025, a tenth of a cent per thousand tokens. That's 300x cheaper than GPT-4's original $30 pricing from early 2023. Anyone looking at these numbers might think AI has become free.

They'd be wrong.

While unit costs cratered, something else happened that flipped the economics entirely: the types of tasks we can now accomplish have become dramatically more complex and token-intensive. It's not that the same old tasks cost more; it's that we're now doing tasks that were impossible before. Models like Kimi K2 and Claude 4 don't just answer prompts anymore: they can orchestrate multi-step workflows that would have been science fiction in 2023. The tasks users are gravitating toward burn through orders of magnitude more tokens than simple question-answering ever did.

The result? Your AI bill is higher than ever, even with tokens that cost nearly nothing.

The Great Token Price Collapse

The price war in AI has been brutal and swift. Here's how frontier intelligence pricing crashed:

Token prices fell 300x in two years, but users shifted to more complex tasks.

GPT-4 launched at $30 per million input tokens in early 2023. Industry observers thought that was expensive but reasonable for frontier intelligence. Then the price wars began.

OpenAI's GPT-4 Turbo dropped prices to $10 in November 2023. GPT-4o pushed them to around $2 by May 2024. Both maintained roughly the same quality as the original GPT-4, just cheaper and faster.

Google delivered the knockout punch with Gemini 2.0 Flash at $0.10 per million input tokens. That's not a typo: a tenth of a cent per thousand tokens. Nobody thought near-frontier intelligence could get that cheap.

But Then Agentic Workflows Enabled New Possibilities

Just as token prices hit rock bottom, AI capabilities exploded beyond simple question-answering. Starting in late 2024, models gained the ability to orchestrate complex, multi-step workflows that were previously impossible.

These agentic workflows opened up entirely new categories of tasks. Instead of asking "write me a function," users can now say "build me a complete application with error handling, testing, and deployment scripts." The AI doesn't just generate code: it plans the project architecture, researches best practices, writes multiple components, tests them, debugs issues, and iterates based on results.

Here's what a modern agentic coding workflow might involve:

  • Initial project planning and architecture decisions
  • Research calls to understand APIs and frameworks
  • Multi-file code generation with proper structure
  • Automated testing and debugging cycles
  • Performance optimization recommendations
  • Documentation generation
  • Deployment configuration

Each capability was either impossible or required manual intervention with earlier AI models. Now users routinely tackle tasks that would have taken days or weeks of back-and-forth with traditional AI assistants. But these sophisticated workflows consume 50,000-200,000 tokens per session, orders of magnitude more than simple prompts.
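To see how a single session reaches that range, here's a hypothetical token budget for the workflow phases listed above. The per-phase numbers are illustrative round figures, not measurements:

```python
# Hypothetical token budget for one agentic coding session.
# Phase estimates are illustrative round numbers, not measured values.
session_phases = {
    "planning_and_architecture": 8_000,
    "api_and_framework_research": 15_000,
    "multi_file_code_generation": 60_000,
    "test_and_debug_cycles": 40_000,
    "optimization_and_docs": 12_000,
    "deployment_config": 5_000,
}

total_tokens = sum(session_phases.values())
print(f"Estimated session total: {total_tokens:,} tokens")  # 140,000 tokens
# A few extra debug iterations can easily push this toward 200,000.
```

Debug cycles dominate the budget, which is why iterative workflows burn tokens so much faster than one-shot prompts.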

Users shifted from simple Q&A to complex project-building workflows.

The Hidden Cost of Capability Expansion

Most users never felt this pain initially because venture capital subsidized the bills. Platforms like Cursor marketed unlimited GPT-4 at loss-leader prices. Users got conditioned to expect frontier-quality AI for basically free.

That party is over.

As VC funding tightened, platforms started imposing usage caps and price hikes. The backlash was immediate and predictable. Users who thought GPT-4 was essentially free discovered their monthly bills jumping from $20 to $200 or more, not because the same tasks got more expensive, but because they were now tackling far more ambitious projects.

The math reveals the shift in user behavior. Even at Gemini 2.0 Flash pricing of $0.10 per million input tokens, a complex agentic workflow consuming 100,000 input tokens costs just $0.01. That sounds negligible until you add output tokens, which are priced several times higher per token, and move to near-frontier models for quality. A developer using an AI coding assistant for full project development might trigger 20-50 complex sessions daily; at near-frontier rates with output included, that's $10-25 per day in token costs alone, before any API markup.
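The per-session arithmetic can be sketched in a few lines. Prices are the published rates quoted in this article; the session sizes and daily session count are illustrative assumptions:

```python
def session_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of one session; prices are dollars per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Ultra-cheap tier (Gemini 2.0 Flash: $0.10 in / $0.40 out), modest output:
cheap = session_cost(100_000, 20_000, 0.10, 0.40)  # ≈ $0.018 per session

# Near-frontier tier (Kimi K2: $0.60 in / $2.50 out), heavier output:
near = session_cost(100_000, 50_000, 0.60, 2.50)   # ≈ $0.185 per session

daily = 50 * near  # 50 sessions/day ≈ $9.25, before API markup
```

Note how output tokens dominate the near-frontier figure: at $2.50 per million, 50k output tokens cost more than 100k input tokens.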

This is precisely why Anthropic is introducing weekly rate limits for its Claude Pro and Max tiers starting August 28, 2025. The limits target excessive usage by a small subset of power users while minimizing impact on most customers. Some users were consuming tens of thousands of dollars' worth of compute monthly on the $200 tier, leading to unsustainable costs for Anthropic and unexpected bills for users. Max-plan users who hit these limits will be able to purchase additional usage at standard API rates. This perfectly illustrates the point: cheap AI tokens do not mean cheap AI tasks when the scope of work expands this dramatically.

The Near-Frontier Sweet Spot

Here's where it gets interesting. While frontier models like o3 and the latest GPT-4 variants command premium pricing, a new tier of near-frontier models offers compelling alternatives for these expanded workflows.

Gemini 2.5 Flash, the slightly pricier successor to the ultra-cheap 2.0 Flash, is priced at around $0.30 per million input tokens and $2.50 for output. Kimi K2 comes in at $0.60 input and $2.50 output, benchmarking better than any open-source or mid-tier proprietary model from 2024.

Both crush last year's performance while offering 10-100x savings over frontier models for these token-intensive workflows. The performance gap between these models and the absolute frontier is often just 5-10 percentage points, negligible for most real-world projects.

Model Tier    | Input Cost (per M tokens) | Output Cost (per M tokens) | Example Models             | Best For
Ultra-Cheap   | $0.10                     | $0.40                      | Gemini 2.0 Flash           | High volume, simple tasks
Near-Frontier | $0.30-0.60                | $2.50                      | Gemini 2.5 Flash, Kimi K2  | Complex project workflows
Frontier      | $2+                       | $8+                        | Gemini 2.5 Pro, o3         | Mission-critical complex reasoning

Near-frontier models offer the best price/performance for complex workflows.

Why This Matters for Developers

This capability expansion has real implications for how we build and deploy AI applications.

First, users are gravitating toward increasingly sophisticated tasks now that they're possible. Simple code generation is giving way to full project development, basic content creation is being replaced by comprehensive content strategies, and one-off analyses are becoming ongoing research projects.

Second, the tier system creates new optimization opportunities. Route simple tasks to ultra-cheap models, use near-frontier for complex project work, and reserve frontier models for tasks where that extra 5-10% performance justifies the 10x cost premium.

Third, workflow design becomes critical. When users can accomplish tasks that were previously impossible, the temptation is to go overboard with complexity. Smart workflow design helps users achieve their ambitious goals without unnecessary token waste.

The Context Window Reality

These expanded workflows don't just use more tokens; they manage more complex state across longer sessions. A multi-day project might involve dozens of files, multiple iterations, and extensive conversation history.

Model providers have started addressing this with caching discounts. OpenAI offers 50% off cached tokens, while some providers go as high as 75% off. But caching only helps if your workflows maintain context efficiently across sessions.
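The effect of caching on input cost is easy to model. A minimal sketch, assuming a hypothetical $2/M base input price, a 70% cache hit rate, and the 50% cached-token discount mentioned above:

```python
def effective_input_cost(tokens, cached_fraction, base_price, cache_discount):
    """Blended input cost in dollars; base_price is dollars per million tokens."""
    cached = tokens * cached_fraction
    fresh = tokens - cached
    # Cached tokens bill at a discounted rate; fresh tokens at full price.
    return (fresh * base_price + cached * base_price * (1 - cache_discount)) / 1_000_000

# 100k-token context, 70% served from cache at a 50% discount (assumed $2/M base):
with_cache = effective_input_cost(100_000, 0.70, 2.00, 0.50)  # ≈ $0.13
no_cache = effective_input_cost(100_000, 0.00, 2.00, 0.50)    # ≈ $0.20
```

Here caching shaves 35% off input cost; at a 75% discount the same hit rate would save over half.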

The bigger opportunity is teaching users to break large projects into manageable chunks that dont require massive context windows. Instead of trying to build an entire application in one mega-session, smart users learn to work iteratively with reasonable context boundaries.

Production Optimization Strategies

Based on real production deployments handling these expanded workflows, here are the strategies that work:

  • Progressive Complexity: Start projects with cheaper models for initial planning and exploration, then move to more expensive models only when complexity demands it.
  • Session Management: Break large projects into logical sessions rather than maintaining massive context across days or weeks of development.
  • Intelligent Routing: Use different models for different project phases. Architecture planning might need frontier intelligence, but code generation often works fine with near-frontier models.
  • Scope Definition: Help users understand what's achievable in different workflow types, preventing scope creep that burns through tokens unnecessarily.
  • Quality Gates: Define clear criteria for when more expensive models add meaningful value versus when efficient models deliver adequate results.
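A minimal routing sketch tying the Progressive Complexity and Intelligent Routing ideas together. The tier prices mirror the table above, while the model names, phase names, and phase-to-tier mapping are assumptions for illustration, not any provider's API:

```python
# Tier prices follow the comparison table above; the routing policy itself
# is an illustrative assumption, not a provider feature.
TIERS = {
    "ultra_cheap":   {"model": "gemini-2.0-flash", "input_per_m": 0.10},
    "near_frontier": {"model": "kimi-k2",          "input_per_m": 0.60},
    "frontier":      {"model": "o3",               "input_per_m": 2.00},
}

def route(phase: str) -> str:
    """Map a project phase to the cheapest tier that handles it well."""
    if phase in {"architecture", "critical_reasoning"}:
        return "frontier"       # worth the ~10x premium here
    if phase in {"code_generation", "debugging", "testing"}:
        return "near_frontier"  # 5-10 points below frontier, far cheaper
    return "ultra_cheap"        # docs, boilerplate, summaries

model = TIERS[route("code_generation")]["model"]  # "kimi-k2"
```

In production the routing signal would come from a task classifier or explicit user choice rather than hard-coded phase names, but the cost structure it exploits is the same.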

These strategies align with my earlier insights on Qwen3-Coder and ChatGPT agents: AI can crush many tasks, but smart workflow design is the true differentiator for cost and performance.

The Market Reality Check

The AI pricing story is really about capability expansion. Early adopters discovered they could accomplish tasks that were impossible before, leading to dramatically increased usage even as unit costs plummeted.

This isn't about the same tasks getting more expensive; it's about users choosing to tackle far more ambitious projects now that AI can handle them. The unit cost of intelligence keeps falling, but the scope of what users want to accomplish keeps expanding faster.

This creates natural market segmentation. Users who stick to simple, traditional AI tasks benefit from near-free pricing. Users who want to push the boundaries of what's possible with multi-step workflows face higher bills, but they're also accomplishing things that would have been impossible at any price just two years ago.

What Comes Next

Three trends will define AI pricing in 2025:

  • Workflow Specialization: Instead of general-purpose frontier models, we'll see more models optimized for specific types of complex workflows. Coding-specific models like Qwen3 Coder point in this direction.
  • Session Intelligence: Providers will compete on maintaining context and project state efficiently across long, complex workflows without token waste.
  • Capability-Based Pricing: We'll see more pricing models that charge differently for different types of AI capabilities: simple generation, complex reasoning, tool usage, and project orchestration might all have different rates.

The bottom line is simple: tokens are cheap, but ambitious workflows consume lots of tokens. The users who win are those who match their project complexity to the right model tier and learn to work efficiently within the new paradigms that AI has made possible.

We're not paying more for the same tasks; we're choosing to tackle far more sophisticated projects because we finally can.