Just a few months ago, our options for viable language models in production were limited. GPT-4 was the go-to for important tasks, while GPT-3.5 handled less critical operations. Open source models and Claude 2 could potentially replace GPT-3.5, but for top performance, GPT-4 was the only real choice.
Then, about three months ago, Claude 3 arrived on the scene. Claude 3 Haiku and Claude 3 Sonnet quickly became viable alternatives for lower-end tasks. You could use Claude 3 Haiku instead of GPT-3.5 at a lower cost and generally better performance. For mid-level tasks between GPT-3.5 and GPT-4, Sonnet became a solid option. Claude 3 Opus offered another choice for top-level tasks, though its higher cost didn’t expand our pool of price-performance options.
Soon after, GPT-4 Omni was released, dramatically reducing the cost of top-tier intelligence. Anyone using GPT-4 in production switched to GPT-4 Omni, getting comparable results at a much lower cost.
The release of Llama 3, particularly the 70 billion parameter version, provided another low-level model option. It was very affordable, and numerous model providers began running it, creating competition for the fastest and cheapest inference. Companies like Groq offered lightning-fast, cost-effective versions, adding another viable API option to the mix.
Claude 3.5 Sonnet then launched, offering slightly better intelligence than GPT-4 Omni at a lower price point, with notably improved writing capabilities. It’s currently my go-to model for top-tier tasks, especially writing.
Mark Zuckerberg’s open-sourcing of Llama 3.1 in 8 billion, 70 billion, and 405 billion parameter versions raised the bar for 70 billion parameter models. This effectively made GPT-3.5 obsolete for serious users.
The introduction of Llama 3.1 8B brought costs down significantly. While the GPT-4 API cost a few cents per thousand tokens just a few months ago, Llama 3.1 8B is now available at 9 cents per million tokens for both input and output. That’s roughly a 300-fold decrease in cost, for less intelligence, but still sufficient for many applications.
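To make that 300-fold figure concrete, here is a minimal back-of-the-envelope sketch. The prices are illustrative assumptions taken from the figures above (GPT-4 at roughly $0.03 per thousand tokens, Llama 3.1 8B at $0.09 per million); actual provider pricing varies.

```python
# Rough cost comparison based on the (assumed) prices quoted in the text.
GPT4_PER_MILLION = 0.03 * 1000       # $0.03 per 1K tokens -> $30 per 1M tokens
LLAMA_31_8B_PER_MILLION = 0.09       # $0.09 per 1M tokens

def cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of processing `tokens` tokens at a flat per-million rate."""
    return tokens / 1_000_000 * price_per_million

ratio = GPT4_PER_MILLION / LLAMA_31_8B_PER_MILLION
print(f"GPT-4:        ${cost(1_000_000, GPT4_PER_MILLION):.2f} per 1M tokens")
print(f"Llama 3.1 8B: ${cost(1_000_000, LLAMA_31_8B_PER_MILLION):.2f} per 1M tokens")
print(f"Cost ratio: ~{ratio:.0f}x")  # ~333x, i.e. the roughly 300-fold drop
```

At these assumed rates a million tokens goes from $30 to nine cents, which is where the "roughly 300-fold" number comes from.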
OpenAI’s release of GPT-4 Omni Mini further expanded our options. It costs less than a dollar per million output tokens and just a few cents per million input tokens, with intelligence far surpassing GPT-3.5. For tasks requiring moderate intelligence without needing top-tier models, GPT-4 Omni Mini has become a popular choice. I still prefer Claude 3 Haiku for its creativity in writing, but for non-writing tasks, GPT-4 Omni Mini is my go-to.
Looking ahead, Anthropic has announced plans to release Claude 3.5 Haiku later this year. It’s expected to offer even higher intelligence than GPT-4 Omni Mini at a slightly higher but still negligible cost. This development may eliminate the need for Claude 3.5 Sonnet for many writing tasks, potentially reducing API costs for certain automations by about 5x.
As competition intensifies, we’re seeing a trend of decreasing prices for increasingly intelligent models. It’s truly a buyer’s market for AI APIs right now.
The improvements aren’t limited to text models. Until recently, DALL-E 3 was the best API available for high-quality images with good prompt coherence. However, Black Forest Labs recently released FLUX.1 in three versions: Pro, Dev, and Schnell. It offers excellent text rendering and prompt coherence with Midjourney-level image quality at just 1/10 the cost of a DALL-E image. I’m currently switching all my APIs to Flux.
As a user of these AI models, I’m consistently getting access to better intelligence at lower costs. The rapid pace of development in this field is creating an exciting and increasingly accessible landscape for AI applications.