This post is part three of what has become something of a saga on my blog: Claude vs. ChatGPT. Since the last installment, Anthropic has released Claude 3.5 Sonnet, which is state-of-the-art in many respects: it has been widely adopted for coding and excels at writing. Most of my projects now use Claude 3.5 Sonnet where I would previously have reached for GPT-4 Omni.
However, the main drawback of Claude 3.5 Sonnet is the reliability of its API: it frequently returns overload errors under load, while the GPT-4 API remains considerably more dependable. So I rely primarily on Claude, but I always configure GPT-4 Omni as a fallback. GPT-4 Omni also has a fresh pricing advantage: when Claude 3.5 Sonnet launched, it held a slight edge in input token pricing over GPT-4 Omni, but just a couple of days ago OpenAI rolled out a new snapshot that significantly cut costs and undercut Anthropic.
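In practice, that fallback takes only a few lines with the official Python SDKs. Here's a minimal sketch of the pattern, assuming the current model snapshot IDs and catching Anthropic's API errors broadly; it's an illustration, not my exact production code:

```python
# Minimal sketch of the fallback pattern: try Claude 3.5 Sonnet first,
# and retry against GPT-4o if the Anthropic API errors out (e.g. overload).
# Model IDs and the broad exception handling are illustrative assumptions.
import anthropic
from openai import OpenAI

anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
openai_client = OpenAI()                  # reads OPENAI_API_KEY

def complete(prompt: str) -> str:
    try:
        message = anthropic_client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return message.content[0].text
    except anthropic.APIError:
        # Overload errors and other API failures land here; rerun the
        # same prompt against the GPT-4o snapshot instead.
        response = openai_client.chat.completions.create(
            model="gpt-4o-2024-08-06",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
```

Catching the broad `anthropic.APIError` keeps the sketch short; in a real setup you would probably retry transient overloads once or twice before falling back.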
Beyond pricing strategies, the competition between OpenAI and Anthropic has intensified as small language models gain traction for their cost-effectiveness; the market for intelligence has become remarkably affordable. When Claude 3 was introduced, for instance, it shipped alongside Claude 3 Haiku, a smaller, cheaper model that many developers adopted. OpenAI recently answered with GPT-4o mini, which is not only cheaper than Haiku but also smarter, although Haiku still outshines it in writing.
GPT-4o mini has become a significant part of my workflow, but while those two models compete, Google has been developing its own offerings. Around the time of the Haiku and GPT-4o mini releases, Google introduced Gemini 1.5 Flash, priced below Haiku and scoring well on various benchmarks. Despite those scores, Gemini 1.5 Flash has underperformed in my practical tests, and its writing is notably weak. However, Google recently slashed its price by 80%, making it one of the most economical models available: roughly half the price of the already cheap GPT-4o mini, and well suited to bulk tasks like sentiment analysis and summarization. It is also smarter than LLaMA 3.1 8B.
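For those bulk tasks the setup is trivial. Here's a minimal sketch using Google's `google-generativeai` Python SDK; the prompt wording is an assumption for illustration:

```python
# Sketch of using Gemini 1.5 Flash for cheap, high-volume summarization.
# The two-sentence prompt is an illustrative choice, not a recommendation.
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # or load from the environment
model = genai.GenerativeModel("gemini-1.5-flash")

def summarize(text: str) -> str:
    """One-shot summarization, cheap enough to run over a large corpus."""
    response = model.generate_content(
        f"Summarize the following text in two sentences:\n\n{text}"
    )
    return response.text
```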
Now, circling back to state-of-the-art models: the LLaMA 3.1 series has arrived with its flagship 405B model, which rivals GPT-4o and Claude 3.5 Sonnet at a substantially lower cost. Its input price matches Sonnet's and sits slightly above GPT-4o's, but its output price is only $3 per million tokens, versus $15 for Sonnet and $10 for the latest GPT-4 Omni snapshot. Even so, in my tests every version of Claude consistently outperforms the competition in writing, creativity, and matching the style of provided samples, which is a core function of my content automation dashboard.
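To put those output prices in perspective, here's a quick back-of-the-envelope calculation. The input prices for Sonnet and the GPT-4o snapshot ($3.00 and $2.50 per million tokens) are my reading of the published rates at the time, and the workload sizes are invented for illustration:

```python
# Back-of-the-envelope cost comparison using the per-million-token prices
# cited above. Input prices for Sonnet and GPT-4o are assumptions based on
# the published rates; the example workload is invented.
PRICES = {                      # (input $/M tokens, output $/M tokens)
    "claude-3.5-sonnet": (3.00, 15.00),
    "gpt-4o-2024-08-06": (2.50, 10.00),
    "llama-3.1-405b":    (3.00, 3.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the quoted per-million-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example: a 2,000-token prompt producing a 1,000-token draft.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 1_000):.4f}")
```

On an output-heavy workload like that, the 405B model comes out several times cheaper per request than Sonnet, which is exactly why its pricing is hard to ignore.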
This post follows up on Use the Right Tool for the Job: Claude vs. GPT Round Two.