Created using Ideogram 2.0 Turbo with the prompt, "Two digital displays side by side against black background. Left screen shows 0.8 percent in green numbers. Right screen shows 14.3 percent in red numbers. Shot on Canon EOS R5, 50mm f1.2 lens, low key lighting, shallow depth of field."

o3-mini Beats DeepSeek R1 on Hallucination Tests by 18x

o3-mini behaves a little differently from other OpenAI models: it repeats the user’s name in every response, an odd but consistent personality quirk. But that’s not why I’m writing this.

DeepSeek R1 launched with heavily subsidized API pricing, making it temporarily cheaper than o3-mini. Once the planned price increase takes effect, though, the two will cost about the same.

The real story is the hallucination rates. On standardized benchmarks, o3-mini with high reasoning effort posts a hallucination rate of just 0.8%. DeepSeek R1? A concerning 14.3%, roughly 18 times higher.
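
For context on what these benchmarks actually measure: hallucination leaderboards typically hand a model a source passage, ask for a summary grounded only in that passage, and score whether the summary introduces facts the passage never stated. Below is a minimal sketch of that setup you can run yourself. It assumes OPENAI_API_KEY and DEEPSEEK_API_KEY are set in your environment; the sample passage and prompt are my own illustration, and a real benchmark scores thousands of documents with a trained evaluator rather than eyeballing two outputs.

```python
# Side-by-side spot check: same grounded-summary prompt to both models.
# A sketch only; real hallucination benchmarks automate the scoring step.
import os

from openai import OpenAI

# Illustrative source passage; any short factual text works.
SOURCE = (
    "The Eiffel Tower, completed in 1889 for the World's Fair, stands "
    "330 metres tall and was the world's tallest structure until 1930."
)
PROMPT = (
    "Summarize the following passage in one sentence, using only facts "
    "stated in the passage:\n\n" + SOURCE
)

# o3-mini via OpenAI; reasoning_effort="high" matches the configuration
# behind the 0.8% figure quoted above.
o3 = OpenAI().chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",
    messages=[{"role": "user", "content": PROMPT}],
)

# DeepSeek R1 via DeepSeek's OpenAI-compatible endpoint.
r1 = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
).chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": PROMPT}],
)

print("o3-mini:", o3.choices[0].message.content)
print("R1:     ", r1.choices[0].message.content)

# The headline gap: 14.3 / 0.8 = 17.875, i.e. roughly 18x.
print("ratio:", 14.3 / 0.8)
```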

I’ve previously covered DeepSeek’s questionable claims about their training costs (see my analysis at https://adam.holter.com/deepseek-claims-5-6m-training-cost-while-hiding-billions-in-infrastructure/). The hallucination numbers only reinforce that skepticism.

The numbers are clear: if you need factual reliability, o3-mini is the better choice right now. DeepSeek R1’s responses simply can’t be trusted to the same degree.

Bottom line: don’t pick models based on temporary pricing. Focus on actual performance metrics. The hallucination rates tell the real story here.