Created with Ideogram using the prompt: 'An illustration of a robot with the text "AGI (depending on your definition) is Coming!" on a bright yellow background. The robot has a human-like face, and it is holding a sign with the text. The robot is standing on a blue platform. There are stars and a planet in the background.'

OpenAI o3 Hits 88% on Alan’s AGI Countdown: Here’s Why That Matters

OpenAI’s o3 model just hit 88% on Alan Thompson’s AGI countdown, up from 84%. The jump comes from o3 crushing several key benchmarks that measure how close we are to artificial general intelligence.

First, let’s look at the numbers. On the GPQA Diamond benchmark, o3 scored 87.7%, compared to the previous model’s 78.3%. For context, this test measures an AI’s ability to reason through complex problems – not just memorize answers.

But the real shocker came from the 2024 American Invitational Mathematics Examination (AIME). o3 scored 96.7%, missing just one question on one of the hardest math competitions in the world. It also dominated competitive programming, reaching the 99.8th percentile on Codeforces with a rating of 2727.

The most impressive result? FrontierMath. Fields Medalist Timothy Gowers said these problems were so hard that getting even one right would be beyond current AI capabilities. o3 solved 63 out of 250 questions, scoring 25.2%. The previous model only managed 2%.
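As a quick sanity check on the reported numbers, here is a short Python sketch (the question counts come straight from the figures above) confirming that 63 correct answers out of 250 works out to 25.2%, and how large the jump from the previous model's 2% really is:

```python
# FrontierMath figures as reported: o3 solved 63 of 250 problems.
correct, total = 63, 250
o3_score = 100 * correct / total
previous_score = 2.0  # previous model's reported score, in percent

print(f"o3 FrontierMath score: {o3_score:.1f}%")          # 25.2%
print(f"Improvement factor: {o3_score / previous_score:.1f}x")  # 12.6x
```

A 12.6x jump on a benchmark designed to resist current AI is what makes this result stand out from the other scores in this post.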

I agree with Alan’s assessment. The test results show that if you can create a clear benchmark for a task, AI will eventually master it. There’s no upper limit – as Sam Altman said, “There is no wall.”

For more technical details on o3’s capabilities, check out my analysis of its performance on the ARC-AGI benchmark: https://adam.holter.com/openai-o3-scores-75-7-on-arc-agi-a-technical-analysis/

The rapid progress in AI reasoning and problem-solving suggests we’re moving faster toward AGI than many expected. While some debate the timeline, the benchmark results speak for themselves. AI systems are getting better at tasks we thought would take decades to crack.

What’s next? I’ll be watching how o3 performs on even harder problems and tracking its impact on real-world applications. The countdown to AGI just got more interesting.