
OpenAI vs DeepMind: The Great AI Math Olympics Cheating Scandal of 2025

Both OpenAI and Google DeepMind just achieved something remarkable: their AI models scored 35/42 on the 2025 International Mathematical Olympiad, solving five of six problems and earning gold medal performance. But instead of celebrating this milestone in AI reasoning, we got a playground fight over who cheated and who played by the rules.

DeepMind waited until July 21 to announce their Gemini Deep Think results, claiming they honored the IMO Board’s request to withhold results until official grading and student recognition were complete. OpenAI posted their achievement two days earlier using internal grading, prompting critics to say they jumped the gun on the agreed-upon blackout period.

The drama escalated when Demis Hassabis threw shade at OpenAI on X, stressing DeepMind’s respect for the rules and timing. OpenAI researchers fired back, likening Gemini’s extensive fine-tuning to taking your tutor’s cheat sheet into an exam. The DeepMind blog admits their model was fine-tuned with a curated corpus of high-quality math proofs plus IMO-specific hints and tips, fueling the fairness debate.

At a glance:
- DeepMind (Gemini Deep Think): announced July 21; method: fine-tuning plus IMO-specific hints
- OpenAI (undisclosed model): announced July 19; method: general reasoning
- Result: both models tied at gold with 35/42

Both AI models achieved identical gold medal performance, but took very different approaches and announcement strategies.

The Numbers Tell the Story: A New Era for AI in Math

The 2025 IMO results are historic for AI. Both models solved the exact same five problems out of six, achieving a score that would earn any human participant a gold medal. The International Mathematical Olympiad represents the pinnacle of high school mathematics competition, with problems designed to challenge the brightest mathematical minds globally. This isn’t just about getting answers; it’s about sophisticated problem-solving that requires deep understanding and creative application of mathematical principles.
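For context on the scoring: each IMO problem is graded out of 7 points, so fully solving five of the six problems yields exactly the 35/42 both models posted, and the reported gold medal cutoff for 2025 was 35. A minimal sketch of that arithmetic (the cutoff value is as reported in coverage of the event):

```python
POINTS_PER_PROBLEM = 7   # each IMO problem is graded out of 7 points
NUM_PROBLEMS = 6         # the IMO consists of six problems over two days
GOLD_CUTOFF_2025 = 35    # reported gold medal cutoff for the 2025 IMO

problems_solved = 5      # both models fully solved five of six problems
score = problems_solved * POINTS_PER_PROBLEM
max_score = NUM_PROBLEMS * POINTS_PER_PROBLEM

print(f"Score: {score}/{max_score}")             # Score: 35/42
print("Gold medal:", score >= GOLD_CUTOFF_2025)  # Gold medal: True
```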

Google DeepMind’s entry used Gemini with what the company calls a novel “Deep Think” reasoning framework. This framework incorporates parallel thinking and reinforcement learning techniques, allowing the model to process problems and generate solutions in ways that mimic human-like deliberation. Notably, the model worked entirely in natural language within the competition’s strict 4.5-hour time frame, a significant feat given the complexity of the problems. This natural language interaction means the AI can understand and respond to mathematical problems presented in a human-readable format, bridging the gap between abstract computation and intuitive comprehension.

OpenAI’s approach was notably different. Their model wasn’t specifically optimized for math problems, unlike Gemini, yet achieved identical results. This suggests a more generalized reasoning capability, where the AI can apply its fundamental understanding to diverse domains without explicit, narrow training. The technical achievement here shouldn’t be understated. These results represent a significant milestone in AI reasoning capabilities, demonstrating that both companies have made substantial progress in complex mathematical problem-solving. This isn’t just about a single competition; it’s about the broader implications for AI’s ability to tackle previously human-exclusive intellectual tasks.

The Training Controversy: Fair Competition or Academic Cheating?

Here’s where things get messy. DeepMind openly disclosed that Gemini “Deep Think” was fine-tuned with a curated corpus of high-quality math proofs and received IMO-specific “hints and tips.” From an academic perspective, this is like studying past exams and getting tutoring specifically for the test you’re about to take. It’s a highly optimized, targeted approach designed to maximize performance on a specific challenge. While effective, it raises questions about the “purity” of the achievement in a general intelligence context.

OpenAI’s approach was different. Their model wasn’t specifically trained for mathematical competitions, making their achievement arguably more impressive from a general intelligence standpoint. The idea is that if an AI can excel at a specialized task without specialized training, it indicates a more robust and adaptable form of intelligence. However, OpenAI has revealed almost nothing about their training recipe, which makes direct comparisons difficult. This lack of transparency is a recurring theme in the proprietary AI space, often hindering independent verification and broader scientific understanding.

The fairness debate boils down to this: Is it more impressive to build a general reasoning system that happens to excel at math, or to specifically train for mathematical problems and achieve the same result? Both approaches have merit, but they’re solving different problems. DeepMind’s method is a testament to the power of focused optimization, while OpenAI’s (if their claims hold true) highlights the potential of broad, foundational models. The question isn’t whether one is “better” than the other in an absolute sense, but what each achievement tells us about the path toward advanced AI.

Timing Drama: Who Broke the Rules?

The timing controversy adds another layer to this story. DeepMind claims they honored an IMO Board request to withhold results until official grading and student recognition were complete. This is a common practice in academic competitions to preserve the integrity of the event and ensure human participants receive their due recognition before AI models steal the spotlight. OpenAI announced two days earlier using internal grading, which DeepMind co-founder Demis Hassabis characterized as disrespectful to the competition process.

This isn’t just about courtesy – it’s about establishing norms for how AI companies should interact with academic institutions and competitions. If the IMO Board requested a blackout period to protect the integrity of the human competition, jumping the gun does seem problematic, regardless of technical achievement. It sets a precedent that could undermine future collaborations between AI labs and traditional academic bodies. It also reflects a competitive drive that sometimes overshadows shared scientific progress.

The social media exchanges between Hassabis and OpenAI researchers highlight the competitive tension in the AI field. When Hassabis stressed DeepMind’s respect for rules and timing, OpenAI researchers responded by questioning whether Gemini’s extensive fine-tuning constituted fair play. Both sides have legitimate points, but the public nature of the dispute feels petty given the magnitude of what both teams achieved. It’s the kind of public squabble that detracts from the true significance of the breakthroughs.

What This Means for AI Progress: Beyond the Scoreboard

Beyond the drama, these results signal that AI has reached a new level of mathematical reasoning capability. The IMO problems require creative thinking, pattern recognition, and complex logical reasoning – skills that go far beyond memorizing formulas or following algorithms. They demand a form of intelligence that was long considered uniquely human.

The fact that two different approaches achieved identical results suggests we’re seeing convergence toward human-level performance in mathematical problem-solving. Whether through specialized training like Gemini’s approach or general reasoning like OpenAI’s method, AI systems can now tackle problems that challenge even exceptional human mathematicians. This isn’t just about solving equations; it’s about understanding the underlying principles and applying them creatively to novel situations.

This has implications beyond mathematics. The reasoning capabilities demonstrated here could translate to scientific research, engineering challenges, and other domains that require sophisticated analytical thinking. Imagine AI assisting in drug discovery, materials science, or complex system design. The ability to reason mathematically is a foundational skill that unlocks a vast array of intellectual pursuits. This pushes us closer to truly intelligent automation across various fields.

The Transparency Problem: Why Openness Builds Trust

One frustrating aspect of this story is the lack of transparency, particularly from OpenAI. While DeepMind disclosed their reinforcement learning tweaks and fine-tuning methodology, OpenAI revealed almost nothing about their training process. This makes meaningful comparison nearly impossible. If you can’t see the recipe, you can’t fully appreciate the dish, let alone replicate it.

The AI research community benefits from openness about methodologies, especially when making claims about general intelligence versus specialized performance. OpenAI’s secretive approach, while perhaps strategically sound from a business perspective, hampers scientific understanding of these achievements. It creates an environment of speculation rather than verifiable progress. This isn’t just about academic curiosity; it’s about building trust and allowing others to build upon these advancements.

DeepMind’s willingness to admit their model used curated training data and hints actually strengthens their credibility, even if it raises questions about the fairness of their approach. Transparency matters more than perfect methodology when advancing the field. It allows for critical review, replication, and ultimately, faster collective progress. Without it, every claim becomes just that: a claim, rather than a verifiable scientific result.

The Bigger Picture: AI Competition Ethics and Future Norms

This controversy highlights broader questions about how AI companies should compete in academic and research contexts. Should there be standard protocols for announcing research results? How much coordination with academic institutions is appropriate? What constitutes fair comparison when training methodologies differ so dramatically?

As AI capabilities approach and potentially exceed human performance in more domains, these questions become increasingly important. The mathematical olympiad is just one benchmark, but it’s a prestigious one that carries significant public attention and credibility. The way companies handle these achievements sets precedents for future breakthroughs. Respectful engagement with academic institutions, transparent disclosure of methods, and collaborative rather than competitive framing could benefit everyone involved. This is about shaping the future of AI research, not just winning a single contest.

For example, if AI models are to participate in and potentially reshape scientific discovery, clear ethical guidelines and protocols are essential. Without them, we risk a chaotic landscape where competitive advantage trumps scientific integrity. This means establishing shared rules of engagement, perhaps even an independent body to oversee AI participation in such benchmarks.

Looking Forward: What Comes Next?

Both achievements point toward a future where AI systems routinely match or exceed human expert performance in specialized domains. The question isn’t whether AI will surpass human mathematicians – it’s how quickly this capability extends to other fields and how we integrate these tools productively. This includes areas like scientific research, where AI could act as a powerful assistant for hypothesis generation and data analysis.

The training approach differences suggest multiple paths toward advanced AI capabilities. Specialized fine-tuning like DeepMind’s approach might be optimal for specific domains, offering highly performant, tailored solutions. General reasoning capabilities like those OpenAI claims to have developed could prove more versatile, allowing for broader applicability with less need for domain-specific retraining. We are likely to see both approaches continue to advance, each with its own strengths and use cases.

For the mathematical community, these results represent both an opportunity and a challenge. AI tools could accelerate mathematical research and education, automate tedious proofs, and even suggest new theorems. But they also raise questions about the future role of human mathematicians and the nature of mathematical discovery itself. Will human mathematicians become curators of AI-generated proofs, or will they be freed to pursue even more abstract and creative problems?

This also impacts the job market. Just as AI is already impacting roles like non-expert copywriters and graphic designers, the ability of AI to perform complex mathematical reasoning could change the landscape for certain analytical roles. The demand will shift towards those who can work with AI, refine its outputs, and handle the truly novel problems that AI can’t yet touch.

My Take: Both Sides Need to Grow Up

The technical achievements here are genuinely impressive and deserve celebration. Both teams solved an extraordinarily difficult problem and demonstrated meaningful progress toward human-level reasoning in mathematics.

The public drama, however, is counterproductive. OpenAI jumping the gun on announcements shows poor form if there was indeed an agreed-upon protocol. But DeepMind’s public shaming on social media isn’t much better. These are scientific achievements that advance human knowledge – they should be discussed with appropriate gravity.

The transparency issue is more serious. OpenAI’s secretive approach makes it impossible to properly evaluate their claims about general reasoning versus specialized training. If you want credit for building a general intelligence system rather than a specialized math solver, you need to provide evidence. It’s not about hiding trade secrets; it’s about contributing to the collective understanding of AI’s capabilities.

Ultimately, both approaches have value. Specialized AI systems trained for specific domains will likely achieve higher performance in those areas. General reasoning systems offer broader applicability and potentially greater insights into intelligence itself. We need both paths of development. The competitive tension is understandable, but it shouldn’t overshadow the shared goal of advancing AI.

The mathematical olympiad results represent a milestone in AI development that deserves better than a playground fight over timing and training methods. Both teams achieved something remarkable – they should act like it. The real competition should be in advancing the technology, not in who can throw the most shade on social media.