OpenAI Disproves the Planar Unit Distance Conjecture

OpenAI’s general-purpose reasoning model has disproved a central conjecture on the planar unit distance problem. The model surfaced an infinite family of point configurations that deliver a clean polynomial improvement over the square grid constructions mathematicians relied on for decades. The result stands out because the model operated without specialized training or scaffolding aimed at this problem or at mathematics in general. It worked through a set of hard Erdős questions and produced a proof that held up under external review.

The question at the center of this is straightforward. Given n points in the plane, what is the maximum number of pairs that can lie at exactly distance one? Erdős asked it in 1946, and it became one of the most discussed problems in combinatorial geometry. The function u(n) tracks that maximum. Points evenly spaced on a line give n - 1 unit distances. A square grid reaches about 2n. The prior best constructions refined the grid to achieve n^(1 + C/log log n) for a constant C > 0. The extra exponent C/log log n shrinks toward zero as n grows, so the growth remained close to linear. For a long time the field operated on the belief that u(n) could not rise much above n^(1 + o(1)).
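
The two baseline counts above are easy to check by brute force. The following is a minimal sketch, not anything from the proof: the helper names (`count_pairs_at`, `line_points`, `grid_points`) are made up for illustration, and it uses integer coordinates with squared distances so no floating point comparison is needed.

```python
from itertools import combinations


def count_pairs_at(points, target_sq=1):
    """Count unordered pairs of integer points whose squared distance equals target_sq."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == target_sq
    )


def line_points(n):
    """n evenly spaced points on a line (hypothetical helper)."""
    return [(i, 0) for i in range(n)]


def grid_points(k):
    """The k x k square grid, so n = k * k points (hypothetical helper)."""
    return [(x, y) for x in range(k) for y in range(k)]


line_count = count_pairs_at(line_points(16))  # n - 1 = 15
grid_count = count_pairs_at(grid_points(4))   # 2k(k - 1) = 24, i.e. 2n - 2*sqrt(n)
print(line_count, grid_count)
```

With n = 16 the line gives 15 pairs and the 4 x 4 grid gives 24, which is 2n minus a boundary correction of 2 times the square root of n. The correction becomes negligible relative to 2n as the grid grows.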

The new constructions break that pattern. For infinitely many n they produce at least n^(1 + delta) unit distance pairs, where delta is a fixed positive constant. A follow-up refinement sets delta at 0.014. That gap matters. It shows the square grid intuition missed an entire class of better structures. The upper bound still sits at order n^(4/3), from work in the 1980s that has seen only small refinements since. The distance between the improved lower bound and that upper bound leaves room for more surprises, but the conjecture that stood for nearly eighty years no longer holds.
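
For reference, the bounds described so far can be lined up in one place. This is only a restatement of the figures quoted in this post, with C and delta standing for the constants mentioned above. It is not taken from the paper itself.

```latex
\begin{align*}
u(n) &\ge n^{1 + C/\log\log n} && \text{(refined grid constructions)} \\
u(n) &\le n^{1 + o(1)} && \text{(the conjecture, now disproved)} \\
u(n) &\ge n^{1 + \delta} \ \text{for infinitely many } n, \quad \delta = 0.014 && \text{(new constructions, after refinement)} \\
u(n) &= O\!\left(n^{4/3}\right) && \text{(upper bound from the 1980s)}
\end{align*}
```

The third line contradicts the second because a fixed positive delta eventually exceeds any o(1) term. The open question is now where the true exponent sits between 1.014 and 4/3.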

The argument pulls tools from algebraic number theory. Erdős's own lower bound already drew on the Gaussian integers, the numbers a + bi with a and b ordinary integers, which support unique factorization. The new work replaces those with more complex number fields that possess richer symmetry groups. These fields generate far more unit length differences. The proof invokes infinite class field towers and Golod-Shafarevich theory to establish that the required fields exist. Those ideas were standard in number theory circles, yet no one had carried them over to this geometric question. The model identified the bridge and assembled the full argument end to end.
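
To see why Gaussian integers help in the classical setting, it is enough to count grid pairs at a distance whose square can be written as a sum of two squares in several ways. The sketch below illustrates only that older idea, not the new number field construction. The function name `grid_pairs_at_squared_distance` is hypothetical, and the choice of 25 = 5^2 + 0^2 = 3^2 + 4^2 is just a small convenient example.

```python
def grid_pairs_at_squared_distance(k, d):
    """Count unordered pairs in the k x k integer grid at squared distance d.

    Each offset (a, b) with a*a + b*b == d contributes (k - |a|) * (k - |b|)
    pairs. Only one of (a, b) and (-a, -b) is kept so pairs are not doubled.
    """
    total = 0
    for a in range(-k + 1, k):
        for b in range(0, k):
            if a * a + b * b != d:
                continue
            if b == 0 and a <= 0:
                continue  # skip the mirror image of an offset already counted
            total += (k - abs(a)) * (k - b)
    return total


k = 20  # a 20 x 20 grid, n = 400 points
nearest_neighbor_pairs = grid_pairs_at_squared_distance(k, 1)  # 2k(k - 1) = 760
distance_five_pairs = grid_pairs_at_squared_distance(k, 25)    # 1688
print(nearest_neighbor_pairs, distance_five_pairs)
```

Shrinking the grid by a factor of 5 turns the distance 5 pairs into unit distances, so the same 400 points carry 1688 unit pairs instead of 760. Distances whose squares have many representations as sums of two squares push this further, and that count is governed by factorization in the Gaussian integers.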

External mathematicians checked the proof line by line. They also produced a companion paper that supplies broader context and extracts lessons the raw proof does not spell out. Tim Gowers called the outcome a milestone in AI mathematics. Noga Alon noted that every combinatorial geometer has spent time on this question and that the resolution through sophisticated number-theoretic tools came as a genuine surprise. Arul Shankar and Jacob Tsimerman highlighted the originality of the ideas and the clean way they were executed. Thomas Bloom observed that the result demonstrates that number-theoretic constructions have considerably more to say about discrete geometry than previously suspected. The depth of number theory required suggests algebraic number theorists may now revisit other open questions in the area, looking for similar connections.

I have maintained that frontier models manage complex mathematical reasoning at a level many critics understate. Papers that declare models cannot do math reliably tend to test only smaller budget versions. The systems we actually turn to for difficult work operate differently. They sustain coherence across long chains of reasoning. They surface machinery from distant domains without explicit prompting. The jump from class field towers to point sets in the Euclidean plane qualifies as exactly that kind of move. It feels like insight because it is.

The team also studied how the model’s success rate on this problem scales with test-time compute. Additional thinking time produces clear gains in accuracy. The model explores paths, discards dead ends, and maintains a line of argument that survives end-to-end verification. Those traits matter. The same capacity that holds a hundred-step number-theoretic argument together can support multistep experimental design in biology or materials analysis in physics. The OpenAI post itself draws this connection. Capabilities that prove reliable on precise mathematical questions transfer to domains where ideas must cohere across messy data and incomplete information.

What I find practical is how this shifts my own workflow on research-adjacent tasks. When I encounter a new technical domain, I now routinely prompt the model to surface analogies or formal machinery that might sit several fields away from the obvious literature. The model can propose connections I might reach only after extended reading. I still verify every step against primary sources and I still make the final judgments, but the initial search phase compresses. That compression compounds when the goal is to evaluate many candidate directions quickly. The unit distance result reinforces the habit. Ask for the unexpected bridge. Check what arrives.

This outcome also clarifies the division of labor between models and people. The model generated the proof. Experts validated it piece by piece and wrote the companion piece that converts a technical advance into a pointer toward new research programs. Their judgment about which implications carry weight and which questions to pursue next grows more valuable precisely because the model can now suggest more paths worth checking. The model proposes and checks its own argument. Humans validate the result, select the problems that matter, and shape the broader narrative. That pattern matches what I have seen across other releases. Progress in coherent reasoning does not diminish expertise. It amplifies the return on good questions and sharp interpretation.

The result fits a larger pattern I track. Frontier systems keep posting measurable gains on tasks that require sustained technical thought. Each gain is concrete. A new family of geometric constructions. A verifiable argument that crosses traditional field boundaries. Improved reliability on held-out problems when granted time to think. None of it requires treating the model as magic. It looks like the expected output of better coherence at scale. The release cadence moves fast enough that individual benchmarks age quickly, yet the direction remains consistent. Models are becoming dependable collaborators on hard problems.

I expect additional examples in mathematics because the domain supplies a clean testbed. Arguments are verifiable. The problems are precise. Success on a question that sits at the center of combinatorial geometry increases the odds of similar jumps elsewhere. Some areas will stay stuck for now. Others may open suddenly once an overlooked technical bridge appears. Models help explore more of those possibilities. The people who understand the terrain deeply enough to recognize a genuine advance become the scarce resource.

The square grid that served as the benchmark for so long now shares space with these number field constructions. That shift alone deserves attention. The fact that a general reasoning model located the path without narrow specialization makes the case worth following closely. The next practical test is whether the same approach yields progress on related geometric questions or whether the technique remains isolated to this instance. Either outcome updates the record in a useful way. The proof has been checked. The conjecture has been replaced. That is what progress looks like.

As I have written in earlier pieces on model capabilities, the through line stays the same. Frontier systems continue to improve at the kinds of extended, coherent reasoning that separate demonstration projects from work experts treat seriously. The unit distance result supplies one especially clean data point. It shows what becomes possible when a model can hold distant ideas in alignment long enough to produce something new. I will keep testing each new reasoning advance against my own tasks rather than assuming any single system dominates. The useful move is to keep evaluation lightweight so every capability gain can be checked without overhead. The models are getting better at the work that matters. The practical response is to use them accordingly.

Adam Holter

Founder of Ironwood AI. Writing about AI models, agents, and what's actually happening in the space.