Claude Mythos Preview posted 77.80% on SWE-bench Pro. GPT-5.4 is at 57.70%. OpenAI has been signaling that Spud, their next model, closes that gap. The leaked release date is April 16, which is eight days from the date of this writing.
Pretraining wrapped up around March 24, 2026, and Spud is currently in safety evaluation. Polymarket has it at 78% odds of dropping by April 30, and 95%+ by June 30.
The reason this matters now is Anthropic. Before Mythos, the community knew Spud was coming but had no reference point to measure it against. Now it does, and OpenAI has repeatedly indicated Spud is in that range.
The Benchmark Gap
The SWE-bench Pro leaderboard tells the story of where things stood before Spud. GPT-5.4 sits at 57.70%, which is already well ahead of Claude Opus 4.5 at 45.89%, Claude 4.5 Sonnet at 43.60%, and Gemini 3 Pro Preview at 43.30%. Mythos Preview at 77.80% is a different tier entirely: a 20.1-point lead over GPT-5.4, the next-best vendor-reported number. The expectation is that Spud closes most or all of that gap.
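For readers who want to check the arithmetic, here is a minimal sketch that recomputes each model's distance from the leader using only the scores quoted above. The function name gaps_to_leader is made up for this illustration and is not part of any leaderboard tooling.

```python
# SWE-bench Pro scores as quoted in this post (percent, vendor-reported).
scores = {
    "Claude Mythos Preview": 77.80,
    "GPT-5.4": 57.70,
    "Claude Opus 4.5": 45.89,
    "Claude 4.5 Sonnet": 43.60,
    "Gemini 3 Pro Preview": 43.30,
}


def gaps_to_leader(scores):
    """Return the top-scoring model and each other model's gap to it, in points."""
    leader = max(scores, key=scores.get)
    top = scores[leader]
    gaps = {
        model: round(top - score, 2)
        for model, score in scores.items()
        if model != leader
    }
    return leader, gaps


leader, gaps = gaps_to_leader(scores)
# Print the smallest gap first, so the closest competitor is on top.
for model, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{model}: {gap:.2f} points behind {leader}")
```

The gaps are percentage points, not relative improvements, which matches how the numbers are discussed here.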
GPT-5.4's 57.70% is being read as a hint about what Spud is targeting: it is vendor-reported on the same leaderboard as Mythos, so the gap between the two is what the community is now focused on. Multiple sources expect Spud to land close to Mythos territory, which would put it well above everything else currently on that board.
Whether Spud ships as GPT-5.5 or GPT-6 is genuinely unresolved until benchmarks are confirmed. Sam Altman has called Spud a model that could meaningfully accelerate the economy. Greg Brockman described it as representing two years of research and having a big-model feel, and he did not frame it as incremental. Those are not the words you reach for when describing a 5.5. The rule OpenAI appears to be following: a generational jump gets a major version bump, while a strong but incremental improvement stays in the 5.x family. If Spud scores in the high 70s on SWE-bench Pro, GPT-6 is the right call. If it lands closer to the low 60s, GPT-5.5 is more likely.
On model naming in general: OpenAI’s naming has been a mess for a while. Consider the recent lineup: GPT-5.1-Codex, GPT-5.1-Codex-Mini, GPT-5.2, GPT-5.3-Codex, GPT-5.3-Codex-Spark. If the next one is simply called GPT-6, that would be a relief. They could honestly let the model name itself and it would probably do a better job.
The Signals That Led Here
Kyle Willson flagged the retirement of GPT-5.2-Codex and GPT-5.1-Codex-Mini from Codex on April 14. That is a standard pre-launch pattern: OpenAI clears space before releasing the next thing. Tibo, who works on Codex, posted that the next few weeks will be intense and fun, which pulled 67,000 impressions and 2,000 likes. You can read that as confirmation of imminent activity without reading anything else into it. Combined with the pretraining completion date and the Polymarket odds, the April 16 leak looks credible.
For more background on Claude Mythos, including the cybersecurity profile that caused concern at release, see this post on the Mythos leak. The 77.80% SWE-bench Pro number that Spud is now chasing comes from the same model that successfully compromised a Linux system in a controlled test. Capability at this level comes with tradeoffs, and that pattern is worth keeping in mind as Spud approaches.
What Spud Is Expected to Do
The expected improvements are in reasoning quality, output coherence, and agent behavior. Stronger reasoning means better performance on complex multi-step tasks. The model is also being positioned as the backbone of OpenAI’s unified product suite, integrating ChatGPT, Codex, and agent workflows into one platform. The recent Codex plugin releases for Slack, Figma, and Google Drive fit that framing directly. If you missed those, here is the Codex plugins post.
On rollout, the expected order is ChatGPT Plus and Pro first, then the free tier through the Thinking feature two to four weeks later, then the enterprise API two to four weeks after that. No pricing has been confirmed.
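To make those windows concrete, here is a rough sketch that turns the expected rollout order into calendar dates. It rests on assumptions that are not confirmed anywhere: that the leaked April 16 date is in 2026 and holds, that ChatGPT Plus and Pro get access on launch day, and that each two-to-four-week lag is counted from the previous stage. The names LEAKED_LAUNCH and rollout_windows are invented for this example.

```python
from datetime import date, timedelta

# Assumption: the leaked April 16 date refers to 2026 and the launch happens then.
LEAKED_LAUNCH = date(2026, 4, 16)

# "Two to four weeks" between stages, per the expected rollout order.
MIN_LAG = timedelta(weeks=2)
MAX_LAG = timedelta(weeks=4)


def rollout_windows(launch):
    """Return the (earliest, latest) date for each stage, chaining lags stage to stage."""
    plus_pro = (launch, launch)
    free_tier = (plus_pro[0] + MIN_LAG, plus_pro[1] + MAX_LAG)
    enterprise_api = (free_tier[0] + MIN_LAG, free_tier[1] + MAX_LAG)
    return {
        "ChatGPT Plus and Pro": plus_pro,
        "Free tier (Thinking)": free_tier,
        "Enterprise API": enterprise_api,
    }


windows = rollout_windows(LEAKED_LAUNCH)
for stage, (earliest, latest) in windows.items():
    print(f"{stage}: {earliest:%B %d} to {latest:%B %d}")
```

Under those assumptions the free tier would land between April 30 and May 14, and the enterprise API between May 14 and June 11. If the launch date slips, every window slips with it.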
The Broader Picture
Mythos was not a ceiling. The community framing right now is that Mythos was the opening move in a new phase. Spud is OpenAI’s near-term response. Google is expected to show something at I/O in May. A Google tool called Agent Smith has been circulating inside the company and is reportedly popular. The competitive pressure is real and the release cadence reflects it.
What is worth noting is that this is not some entirely new paradigm. Spud is a better model. It is expected to be significantly better than what OpenAI has now, and potentially competitive with the best thing currently available from any lab. That is meaningful. But the pattern here is the same pattern it has been for the last year: one lab posts a number, another lab responds with a number, and the leaderboard shifts. Mythos moved the leaderboard significantly. Spud is expected to move it back. That is the cycle.
April 16 is the date to watch. If the leak holds, we will have actual benchmark numbers shortly after and the GPT-5.5 versus GPT-6 question will answer itself.