GPT-5.4 Fast Mode Is the Best Part of the Release

GPT-5.4 is good, but the part that matters most to me is not a benchmark screenshot. It is /fast in Codex. OpenAI says it delivers up to 1.5x the token velocity on the same model: the same intelligence, just served faster. That is a real improvement for anyone running longer coding sessions, agent loops, browser workflows, or repo-wide tasks where waiting around is the main annoyance.

The tradeoff is easy to understand. /fast uses your limits faster. I still think that is a good deal for a lot of users. If the model is already good enough to do the work, then getting the same work done sooner is useful. I would usually rather finish the task and spend the quota than save quota while an agent crawls through the job.
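
To put that tradeoff in concrete terms, here is a toy sketch of the wall-clock math. The throughput and session-size numbers are assumptions I made up for illustration; OpenAI has not published exact tokens-per-second figures for /fast, only the "up to 1.5x" claim.

```python
# Toy wall-clock math for /fast: same model, up to 1.5x token velocity.
# BASELINE_TPS and session_tokens are made-up illustration numbers,
# not official figures.
BASELINE_TPS = 100             # assumed output tokens/sec on the normal path
FAST_TPS = BASELINE_TPS * 1.5  # the "up to 1.5x" claim from the release

session_tokens = 90_000        # assumed long agent session

base_min = session_tokens / BASELINE_TPS / 60
fast_min = session_tokens / FAST_TPS / 60
print(f"normal: {base_min:.0f} min, /fast: {fast_min:.0f} min")
# -> normal: 15 min, /fast: 10 min (same output, a third less waiting)
```

Under those assumptions, the same 90K-token session finishes a third sooner. That is the whole pitch: nothing about the output changes, only how long you sit there.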

That speed option also fits the broader GPT-5.4 release. OpenAI is trying to build one default serious-work model instead of making people choose between a reasoning-focused GPT and a coding-focused Codex identity. GPT-5.4 folds GPT-5.3-Codex-style coding into the mainline family, adds native computer use, supports up to about 1.05M tokens in the API and Codex, and pushes hard on office outputs like spreadsheets, slide decks, documents, finance workflows, and web research. The product message is clear enough: this is the model OpenAI wants people using for work, not just chat.

The benchmark story supports that, but it also shows where the gains are strongest. GPT-5.4 posted an 83.0% win-or-tie rate on GDPval, 57.7% on SWE-Bench Pro, 75.0% on OSWorld-Verified, 54.6% on Toolathlon, and 82.7% on BrowseComp. The jump over GPT-5.3-Codex on pure coding is not massive. The bigger story is computer use, tool use, and broader workflow quality compared with GPT-5.2, which is why the launch conversation moved so quickly toward agents, browser control, Excel, long-running tasks, and Codex.

That lines up with how I see it. GPT-5.4 seems built to keep working for a long time without losing the thread, and that matters more than squeezing out a tiny gain on a narrow coding benchmark. GPT-5.3-Codex could already solve most coding tasks people were throwing at it, so it is hard to find clean examples where the old model failed badly and GPT-5.4 suddenly wins with ease. I do not think the honest case for GPT-5.4 is that it makes impossible coding tasks trivial. The better case is that it improves the whole workflow package: coding, tools, computer control, long context, and work product in one model.

OpenAI also leaned hard into spreadsheets and document work, and that part of the release should not be ignored. On its internal spreadsheet modeling benchmark, GPT-5.4 scored 87.3% versus 68.4% for GPT-5.2. Human raters preferred GPT-5.4 presentations 68% of the time over GPT-5.2. ChatGPT for Excel also launched the same day, powered by GPT-5.4, which explains the spreadsheet-heavy launch messaging. If you are looking at this release only through the lens of coding, you are missing part of the point.

Steerability looks very good too. OpenAI says GPT-5.4 Thinking can show an upfront plan and lets the user steer the response mid-stream. Early reactions also describe it as highly steerable, though it may want somewhat different prompting than older GPT models. If you are going to use it heavily, the prompting guide is worth reading. Once a model is doing longer, multi-step work, steerability matters a lot because you do not want to restart the entire task every time you need to nudge it.

One thing I was watching closely was writing quality. Early reactions were mixed there. Some people said GPT-5.4 was worse at creative writing or conversational feel, and some criticism went further and claimed it guessed at intent too often instead of asking clarifying questions. I do not think all of that should be dismissed, because taste in writing is messy and some people clearly preferred the older behavior. But after more testing, my own view is more positive than the first-day backlash suggested. It is very good at matching style and tone, and I would call it a strong writer. OpenAI also says it reduces some common ChatGPT wording habits, which is a welcome change because those habits were stale and too recognizable.

The main weak spot for me is still frontend work. GPT-5.4 keeps a lot of the familiar GPT-5.x frontend style, and I still do not like that look. OpenAI specifically called out stronger frontend work in this release, and there are cases where it does better, but the default visual style still lags. If I want a polished final frontend pass, I would often still switch to Claude Opus 4.6 or Gemini 3.1 Pro.

Skills help a lot here. I tried frontend generation with a skill that pushes the output away from the default GPT-5.x look, and the result was much better. Still not where I would rank Claude or Gemini for polish, but clearly improved. The difference between with-skills and without-skills is large enough that I would treat skills as mandatory if frontend quality matters. OpenAI shipping things like Playwright Interactive in Codex makes sense in that context. The base model can do the job, but it benefits a lot from the right scaffolding.

Some of the fun early demos make the same point. People are having GPT-5.4 rewrite Pokemon Red with AI models swapped in, and there are examples of it handling 3D game projects too. I would not treat those as formal proof of anything, but they do show the kind of long-running, multi-step behavior this release is built for. The deeper takeaway is not that everyone needed Pokemon Red rebuilt. It is that the model can keep its bearings while doing a lot of work.

The computer use numbers also matter here. GPT-5.4 is OpenAI’s first general-purpose model with native computer use, and it can operate via screenshots, keyboard and mouse actions, or code through tools like Playwright. On OSWorld-Verified it hit 75.0%, which is above the human baseline OpenAI cited at 72.4% and far above GPT-5.2 at 47.3%. That is one of the clearest signs that this release is aimed at longer-horizon agent workflows, not just better chat replies.
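
To ground what "code through tools like Playwright" looks like in practice, here is a minimal sketch of the kind of browser step a computer-use agent might emit. The URL, selector, and file name are placeholders I chose, not anything from OpenAI's stack; the point is just the observe-act loop of screenshot, action, observation.

```python
# A minimal Playwright step of the kind a computer-use agent might drive.
# Target URL, selector, and output path are illustrative placeholders.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")     # placeholder target page
    page.screenshot(path="state.png")    # screenshot the model can inspect
    page.click("text=More information")  # a mouse-style action the agent chose
    page.wait_for_load_state()           # let any navigation settle
    print(page.title())                  # observation fed back to the model
    browser.close()
```

Whether the model sends raw keyboard and mouse events or writes a script like this, the loop is the same: act, observe the new state, decide the next step. The OSWorld-Verified number is a measure of how reliably that loop holds up.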

The 1M token headline is real for the API and Codex, but there is a pricing catch to keep in mind. GPT-5.4 costs $2.50 per million input tokens, $0.25 per million cached input tokens, and $15 per million output tokens. Prompts over 272K input tokens are priced at 2x input and 1.5x output for the full session. So yes, the long context is there, but huge sessions are not cheap. That puts a premium on efficiency, which is another reason /fast and the broader token-efficiency improvements are worth caring about.
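
A quick worked example makes the long-context surcharge concrete. The session sizes below are assumptions for illustration, and I ignore cached-input pricing to keep it simple; only the per-token rates and the 272K threshold come from the release.

```python
# Back-of-envelope GPT-5.4 session cost using the published rates.
# Session sizes are made-up examples; cached-input pricing is ignored.
INPUT_RATE = 2.50 / 1_000_000     # $ per input token
OUTPUT_RATE = 15.00 / 1_000_000   # $ per output token
LONG_CONTEXT_THRESHOLD = 272_000  # above this: 2x input, 1.5x output

def session_cost(input_tokens: int, output_tokens: int) -> float:
    long_ctx = input_tokens > LONG_CONTEXT_THRESHOLD
    in_mult, out_mult = (2.0, 1.5) if long_ctx else (1.0, 1.0)
    return (input_tokens * INPUT_RATE * in_mult
            + output_tokens * OUTPUT_RATE * out_mult)

print(f"${session_cost(200_000, 20_000):.2f}")  # under threshold: ~$0.80
print(f"${session_cost(300_000, 20_000):.2f}")  # over threshold:  ~$1.95
```

The detail worth noticing is that the multiplier applies to the full session once you cross 272K input tokens, not just the overflow, so a prompt that barely crosses the line costs nearly twice as much as one just under it.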

My view on GPT-5.4 is pretty plain. It is a good release. /fast is one of the best parts of it because a 50% speed bump on the same model is immediately useful for the kind of work this model is meant to do. GPT-5.4 is highly steerable, strong for long-running agent tasks, and better at style and tone than some of the early backlash suggested. The default frontend output is still weak, and I would still use skills or hand the final polish to Claude Opus 4.6 or Gemini 3.1 Pro if the UI matters a lot.

If you want more context on the speed side of this model race, I previously wrote about GPT-5.3-Codex-Spark speed claims. If you want the broader comparison that led into this release, I also covered Claude Opus 4.6 versus GPT-5.3-Codex. GPT-5.4 fits neatly into that pattern. It is a newer, better work model, and the speed option is one of the clearest reasons to care.
