GPT-5.4 Mini and Nano: Benchmarks, Pricing, and What They’re Actually Good For

OpenAI released GPT-5.4 mini and GPT-5.4 nano on March 17, 2026. These are smaller, faster variants of GPT-5.4 built for high-volume workloads where latency matters more than raw capability. If you were using GPT-5 mini, these are worth paying attention to.

What GPT-5.4 Mini Brings

GPT-5.4 mini runs more than 2x faster than GPT-5 mini and closes a meaningful gap with the full GPT-5.4 model on several benchmarks. On SWE-Bench Pro, it hits 54.4% versus 45.7% for GPT-5 mini and 57.7% for the full GPT-5.4. On OSWorld-Verified, which tests computer use via screenshot interpretation, it reaches 72.1% versus 42.0% for GPT-5 mini. That is a large jump for a smaller model.

The model supports text and image inputs, tool use, function calling, web search, file search, computer use, and skills. It has a 400k context window, keeping it in line with the rest of the GPT-5 family. The tool-calling numbers are also worth noting: Toolathlon goes from 26.9% on GPT-5 mini to 42.9% on GPT-5.4 mini, and GPQA Diamond climbs from 81.6% to 88.0%. These are not marginal improvements.

[Chart: benchmark comparison for GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, and GPT-5 mini]

What GPT-5.4 Nano Is For

GPT-5.4 nano is the cheapest option in the GPT-5.4 family. OpenAI recommends it for classification, data extraction, ranking, and simpler coding subagents. It is not trying to compete with mini on complex tasks, and the benchmarks reflect that. On Terminal-Bench 2.0, nano lands at 46.3% while mini reaches 60.0%. On OSWorld-Verified, nano actually scores below GPT-5 mini at 39.0% versus 42.0%, which is the one area where it does not clearly beat its predecessor.
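OpenAI's task recommendations above translate naturally into a routing rule. Here is a minimal sketch of that idea; the task-type labels and the `pick_model` helper are illustrative assumptions, not part of any official API.

```python
# Hypothetical tier router based on the stated recommendations:
# nano for classification, extraction, and ranking; mini for heavier
# work like complex coding or computer use (where nano underperforms).
SIMPLE_TASKS = {"classification", "data_extraction", "ranking"}

def pick_model(task_type: str) -> str:
    """Pick the cheapest tier that fits the task type."""
    if task_type in SIMPLE_TASKS:
        return "gpt-5.4-nano"
    return "gpt-5.4-mini"

print(pick_model("classification"))  # gpt-5.4-nano
print(pick_model("computer_use"))    # gpt-5.4-mini
```

In practice the routing signal would come from an upstream classifier or from the product surface itself, but the principle is the same: reserve mini for the tasks where the benchmark gap actually matters.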

Where nano wins is cost. At $0.20 per million input tokens and $1.25 per million output tokens, it is the cheapest way to run GPT-5.4 family models. For tasks that do not need reasoning depth, that tradeoff is worth taking. Note that prices have still crept up 2.25x to 3.5x relative to GPT-5 mini and nano. For a broader look at how model costs have been trending across providers, see Cost Creep 2026.

Pricing

| Model | Input per 1M tokens | Output per 1M tokens | Available in |
| --- | --- | --- | --- |
| GPT-5.4 mini | $0.75 | $4.50 | API, Codex, ChatGPT |
| GPT-5.4 nano | $0.20 | $1.25 | API only |
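To make those per-token prices concrete, here is a rough cost calculator. The prices come from the table above; the workload numbers in the example are made up for illustration.

```python
# Rough per-request cost comparison using the listed API prices.
# Prices are USD per 1M tokens.
PRICES = {
    "gpt-5.4-mini": {"input": 0.75, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "output": 1.25},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a single request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a classification workload with 2,000 input tokens and
# 50 output tokens per request, at 1M requests per month.
for model in PRICES:
    monthly = request_cost(model, 2_000, 50) * 1_000_000
    print(f"{model}: ${monthly:,.2f}/month")
```

At that volume the example works out to roughly $1,725/month on mini versus $462.50/month on nano, which is the kind of spread that makes nano attractive for simple, high-throughput tasks.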

Subagent Architecture

The most useful framing OpenAI offers for these models is the subagent pattern. A larger model like GPT-5.4 handles planning and coordination while GPT-5.4 mini runs narrower subtasks in parallel: searching a codebase, reviewing a large file, processing supporting documents. Mini costs 30% of the GPT-5.4 quota in Codex, which makes that delegation financially sensible for teams running high-volume coding workflows.

This is the direction coding workflows are heading. You compose systems where the heavy thinking happens once and the fast execution happens at scale. The efficiency gains matter more than the raw benchmark numbers in that context. For reference on how tools like Cursor approach agent evaluation in coding workflows, see CursorBench-3.
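The planner/subagent split can be sketched in a few lines. In this sketch, `call_model` is a stub standing in for a real API call; the model names follow the article, but the decomposition and orchestration logic are illustrative assumptions.

```python
# Sketch of the subagent pattern: a large model plans once, cheap
# subagents execute narrow subtasks in parallel.
from concurrent.futures import ThreadPoolExecutor

def call_model(model: str, task: str) -> str:
    # Placeholder for an actual chat-completions request.
    return f"[{model}] result for: {task}"

def run_workflow(goal: str) -> list[str]:
    # 1. The large model plans and decomposes (stubbed as a fixed plan).
    subtasks = [
        f"search codebase for code relevant to: {goal}",
        f"review the largest touched file for: {goal}",
        f"summarize supporting docs for: {goal}",
    ]
    # 2. Fast, cheap subagents run the narrow subtasks in parallel.
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(
            lambda t: call_model("gpt-5.4-mini", t), subtasks))
    # 3. The large model would then integrate the results.
    return results

print(run_workflow("fix flaky auth test"))
```

The key property is that the expensive model runs once per workflow while the cheap model runs once per subtask, so cost scales with the cheap tier.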

Aabhas Sharma, CTO at Hebbia, noted that mini matched or exceeded competitive models on several output tasks and achieved higher end-to-end pass rates than the larger GPT-5.4 model on their specific workloads. That is a useful data point for teams evaluating cost-performance tradeoffs, though results will vary by workflow. The broader pattern holds: at this level of capability, the right model for a given task is often not the largest one available.

Computer Use Performance

GPT-5.4 mini’s OSWorld-Verified score of 72.1% sits close to the full GPT-5.4 at 75.0%, and both run well ahead of GPT-5 mini at 42.0%. For applications that need to interpret screenshots, navigate interfaces, or complete computer use tasks at speed, mini is doing most of what the full model does at a fraction of the cost and latency. That is a practical win for any team building computer-using agents. Nano drops significantly here at 39.0%, so if computer use is part of the workflow, mini is the right tier.

Long Context and Tool Calling

The long context benchmarks tell a more nuanced story. On OpenAI MRCR v2 with 8 needles at 64K to 128K context, GPT-5.4 mini lands at 47.7% versus the full model’s 86.0%. That is a real gap. Mini is not a replacement for GPT-5.4 when the task requires tracking many details across a very long document. For tasks that stay within reasonable context lengths, it performs well. For deep long-context retrieval, the full model still has a clear edge.
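One practical response to that gap is to escalate long-context requests to the full model. The threshold below and the characters-per-token heuristic are assumptions for illustration, not OpenAI guidance.

```python
# Heuristic escalation: stay on mini for typical contexts, hand
# long-haystack retrieval to the full model, where the MRCR gap shows.
LONG_CONTEXT_TOKENS = 64_000  # assumed cutoff, roughly where mini drops off

def estimate_tokens(text: str) -> int:
    # Crude ~4 chars/token heuristic; use a real tokenizer in practice.
    return len(text) // 4

def pick_model_for_context(document: str) -> str:
    if estimate_tokens(document) >= LONG_CONTEXT_TOKENS:
        return "gpt-5.4"       # full model for deep long-context retrieval
    return "gpt-5.4-mini"      # mini handles shorter contexts well

print(pick_model_for_context("short prompt"))  # gpt-5.4-mini
```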

Tool calling is stronger across the board. On the telecom-focused tau2-bench, mini hits 93.4% versus 74.1% for GPT-5 mini. On MCP Atlas, mini reaches 57.7% versus 47.6%. These are the kinds of gains that matter for agentic systems where reliable tool use is the bottleneck, not reasoning depth.

What It Does Not Fix

Frontend generation is still not a strong suit for either model. The coding and tool use numbers are real improvements, but that specific weakness carries over from earlier models. If frontend generation is a primary use case, the benchmark improvements elsewhere do not change that picture much. Manage expectations accordingly.

GPT-5.4 mini is available now in the API, Codex, and ChatGPT. Free and Go users on ChatGPT can access it through the Thinking feature. For other tiers, it serves as a rate limit fallback for GPT-5.4 Thinking. GPT-5.4 nano is API-only for now.


Adam Holter

Founder of Ironwood AI. Writing about AI stuff!