OpenAI Debuts GPT-5.4 Mini and Nano: Smaller Models Built for Speed, Scale, and Subagent Work

OpenAI launched GPT-5.4 mini and nano on March 17, 2026 — compact models engineered for speed and volume that approach the flagship on coding and agentic benchmarks at a fraction of the cost.

OpenAI just pushed the frontier of what small AI models can do: act fast, run cheap, and still handle real work.

On March 17, 2026, OpenAI unveiled GPT-5.4 mini and GPT-5.4 nano — two compact variants of its latest flagship, built for scenarios where speed and volume matter more than sheer size. These are not stripped-down afterthoughts. They are engineered to handle real tasks like coding assistance, multimodal reasoning, and agentic workloads at latencies and costs far below the full GPT-5.4.

What the Benchmarks Show

The headline number: GPT-5.4 mini runs more than twice as fast as the previous GPT-5 mini while approaching the flagship GPT-5.4 on key benchmarks. According to The Decoder's analysis, the performance gap is remarkably narrow in some categories.

On SWE-Bench Pro (a rigorous coding benchmark), GPT-5.4 mini scores 54.4% compared to 57.7% for the full GPT-5.4 — a gap of just 3.3 points. The previous GPT-5 mini managed only 45.7%. That is a generational leap in compact-model coding ability.

On OSWorld-Verified (agentic and vision tasks), GPT-5.4 mini hits 72.1% versus the flagship's 75.0%. On tool-calling tasks via MCP Atlas, mini scores 57.7% against the flagship's 67.2%.

GPT-5.4 nano posts strong numbers too: 52.4% on SWE-Bench Pro, 56.1% on MCP Atlas tool-calling, and 82.8% on GPQA Diamond (scientific reasoning). Where nano stumbles is vision-heavy work — 39.0% on OSWorld-Verified compared to mini's 72.1% — making its sweet spot clear: text-first tasks at high speed.

Pricing: Cheaper Per Token, More Expensive Per Generation

The pricing structure reflects a deliberate trade-off. According to DataCamp's breakdown:

  • GPT-5.4 mini: $0.75 per million input tokens, $4.50 per million output tokens
  • GPT-5.4 nano: $0.20 per million input tokens, $1.25 per million output tokens
  • GPT-5 mini (predecessor): $0.25 / $2.00
  • GPT-5 nano (predecessor): $0.05 / $0.40

That means GPT-5.4 mini costs 3x more per input token than its predecessor, and nano is 4x pricier than the old nano. OpenAI is clearly pricing capability, not just compute. You pay more per token, but each token does more work — and you need fewer of them for the same task.
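The per-token arithmetic behind that trade-off is easy to sketch. The rates below are the ones listed above; the daily token volumes are hypothetical, chosen only to illustrate how the multipliers compound:

```python
# Back-of-the-envelope cost comparison using the per-million-token
# rates listed above. Workload sizes are hypothetical.

PRICES = {  # model -> (input $/M tokens, output $/M tokens)
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at the listed per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example: a day of 10M input tokens and 2M output tokens.
old = cost("gpt-5-mini", 10_000_000, 2_000_000)    # $2.50 + $4.00 = $6.50
new = cost("gpt-5.4-mini", 10_000_000, 2_000_000)  # $7.50 + $9.00 = $16.50
print(f"GPT-5 mini:   ${old:.2f}/day")
print(f"GPT-5.4 mini: ${new:.2f}/day ({new/old:.1f}x)")
```

At identical token volumes the bill roughly 2.5x's; the bet OpenAI is making is that the newer model needs fewer retries and shorter outputs to finish the same task.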

Both models support a 400,000-token context window, matching the flagship.

The Subagent Architecture Angle

The real story here is not individual model performance. It is what these models enable in multi-agent systems.

The Neuron framed it well: OpenAI built "a team of AI interns for your AI boss." In practice, this means developers can build systems where a powerful model — GPT-5.4 or comparable flagships — handles planning, judgment, and complex reasoning, while mini and nano handle the bulk parallel work: code generation, data extraction, classification, and routing.

GPT-5.4 mini consumes only 30% of GPT-5.4's quota in GitHub Copilot, which means developers can run three mini calls for every flagship call within the same usage limits. For coding workflows where speed and iteration count more than peak reasoning, that is a meaningful efficiency gain.
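The planner/worker split described above can be sketched as a simple two-tier pipeline. Everything here is illustrative: the model names are assumptions, not confirmed API identifiers, and `call_model` is a stub standing in for a real API client.

```python
# Illustrative planner/worker split: one flagship call produces a plan,
# then compact models fan out over subtasks in parallel. call_model is
# a stub; replace it with a real client. Model names are assumptions.

from concurrent.futures import ThreadPoolExecutor

PLANNER = "gpt-5.4"       # planning, judgment, complex reasoning
WORKER = "gpt-5.4-mini"   # bulk parallel work: codegen, extraction

def call_model(model: str, prompt: str) -> str:
    """Stub for an LLM call; swap in a real API client here."""
    return f"[{model}] response to: {prompt}"

def run_pipeline(task: str, subtasks: list[str]) -> list[str]:
    # 1. One flagship call produces the plan.
    plan = call_model(PLANNER, f"Plan how to: {task}")
    # 2. Cheap workers execute the subtasks concurrently.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(
            lambda sub: call_model(WORKER, f"{plan}\nDo: {sub}"),
            subtasks,
        ))
```

The design choice is the quota math above in code form: one flagship call per batch, with the high-volume fan-out paid at mini rates.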

Availability

GPT-5.4 mini is available across the OpenAI API, Codex, and ChatGPT. Free and Go-tier ChatGPT users can access it through the Thinking feature. For paid users, it serves as a rate-limit fallback when GPT-5.4 Thinking hits capacity.

GPT-5.4 nano is API-only — a clear signal that OpenAI sees it as a developer tool, not a consumer-facing product. Its use cases are infrastructure: classification pipelines, extraction jobs, simple subagent roles in larger systems.

For anyone building AI-powered search, retrieval, or content systems, these models change the cost calculus significantly. Tasks that were previously expensive to run at scale — real-time content classification, query routing, entity extraction, sentiment scoring — become viable at much higher throughput.
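To make that cost calculus concrete, here is a rough estimate of what a high-volume classification job would cost at nano's listed rates. The document sizes and daily volume are hypothetical:

```python
# Rough throughput-cost estimate for a classification pipeline at
# GPT-5.4 nano's listed rates ($0.20/M input, $1.25/M output).
# Document sizes and daily volume are hypothetical.

NANO_INPUT_RATE = 0.20 / 1_000_000   # dollars per input token
NANO_OUTPUT_RATE = 1.25 / 1_000_000  # dollars per output token

def daily_cost(docs_per_day: int, tokens_per_doc: int, label_tokens: int) -> float:
    """Dollar cost of classifying docs_per_day documents in one day."""
    per_doc = (tokens_per_doc * NANO_INPUT_RATE
               + label_tokens * NANO_OUTPUT_RATE)
    return docs_per_day * per_doc

# 1M documents/day, ~500 input tokens each, ~10-token label out:
print(f"${daily_cost(1_000_000, 500, 10):.2f}/day")  # → $112.50/day
```

A million short classifications a day for about the price of a few flagship-model conversations is what moves these workloads from "batch job we run weekly" to "real-time pipeline."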

The broader pattern is clear across the industry: flagship models handle the hard reasoning, and compact models handle the volume. Anthropic's Haiku, Google's Gemma variants, and now OpenAI's 5.4 mini/nano all reflect this split. The winners in AI-powered applications will be the teams that learn to orchestrate across model tiers, not the ones that default to the biggest model for everything.


James Calder is the editor of The Search Signal, covering AI-powered search, generative engine optimization, and the future of brand discovery.
