Grok 4 Fast is xAI’s newest multimodal model built for one thing: cost-efficient reasoning at scale. The headline is simple: a 2,000,000-token context window at a price that makes large inputs practical. If you are building agents or working in very large codebases, it offers a rare combination of speed, context, and cost.
\n
What Grok 4 Fast is right now
\n
- Two SKUs: reasoning and non-reasoning. Reasoning uses extra test-time compute for harder problems. Non-reasoning is faster and cheaper.
- Context window: 2,000,000 tokens. That is double Gemini’s 1,000,000 and far above most mainstream routes today.
- Multimodal: text and images. Tool-calling and agentic plan-act loops are supported.
- Availability: xAI API in general availability, plus a time-limited free window via OpenRouter and Vercel AI Gateway. On OpenRouter there is a :free route that keeps the 2M context.
- Pricing outside promos: about $0.20 per 1M input tokens and $0.50 per 1M output tokens.
- Reported speed: roughly 285–297 tokens per second, with time-to-first-token between about 0.5 and 2.6 seconds depending on the route. That keeps agent loops snappy.
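To get a feel for what those speed figures mean per agent step, here is a rough back-of-the-envelope sketch. It assumes response time is simply time-to-first-token plus output tokens divided by throughput, which ignores network overhead, queueing, and tool latency. The function name is hypothetical, and the 500-token step size is only an illustration.

```python
def estimate_seconds(output_tokens: int, tokens_per_second: float, ttft_seconds: float) -> float:
    """Rough response time: wait for the first token, then stream the rest."""
    return ttft_seconds + output_tokens / tokens_per_second

# Best and worst ends of the reported ranges for a 500-token step.
fast = estimate_seconds(500, tokens_per_second=297, ttft_seconds=0.5)
slow = estimate_seconds(500, tokens_per_second=285, ttft_seconds=2.6)
print(f"500-token step: {fast:.1f}s to {slow:.1f}s")
```

Treat the result as a planning number, not a guarantee. Measure on your own route before you set timeouts.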
\n
Why this matters
\n
Two million tokens lets you bring real projects into one call: monorepos, huge research corpora, policy archives, or long-running user threads. You do not need to over-engineer chunking or retrieval when the raw window can carry the whole working set. Pair that with token prices that do not punish iteration, and you get a model that fits day-to-day agent development, not just demos.
\n
How to enable reasoning mode on OpenRouter
\n
Use the x-ai/grok-4-fast route and switch on the reasoning trace when you need deeper planning:
\n
{
  "model": "x-ai/grok-4-fast",
  "reasoning": { "enabled": true },
  "input": "your prompt"
}
Leave reasoning.enabled off for the non-reasoning variant when latency and cost matter more than extra deliberation.
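If you toggle modes per request, a small helper keeps the switch in one place. This is a minimal sketch that only builds the request body. The field names are copied from the JSON snippet above, the helper name `build_request` is hypothetical, and you should check OpenRouter's current API reference for the exact schema and endpoint before sending anything.

```python
import json

def build_request(prompt: str, use_reasoning: bool = False) -> dict:
    """Build a request body shaped like the JSON snippet in this article."""
    body = {"model": "x-ai/grok-4-fast", "input": prompt}
    if use_reasoning:
        # Only add the reasoning block when deeper planning is wanted.
        body["reasoning"] = {"enabled": True}
    return body

print(json.dumps(build_request("Plan the refactor", use_reasoning=True), indent=2))
```

Leaving the `reasoning` key out entirely, rather than sending `false`, mirrors the "leave it off" advice above. Whether the route treats both the same is an assumption worth testing.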
\n
Pricing, with a concrete example
\n
- Inputs: ~$0.20 per 1M tokens
- Outputs: ~$0.50 per 1M tokens
\n
Example run: feed 1.2M tokens of a codebase and ask for a refactor plan returning 6,000 tokens.
\n
- Input cost: 1.2M × $0.20 = $0.24
- Output cost: 0.006M × $0.50 = $0.003
- Total: about $0.243
\n
That price makes long-context work and multi-step agent loops viable without a surprise bill.
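The arithmetic in the example run is easy to wrap in a function so you can price your own workloads. This sketch uses the approximate non-promo rates quoted in this article as constants. They are not pulled from any live price list, so update them if the rates change.

```python
# Approximate non-promo rates quoted in this article, in USD per 1M tokens.
INPUT_USD_PER_M = 0.20
OUTPUT_USD_PER_M = 0.50

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one call: input and output are billed at separate rates."""
    input_cost = (input_tokens / 1_000_000) * INPUT_USD_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_USD_PER_M
    return input_cost + output_cost

cost = run_cost(input_tokens=1_200_000, output_tokens=6_000)
print(f"${cost:.3f}")  # prints $0.243
```

Multiply by the number of steps in an agent loop to get a per-task figure. Re-sending a large context on every step dominates the bill far more than the output does.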
\n
Reference mix from our research. Grok 4 Fast sits low, which is the point.
\n
Pricing versus popular routes
\n
Here is the per-million token picture many teams care about. Input and output pricing across common models:
\n
Grok 4 Fast undercuts common reasoning routes by a wide margin while keeping a 2M window.
\n
Capabilities that actually matter
\n
- Multimodal input: text and images, with tool calls to reach external systems.
- Agent loops: plan, act, reflect, repeat. Works well for research or coding assistants chaining multiple tools.
- Search behavior: public chatter and the model card point to strong search quality.
- Training signals: end-to-end training with tool-use reinforcement learning. That shows up in how it handles iterative steps.
- Throughput: solid tokens per second and short time-to-first-token on supported routes.
\n
Grok 4 Fast doubles the common 1M routes and far exceeds 200k-class models.
\n
Benchmarks and public signals
\n
- LMArena Search leaderboard: launch placements show Grok 4 Fast at or near the top under grok-4-fast-search or menlo tags.
- Text leaderboards: community runs place it competitively. Numbers are moving targets.
- Artificial Analysis Intelligence Index: 60 on a blended zero-shot suite that includes reasoning, coding, math, instruction following, long-context, and agent tasks.
\n
Use these as directional signals. Validate against your actual tasks.
\n
Strengths versus weaknesses
\n
- Strengths: excellent price per performance, 2M context, fast agent loops, good search behavior, strong tool-use signals, and practical throughput for interactive work.
- Weaknesses: front-end and UI code can be shaky, humor and creativity are middling, and writing polish trails the very top tier. For cheap, structured front-end work, GPT-5 Mini still makes sense.
\n
Safety and policy notes
\n
- Stricter NSFW refusals reported by early users.
- Model card flags risks around agentic abuse due to autonomous tool use. Expect mitigation layers and policy checks on default routes.
\n
Where to run it today
\n
- xAI API: direct access for production apps.
- OpenRouter: x-ai/grok-4-fast:free during the promo window, then the paid route. Listing shows the full 2M context.
- Vercel AI Gateway: listed during the promo period with the same context allowance.
\n
If you care about pricing and routing knobs, I covered the trade‑offs here: OpenRouter’s 50% Off GPT‑5: real costs, RPM caps, and clean benchmarks.
\n
Agent use: where it fits
\n
Grok 4 Fast is a go‑to for agents that need to read a lot, think reasonably well, and respond quickly. The loop is smooth, retries are cheap, and the context ceiling is a relief for real projects. If your agent depends on planning, repeated tool calls, and mid‑run reflections, flip on the reasoning SKU. If you are doing deterministic transforms, extraction, or patches where the plan is already known, stay in non‑reasoning for lower latency.
\n
For agent incentives and trade‑offs, see Replit Agent 3 vs Open Source: Autonomy Is Real, But Incentives Decide.
\n
Mode selection playbook
\n
- Pick non‑reasoning for data extraction, formatting, simple research digests, straightforward code diffs, and any flow where milliseconds matter.
- Pick reasoning for planning, multi‑step tool use in one session, nontrivial debugging that benefits from on‑demand thinking, and deeper research answers.
- Default to non‑reasoning. Escalate to reasoning when the agent stalls or when the task clearly rewards extra deliberation.
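The playbook above reduces to a small routing rule. Here is one way it might look, assuming you already label tasks by type. The task labels and the `pick_mode` name are hypothetical, and how you detect a stalled agent is up to your own loop.

```python
# Hypothetical task labels that the playbook says reward extra deliberation.
REASONING_TASKS = {"planning", "multi_step_tools", "debugging", "deep_research"}

def pick_mode(task_type: str, stalled: bool = False) -> bool:
    """Return True when the reasoning variant should be used."""
    if stalled:
        # Escalate when the agent is stuck, whatever the task type.
        return True
    # Default to non-reasoning for anything not explicitly listed.
    return task_type in REASONING_TASKS

print(pick_mode("extraction"))                # False
print(pick_mode("planning"))                  # True
print(pick_mode("extraction", stalled=True))  # True
```

Unknown task types fall through to non-reasoning on purpose, which matches the "default cheap, escalate when needed" rule.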
\n
Notes on context at 2M tokens
\n
The 2M window is real, but your mileage depends on the route, gateway timeouts, and prompt structure. A few tips that help in practice:
\n
- Chunk by semantic unit even when you do one call. Structure still beats a raw dump.
- Use stable section headers and anchors so the model can cite and jump around.
- Pin critical definitions and constraints near the bottom of your input so they sit close to the question. Proximity matters.
- Cache repeated boilerplate and policy text on the gateway when possible to cut your bill on subsequent calls.
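The first three tips can be combined in a simple prompt assembler. This is a sketch under assumptions: the `## [S1]` header format and the function name are my own choices rather than anything the model requires, and the only point is stable anchors up top with constraints and the question at the bottom.

```python
def build_long_prompt(sections: dict[str, str], constraints: list[str], question: str) -> str:
    """Assemble one large prompt: anchored sections first, constraints and question last."""
    parts = []
    for i, (title, body) in enumerate(sections.items(), start=1):
        # Stable anchors like [S1] give the model something to cite.
        parts.append(f"## [S{i}] {title}\n{body.strip()}")
    # Pin constraints near the bottom so they sit close to the question.
    parts.append("## CONSTRAINTS\n" + "\n".join(f"- {c}" for c in constraints))
    parts.append("## QUESTION\n" + question.strip())
    return "\n\n".join(parts)

prompt = build_long_prompt(
    {"auth module": "def login(): ...", "billing module": "def charge(): ..."},
    ["Do not change public function names."],
    "Which module should own session expiry? Cite sections by anchor.",
)
print(prompt)
```

Keeping the section order deterministic between calls also helps if your gateway caches repeated prefixes.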
\n
Performance signals worth tracking
\n
- Artificial Analysis Intelligence Index of 60, which puts it in the same band as Gemini 2.5 Pro on that suite.
- Cost‑to‑run estimate around $40 for their reference workload. This illustrates why price per task looks good.
- Search leaderboard placements on LMArena and community runs that shift as configs get tuned. Expect movement.
\n
How I rank it in practical stacks
\n
- Displaces Grok 3 Mini for cost‑sensitive long‑context tasks.
- Reduces the day‑to‑day need for Grok 4 unless you want its deeper style and do not mind the cost.
- Challenges GPT‑4.1 and GPT‑5 Mini for agent runs where context size and iteration speed outweigh pristine prose.
- Keep GPT‑5 Mini around for front‑end implementation and design‑heavy tasks. It still wins there on output consistency.
\n
Archetype: Knight. It moves quickly, covers ground, and wins on cost per task.
\n
Concrete setup tips for agents
\n
- Use task routing: non‑reasoning for fetch and transform steps, reasoning for plan and resolve steps.
- Cap tool‑call episodes with a repetition check after N steps. This keeps loops from spinning.
- Return structured logs from tools to make the trace debuggable. That helps when you do switch to reasoning.
- Throttle vision inputs unless you need every frame. Image tokens add up.
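Here is one way the loop cap and structured logs might fit together. It is a sketch, not a framework: `next_call` and `execute` are hypothetical callables standing in for your model call and your tool runner. The repetition check counts identical tool calls and stops once the same call shows up too often, which is one simple interpretation of the tip above.

```python
import json

def run_tool_loop(next_call, execute, max_steps=8, max_repeats=2):
    """Run tool calls until the model is done, a call repeats too often, or the step cap hits.

    next_call(history) returns a dict such as {"tool": ..., "args": ...}, or None when done.
    execute(call) runs the tool and returns its result.
    """
    history = []  # structured log: one entry per executed step
    seen = {}     # how many times each identical call was requested
    for step in range(max_steps):
        call = next_call(history)
        if call is None:
            return history, "done"
        key = json.dumps(call, sort_keys=True)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > max_repeats:
            return history, "stopped: repeated call"
        history.append({"step": step, "call": call, "result": execute(call)})
    return history, "stopped: step cap"

# A stub model that keeps asking for the same search gets cut off.
stuck = lambda history: {"tool": "search", "args": {"q": "grok"}}
log, status = run_tool_loop(stuck, execute=lambda call: "no results")
print(status, len(log))  # stopped: repeated call 2
```

The returned `history` doubles as the debuggable trace. If you escalate to reasoning after a stall, you can pass it along as context.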
\n
Quick start checklist
\n
- Route: x-ai/grok-4-fast. During promo, try x-ai/grok-4-fast:free on OpenRouter.
- Mode: set reasoning.enabled true only when needed.
- Context: if you are over 200k, this is one of the few models that can handle it cleanly right now.
- Tools: wire up tool calls and allow a handful of internal steps per request if your gateway supports it.
\n
When to pick Grok 4 Fast
\n
- You need to ingest massive context and still iterate quickly.
- You are building agents where low latency and stable tool use matter.
- You care about dollars per task more than stylistic writing or jokes.
\n
When to pick something else
\n
- Front‑end or design‑heavy coding and brand voice writing. Keep GPT‑5 Mini and writing‑first routes in your toolbox.
- Tasks that depend on top‑tier creative writing or specific tone. Grok 4 Fast is fine, but not the leader there.
\n
Bottom line
\n
Grok 4 Fast is a practical default for cost‑sensitive long‑context work and for agents that need to move quickly. The 2M window and low token pricing make it ideal for large codebases and heavy research inputs. It is not the first pick for front‑end UI coding or polished creative writing, but for reasoning at scale it is a strong option. Use the promo routes while they are live, then keep it in rotation for production once your flows are dialed in.