GLM-5 Turbo: Z.ai’s 200+ TPS Agent Model Is Now on OpenRouter

GLM-5 Turbo is a new model from Z.ai, available now on OpenRouter. It is built for agent workflows, specifically the kind that involve long execution chains, multi-step tool calls, and scenarios like OpenClaw. It runs at over 200 tokens per second and retains the base GLM-5's frontend and humor capabilities.

What GLM-5 Turbo Actually Is

GLM-5 dropped in February 2026 as Z.ai's flagship, with 744 billion parameters and 40 billion activated. It posted strong numbers on coding and agent benchmarks, including 77.8% on SWE-bench Verified and 56.2% on Terminal Bench 2.0. GLM-5 Turbo is not a stripped-down version of that. According to Z.ai's docs, it was tuned during training specifically for OpenClaw tasks, making it a specialized variant rather than a lighter one.

The focus areas are more reliable tool calling across multi-step sequences, better decomposition of complex layered instructions, improved reasoning through extended chains, and stronger temporal consistency in long-horizon tasks. In practice, this means fewer dropped steps and fewer failures when an agent needs to coordinate across multiple tools or sub-agents over a long session.

The context window stays at 200,000 tokens, matching the base GLM-5, and max output is 128,000 tokens. It supports thinking mode, function calling, streaming, MCP2 integration, structured output, and context caching. Early testers have noted solid results on coding benchmarks like OpenCode and Kilo Code, good SVG generation and color handling, and the ability to run long agent chains without stability issues.
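The multi-step tool-calling pattern those improvements target can be sketched as a simple message loop. This is a generic sketch, not Z.ai's implementation: the tool, the stubbed model step, and all names here are hypothetical, standing in for a real chat-completion call and real tools.

```python
import json

# Hypothetical tool the agent can call; a real agent registers its own.
def get_time(zone: str) -> str:
    return f"12:00 in {zone}"  # stubbed result, no real clock lookup

TOOLS = {"get_time": get_time}

def run_agent(model_step, user_msg: str, max_steps: int = 5) -> str:
    """Minimal multi-step tool-call loop. `model_step` stands in for a
    chat-completion call; it returns either {"tool": name, "args": {...}}
    to request a tool, or {"final": text} to finish."""
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = model_step(messages)
        if "final" in reply:
            return reply["final"]
        # Execute the requested tool and feed the result back to the model.
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": json.dumps({"result": result})})
    raise RuntimeError("agent exceeded max steps")

# Stub model: requests the tool once, then answers from the tool result.
def stub_model(messages):
    if messages[-1]["role"] == "user":
        return {"tool": "get_time", "args": {"zone": "UTC"}}
    return {"final": "It is " + json.loads(messages[-1]["content"])["result"]}

print(run_agent(stub_model, "What time is it?"))  # -> It is 12:00 in UTC
```

Every pass through that loop is a full generate-execute-resume cycle, which is exactly where dropped steps and stalled chains show up in weaker models.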

Speed at 200+ TPS

The headline number is over 200 tokens per second. That is meaningfully fast for agentic use. If you are running long chains where the model needs to call tools repeatedly, wait for results, and then continue reasoning, throughput matters a lot. Latency compounds across steps, and a model that runs at 200+ TPS keeps those chains moving without the gaps that slow down complex pipelines.
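To see why throughput compounds, here is back-of-envelope arithmetic. The step count and tokens-per-step figures are illustrative assumptions, not benchmarks, and tool-execution latency is ignored.

```python
def chain_generation_time(steps: int, tokens_per_step: int, tps: float) -> float:
    """Total model-generation time for an agent chain, ignoring tool latency."""
    return steps * tokens_per_step / tps

# A hypothetical 20-step chain emitting ~500 tokens per step:
slow = chain_generation_time(20, 500, 60)    # at 60 TPS
fast = chain_generation_time(20, 500, 200)   # at 200+ TPS
print(f"{slow:.0f}s vs {fast:.0f}s")  # -> 167s vs 50s
```

Nearly two minutes of pure generation time disappear from the chain, before counting the dead time between steps that faster decoding also shrinks.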

[Chart: GLM-5 Turbo vs base GLM-5, tokens per second]

One thing worth noting: GLM-5 Turbo is priced higher than the base GLM-5 on OpenRouter. That is counterintuitive for something with turbo in the name, but it reflects that this is not a budget tier. It is a specialized, high-throughput variant built for a specific use case, and Z.ai is pricing it accordingly. If you are coming in expecting a cheaper fast option, that is not what this is.

Who Gets Access and When

As of the March 15, 2026 release, GLM-5 Turbo is available through OpenRouter immediately. On Z.ai’s own platform, pro users on the coding plan get access in March 2026, and free users follow in April. There is an early access application available through Z.ai’s docs if you want in sooner.

If you are using OpenClaw, the setup is straightforward: add the model to your OpenClaw.json providers array and set agents.default.model.primary to "z.ai/glm-5-turbo". That is the configuration Z.ai’s docs specify for OpenClaw workloads, and it is about as low-friction as a model swap gets.
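A sketch of what that might look like in OpenClaw.json. Only the two fields the docs name (the providers array and agents.default.model.primary) come from the source; the rest of the file shape here is an assumption for illustration.

```json
{
  "providers": [
    {
      "name": "z.ai",
      "models": ["glm-5-turbo"]
    }
  ],
  "agents": {
    "default": {
      "model": {
        "primary": "z.ai/glm-5-turbo"
      }
    }
  }
}
```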

Where It Fits

The Chinese model space has been competitive on benchmarks but has historically had more inconsistency in real agent tasks, particularly around tool calling reliability and long-chain stability. GLM-5 Turbo is specifically targeted at closing that gap. Early tests describe it as impressive relative to prior Chinese turbo-tier models, particularly on coding tasks and agent stability. It is not going to displace the top-tier Western models for every task, but for OpenClaw-style workloads where throughput and tool call reliability matter, it is a serious option.

The fact that it keeps GLM-5’s frontend and humor capabilities intact means you are not trading off general utility for the agent-specific improvements. That matters if you are using the same model across different task types in a pipeline rather than routing to specialized models for each one.

If you are already using OpenRouter and want to keep model swaps low-friction, this is a one-line config change. That is the right way to approach any new model release right now, given how quickly the field is moving: cost and pricing across providers are shifting fast enough that locking into any single model is rarely the right call.
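On OpenRouter, that one-line swap is just the model field in an OpenAI-compatible request. A minimal sketch with the standard library; the model slug follows the z.ai/glm-5-turbo pattern from the OpenClaw config above and is an assumption about OpenRouter's catalog naming, and the API key is a placeholder.

```python
import json
import urllib.request

# OpenRouter exposes an OpenAI-compatible chat completions endpoint.
# Swapping models means changing only the "model" field below.
payload = {
    "model": "z.ai/glm-5-turbo",  # the one line that changes on a model swap
    "messages": [{"role": "user", "content": "Summarize this diff."}],
}

req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_OPENROUTER_KEY",  # placeholder key
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment with a real key to send the request
```

Because every OpenRouter model sits behind the same request shape, trying a new release is a payload edit rather than an integration project.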

For anyone interested in how Chinese open-source models compare on other dimensions beyond raw capability, including how they handle sensitive topics, there is relevant prior coverage in the ChinaBench censorship benchmark results across GLM, Qwen, Kimi, and others.

Links

They're clicky!

Follow me on X · Visit Ironwood AI

Adam Holter

Founder of Ironwood AI. Writing about AI stuff!