Prime Intellect just released INTELLECT-3, a 106B-parameter Mixture-of-Experts (MoE) model that activates only 12B parameters at inference time. The model is trained end-to-end with large-scale reinforcement learning (RL) and claims state-of-the-art performance for its size across math, code, science, and general reasoning tasks. It is positioned to compete with much larger frontier models, and is available now through OpenRouter at a competitive rate of $0.20 per million input tokens and $1.10 per million output tokens, with a 131K-token context window.
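To make the pricing concrete, here is a quick back-of-the-envelope cost calculation using the quoted OpenRouter rates (the function and token counts are illustrative, not part of any official API):

```python
# Cost estimate for INTELLECT-3 via OpenRouter, using the quoted rates:
# $0.20 per 1M input tokens, $1.10 per 1M output tokens.
INPUT_RATE = 0.20 / 1_000_000   # dollars per input token
OUTPUT_RATE = 1.10 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single request at the quoted rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 10K-token prompt with a 2K-token completion:
print(round(request_cost(10_000, 2_000), 4))  # 0.0042
```

At these rates, even a long reasoning-heavy completion costs a fraction of a cent, which is the point of the efficiency argument below.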
What INTELLECT-3 Actually Is
INTELLECT-3 starts with the GLM-4.5-Air base model. It was then subjected to supervised fine-tuning (SFT) followed by large-scale RL. The MoE architecture is the key to its efficiency; while it has 106B total parameters, only 12B are activated during inference. This is a crucial design choice that keeps inference costs low while maintaining high accuracy, especially on structured tasks and complex multi-step problem solving. This is not a new concept, but it is executed well here.
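The "106B total, 12B active" split comes from MoE routing: each token is sent to only a few experts, so most parameters sit idle on any given forward pass. INTELLECT-3's actual router, expert count, and dimensions are not described here; the sketch below is a generic top-k routing illustration with made-up sizes:

```python
import numpy as np

# Generic top-k MoE routing sketch (illustrative only -- the real
# INTELLECT-3 router and expert counts are not specified here).
rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # hypothetical expert count
TOP_K = 2         # experts activated per token
DIM = 16          # hypothetical hidden dimension

# One "expert" = a single linear layer, for illustration.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = x @ router                   # router score per expert
    top = np.argsort(logits)[-TOP_K:]     # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over the selected experts only
    # Only TOP_K of NUM_EXPERTS experts actually run -- this is why
    # "active" parameters are a small fraction of total parameters.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(DIM)
out = moe_forward(token)
print(out.shape)  # (16,)
```

With 2 of 8 experts firing per token, the compute per forward pass scales with the active subset, not the full parameter count, which is the same principle behind 12B-active inference on a 106B model.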
The core proposition is straightforward: deliver performance comparable to proprietary giants without the prohibitive cost or parameter count. The benchmarks suggest it is a strong contender in the open-weight space, particularly in domains requiring deep reasoning, such as advanced mathematics and software engineering tasks.
The Real Story: Infrastructure Democratization
The model itself is a solid technical achievement, but the real significance of this release lies in the engineering work that enabled its creation, all of which Prime Intellect has open-sourced. This is infrastructure democratization.
To train INTELLECT-3, Prime Intellect built three key components that solve common bottlenecks in large-scale RL training:
1. prime-rl: The Async RL Trainer
prime-rl is Prime Intellect’s production-scale post-training framework, designed exclusively for async training. Why async? Synchronous RL training introduces severe bottlenecks on long-horizon rollouts: the trainer must wait for every rollout, including the slowest, to finish before it can take the next optimization step. By going async, the system is always a few steps off-policy, which is the only practical way to scale RL efficiently across hundreds of GPUs and keep those expensive resources busy. It is a deliberate trade-off: throughput and resource utilization over the theoretical purity of on-policy sampling.
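The async pattern can be sketched in a few lines. This is a toy illustration of the producer/consumer structure, not prime-rl's actual API: a rollout worker generates trajectories against a possibly stale policy while the trainer keeps stepping, and the "staleness" counter shows how far off-policy each update is:

```python
import queue
import threading
import time

# Toy sketch of async RL (illustrative -- not prime-rl's actual API).
rollouts: queue.Queue = queue.Queue(maxsize=8)
policy_version = 0
STEPS = 5
staleness_log = []

def rollout_worker() -> None:
    """Generate rollouts with whatever policy version is current at start."""
    while policy_version < STEPS:
        version_used = policy_version      # snapshot; may lag behind the trainer
        time.sleep(0.01)                   # stand-in for slow env interaction
        rollouts.put((version_used, "trajectory"))

def trainer() -> None:
    """Consume rollouts and advance the policy without waiting on workers."""
    global policy_version
    for _ in range(STEPS):
        version_used, _traj = rollouts.get()
        staleness_log.append(policy_version - version_used)  # steps off-policy
        policy_version += 1                # stand-in for a gradient update

threading.Thread(target=rollout_worker, daemon=True).start()
trainer()
print(f"finished {policy_version} steps; staleness per step: {staleness_log}")
```

In a real system the staleness is bounded and corrected for (e.g. with importance weighting), but the structural point is the same: neither the GPUs running rollouts nor the GPUs running updates ever sit idle waiting for the other.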
2. Prime Sandboxes: High-Throughput Code Execution
Training models on code and reasoning tasks requires executing untrusted code at massive scale. The throughput required for this kind of large-scale RL necessitates a specialized solution. Prime Intellect found that standard Kubernetes patterns were insufficient due to control plane overhead and slow startup times.
Their solution, Prime Sandboxes, is a custom execution layer that bypasses the control plane entirely. It uses a direct Rust-to-pod execution path, achieving sub-10-second startup times at massive concurrency while maintaining near-local-process latency. Furthermore, they cleverly hide the remaining latency by overlapping sandbox provisioning with the model’s reasoning phase, ensuring the environment is ready the instant the model needs to execute code. This level of optimization is what separates theoretical research from production-grade training.
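The latency-hiding trick is simple to express: start provisioning the sandbox before the model finishes reasoning, and await both. The sketch below uses Python's asyncio purely for illustration; Prime Sandboxes' real execution path is Rust-based and its interface is not shown here:

```python
import asyncio

# Sketch of hiding sandbox startup latency behind the model's reasoning
# phase (illustrative -- not the actual Prime Sandboxes interface).
async def provision_sandbox() -> str:
    await asyncio.sleep(0.05)   # stand-in for pod/container startup
    return "sandbox-ready"

async def reasoning_phase() -> str:
    await asyncio.sleep(0.05)   # stand-in for the model generating its answer
    return "print(2 + 2)"

async def rollout_step() -> str:
    # Kick off provisioning *before* reasoning, then await both:
    # the sandbox spins up while the model is still generating.
    sandbox_task = asyncio.create_task(provision_sandbox())
    code = await reasoning_phase()
    sandbox = await sandbox_task         # usually already done by now
    return f"{sandbox}: executing {code!r}"

print(asyncio.run(rollout_step()))
```

Because the two awaits overlap, the effective startup cost is max(provisioning, reasoning) rather than their sum, so by the time the model emits code, the environment is already warm.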
3. The Training Scale
The entire training process, both SFT and RL stages, ran on a significant compute cluster: 512 NVIDIA H200 GPUs across 64 interconnected nodes, over two months. The model was trained on diverse RL environments covering math, code, science, logic, deep research, and software engineering. The compute required to build an open-weight model at this level is substantial, but it is attainable with the right methodology and infrastructure.
INTELLECT-3 achieves high performance with a fraction of the active parameters of a typical large proprietary model.
The Open Release Strategy: Why It Matters
Prime Intellect isn’t just releasing weights; they are releasing the entire recipe. This includes the model weights, the prime-rl training framework, the custom execution layer design, the datasets, the RL environments, and the evaluations. They are also releasing SYNTHETIC-2, a dataset containing four million verified reasoning traces generated during the RL training pipeline.
This commitment to transparency is what makes open source valuable. As I’ve said before, open source will always chase closed source, perhaps trailing by a couple of months, but it serves a crucial function: driving down costs and ensuring that the proprietary labs don’t get too comfortable. This move pushes the frontier of accessible RL training. It provides the tools and methodology needed for smaller groups to train competitive models, reinforcing the idea that you don’t need to be entrenched in a major lab to produce top-tier AI. The release of the full training stack, particularly the RL framework, is a greater strategic advantage for the community than the model weights alone. This mirrors the importance of open methodology seen in other projects, such as OLMo 3 32B Think.
The Bottom Line
INTELLECT-3 is a highly capable open-weight MoE model with strong benchmark performance, particularly on reasoning tasks. It offers a generous 131K context window and competitive pricing on OpenRouter. If you need a model focused on multi-step problem solving and structured tasks, it’s immediately worth testing.
However, the lasting impact of this release will be the open-sourced infrastructure. The prime-rl framework and Prime Sandboxes architecture are blueprints for how to do large-scale RL training efficiently today. Prime Intellect is making a strong case that with the right engineering, competitive models can be built outside the traditional proprietary ecosystem. This is a win for open AI and for anyone interested in the technical realities of training frontier models.