Open Source Can Never Win

Composer 2.5 is exactly why open source models can never win.

People often look at random benchmarks, compare the slope of closed-weights versus open-weights models, and then make the claim that open source is catching up or closing the gap. Sometimes they even claim that open source will overtake closed source within some timeframe.

[Figure: Epoch’s chart, which people often point to as evidence of open models catching up.]

But this argument has some severe flaws:

Benchmark Saturation

With any particular benchmark, you reach a point of saturation. Even if it’s not the kind of benchmark that’s capped at 100% (which would guarantee an eventual catch-up even without a change in the true intelligence gap), you eventually hit a ceiling.

Once you reach a certain point, each individual jump in capabilities is harder and harder to achieve. Making a little bit of progress now is much more difficult than making a bigger jump was a year ago. It’s often easier to increase from 30% to 40% than it is to increase from 90% to 92%: the first jump removes only a seventh of the remaining errors, while the second removes a fifth of them.

As improvement slows down, you see a “gap narrowing,” but you would see that same narrowing if you compared any other advantaged and disadvantaged sets of models, like expensive versus cheap, slow versus fast, or high-reasoning versus low-reasoning. You’ll see the same pattern of catching up even though the underdog never actually surpasses the leader.
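To make the saturation argument concrete, here is a minimal sketch. It assumes, purely for illustration, that a benchmark score is a logistic function of some latent capability, and that the trailing model stays a fixed distance behind the leader on that latent scale. The function names and the logistic mapping are assumptions of this sketch, not a model of any real benchmark.

```python
import math


def benchmark_score(capability: float) -> float:
    """Map latent capability to a score in (0, 1).

    The logistic curve is an illustrative assumption: it flattens out
    near the ceiling, which is the saturation effect described above.
    """
    return 1.0 / (1.0 + math.exp(-capability))


def score_gap(leader_capability: float, latent_gap: float) -> float:
    """Visible benchmark gap between a leader and a follower that trails
    by a constant amount of latent capability."""
    follower_capability = leader_capability - latent_gap
    return benchmark_score(leader_capability) - benchmark_score(follower_capability)


# The follower is always exactly 1.0 behind in latent capability.
LATENT_GAP = 1.0
LEADER_LEVELS = [1.0, 2.0, 3.0, 4.0, 5.0]

gaps = [score_gap(level, LATENT_GAP) for level in LEADER_LEVELS]

for level, gap in zip(LEADER_LEVELS, gaps):
    print(f"leader capability {level:.1f}: benchmark gap {gap * 100:.1f} points")
```

Under these assumptions the printed gap shrinks at every step even though the follower never gets any closer on the underlying scale, which is the pattern a "closing the gap" chart would show.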

The Frontier Advantage

Even if an open-weight model like Kimi K3 came along and temporarily surpassed the closed-weights models, the frontier labs could simply apply all of their extra training, data, and special techniques to that open-weight base.

They would create a new model that is based on the open-source model, but better.

Just as Composer 2.5 is Kimi K2.5 with Cursor’s own data and training on top, any open-source model that looks impressive can immediately be improved by a frontier lab and released as a fresh, top-of-the-line, closed-source frontier model.

[Figure: an example of how Cursor drastically improved a base open-source model with its own training.]

If Cursor, which isn’t even really a frontier lab, can make drastic improvements to an open base, just imagine what OpenAI could do applying all of their internal RL evals and post-training to a frontier open-source model. They could take whatever leading open-source model was out there and, in a few weeks, turn around with a better version.

Where Open Models Can Win

Price

Where open models really do dominate, though, is on cost. Because open-source models can be run on anybody’s hardware, there is much more competition on price.

[Chart: Artificial Analysis, Intelligence vs. Cost to Run frontier]

If you’re a top lab at the frontier of performance and you only have to compete with other frontier labs, you can afford to keep a pretty big margin. But if you’re also competing with 18 neo-clouds that want to be the default option on OpenRouter, then you are forced to lower your prices. That’s why we see open models dominating the lower-cost regions of the cost-versus-intelligence frontier.

Good Enough

There is a wide range of tasks where you do not need frontier performance and an open model will do. If an open model is more than intelligent enough for your task, it is usually the cheapest option and usually the right choice.
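The "good enough" rule can be sketched as a simple selection: set the minimum quality your task needs, then pick the cheapest model that clears it. Everything in this sketch is hypothetical, including the model names, quality scores, and prices; they are placeholders to show the shape of the decision, not measurements of any real model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str
    quality: float                  # hypothetical task score, 0 to 100
    usd_per_million_tokens: float   # hypothetical price
    open_weights: bool


def cheapest_good_enough(
    options: Sequence[ModelOption], min_quality: float
) -> Optional[ModelOption]:
    """Return the cheapest option meeting the quality bar, or None if
    nothing qualifies."""
    eligible = [o for o in options if o.quality >= min_quality]
    return min(eligible, key=lambda o: o.usd_per_million_tokens, default=None)


# Placeholder numbers, invented for illustration only.
OPTIONS = [
    ModelOption("closed-frontier", quality=92.0, usd_per_million_tokens=15.00, open_weights=False),
    ModelOption("open-large", quality=85.0, usd_per_million_tokens=2.50, open_weights=True),
    ModelOption("open-small", quality=70.0, usd_per_million_tokens=0.40, open_weights=True),
]

for bar in (60.0, 80.0, 90.0):
    choice = cheapest_good_enough(OPTIONS, bar)
    print(f"quality bar {bar:.0f}: {choice.name if choice else 'no model qualifies'}")
```

With these made-up numbers, the closed model is only selected when the quality bar rises above what any open model reaches. Below that bar, price decides, and the open options take it.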

Privacy / Enterprise

Many enterprises can’t send their data to the frontier labs’ data centers. Their organizations just wouldn’t allow that kind of data to leave the premises.

To solve this, they need models they can host on their own infrastructure; open models are the only choice for absolute privacy.

Custom Training

If you need to train a model on your own data or otherwise modify it for your own purposes, the frontier labs typically don’t let you do what you want with their models. Even when they do offer a fine-tuning API, it often doesn’t provide enough control, and you need to use an open model that you can train yourself.

If you’ve got very specific workflows that don’t already work with frontier models, training Kimi K2.6 with your own framework is probably going to be better than using an out-of-the-box model.

Closing

In summary, despite what certain progress charts seem to indicate, open models can never beat frontier models on absolute top-dog performance, because closed labs can always take the open-source base and improve it with their own stack.

However, open models can win on price-to-performance when they are good enough, or when you need enterprise security or custom training.

Adam Holter

Founder of Ironwood AI. Writing about AI models, agents, and what's actually happening in the space.