Qwen passed Llama. Asia passed North America on cumulative downloads. That is the open model story of 2025, and Nathan Lambert’s presentation Open Models in 2025 — The Curve lays out how the shift happened and what to do next.
The core point is straightforward: Chinese open models are no longer a fringe option. For many teams they are the practical default base for fine-tuning, multilingual work, and multimodal tasks. If you choose base weights for production or research, this matters now.
Key takeaways
- Qwen is the most used open LLM family by adoption and cumulative downloads, per ATOM Project tracking through October 2025.
- China’s share of new fine-tuned models exceeded 50 percent by mid-2025, and Asia’s cumulative downloads surpassed North America’s in August 2025, per the slides.
- Two viable product strategies have emerged: platform families that cover many sizes and modalities, and focused single-line labs that aim for narrow, high-performance wins.
- The West needs sustained funding for open, researcher-viable models that are competitive on the key tasks people actually ship.
What changed between 2023 and 2025
Meta’s early LLaMA drops in 2023 and rapid iteration through 2024 created an early advantage for open Western models. By 2025 the field became crowded and the traction story flipped. Qwen shipped broad SKUs across dense and MoE architectures, long context, vision-language, code variants, and inference optimizations. That breadth plus permissive licensing and steady releases made Qwen easy to adopt.
At the same time, Meta moved to a more cautious release posture during 2025. Frequent, reliable open releases matter to researchers and integrators. When a platform stops shipping or becomes less predictable, downstream work drifts to families that maintain active release lines and clear licensing.
2025 release cadence that mattered
Lambert’s timeline is useful because it shows both cadence and where momentum landed. Highlights that shaped adoption:
- January: DeepSeek R1 shipped and generated rapid interest for reasoning and cost-effective inference.
- April: Llama 4 and Qwen 3 released in the same window. Qwen’s follow-up cadence ended up outpacing Llama’s in downstream adoption.
- June to September: A steady stream of Chinese releases across MiniMax, Baidu ERNIE, Kimi, GLM, and Qwen verticals for code, vision, and agentic work.
Two product strategies that work
There are two repeatable ways teams are winning with open models today.
- Platform families: wide SKU breadth covering multiple model sizes and modalities. These families become defaults because they reduce switching friction. Examples include Qwen, Llama, and Gemma.
- Focused single-line labs: a narrow set of models tuned for specific tasks where performance per dollar matters. These are easier to run locally and often win specialized slots. Examples include DeepSeek and Kimi.
Platform families accumulate gravity through breadth and ecosystem compatibility. Focused labs win by being the best drop-in for a particular workload. Picking a base is now about this trade-off: breadth and long-term stability versus task-specific price-performance.
Why Qwen pulled ahead
Qwen’s lead is explainable in straightforward terms:
- SKU breadth across dense and MoE variants, text and VL, coder and agentic tracks.
- Permissive enough licensing for many downstream commercial uses.
- Frequent releases and clear documentation that reduce friction for integrators.
- Tooling and community adoption that accelerate downstream example code and adapters.
Those are the boring, operational reasons why teams default to Qwen. Platform-style coverage matters when your product needs to combine long context, code, and vision features without juggling multiple incompatible bases.
Adoption and performance signals
ATOM Project charts in the deck show Qwen’s cumulative downloads accelerating through 2024 and 2025 and overtaking Llama’s. Monthly fine-tuning share charts show Qwen base models becoming the majority of new fine-tuned releases after mid-2025. The performance panel in the slides uses ArtificialAnalysis data labeled August 2024 and shows Chinese models improving across multilingual reasoning and agentic tasks. Treat the performance panel as directional; adoption panels are presented through October 2025.
A caution on content and moderation: the deck includes an example DeepSeek completion with pro-Party framing. That is a tone example, not a benchmark metric. For consumer-facing work and cross-region products, teams should validate content style and safety with targeted evaluations and prompt controls.
Implications for research and product teams
If you build models or integrate them, here is what changes practically.
- Local deployment is more viable for more tasks because weights and optimizations are better and consumer GPUs are more capable.
- Picking Qwen often reduces friction for multilingual and multimodal pipelines. Its code and VL variants have clear downstream traction.
- Focused models still beat platform families on specialized benchmarks and specific price-performance trade-offs. Run task-specific holdout tests before standardizing.
- For software engineering and agent work, use targeted comparisons. My writeup on SWE-bench Verified model comparisons is a helpful task-specific view for code workloads. Use it to decide where closed APIs still win.
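The holdout-testing advice above can be made concrete with a small comparison harness. Here is a minimal sketch in Python, assuming you have already collected paired per-example pass/fail scores for two candidate models on the same holdout set; the score lists and the `win_rate_ci` helper are illustrative, not from the deck. It bootstraps a confidence interval on the mean score difference so "wins convincingly" becomes a checkable condition rather than a gut call.

```python
import random

def win_rate_ci(scores_a, scores_b, n_boot=2000, seed=0):
    """Bootstrap a 95% confidence interval on the mean score difference (A minus B)."""
    assert len(scores_a) == len(scores_b), "scores must be paired per holdout example"
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(n_boot):
        # Resample holdout examples with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(scores_a[i] - scores_b[i] for i in idx) / n)
    diffs.sort()
    # Empirical 2.5th and 97.5th percentiles of the bootstrap distribution.
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]

# Hypothetical per-example pass/fail results on a shared holdout set.
base_scores    = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]  # platform-family base
focused_scores = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0]  # focused challenger

lo, hi = win_rate_ci(focused_scores, base_scores)
print(f"focused minus base: 95% CI [{lo:.2f}, {hi:.2f}]")
# Only switch off the default if the whole interval sits clearly above zero.
```

With a real holdout you would want far more than a dozen examples; the point is that the decision rule lives in code, not in a leaderboard screenshot.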
Operational checklist when selecting a base model:
- Task fit: confirm the model variant matches your primary workload, whether text, vision-language, speech, or code.
- License and redistribution: confirm commercial terms before you fine-tune or redistribute artifacts.
- Context needs: test your real prompts with long context variants and retrieval stacks before committing.
- Multilingual validation: run internal evals for your target languages rather than relying solely on public leaderboards.
- Inference plan: decide whether to run locally, on low-latency accelerators, or through managed APIs. Cost and latency trade-offs depend on batch size and model family.
- Community and maintenance: pick bases with active repos and recent commits. You will need adapters, fixes, and examples.
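One lightweight way to enforce this checklist is to encode it as a sign-off record that blocks promotion until every item is checked. A minimal sketch, assuming your team tracks the decision in code; the `BaseModelDecision` type, its field names, and the example model name are hypothetical, not from the deck.

```python
from dataclasses import dataclass, field

@dataclass
class BaseModelDecision:
    """Sign-off record mirroring the base-model selection checklist."""
    model: str
    task_fit_confirmed: bool = False         # variant matches the primary workload
    license_cleared: bool = False            # commercial terms reviewed for fine-tuning/redistribution
    long_context_tested: bool = False        # real prompts run against long-context variants
    multilingual_evals_passed: bool = False  # internal evals for target languages
    inference_path_chosen: bool = False      # local, accelerator, or managed API decided
    repo_active: bool = False                # recent commits, adapters, examples available
    blockers: list = field(default_factory=list)

    def missing(self):
        """Return the checklist items that still block production use."""
        checks = {
            "task fit": self.task_fit_confirmed,
            "license": self.license_cleared,
            "long context": self.long_context_tested,
            "multilingual evals": self.multilingual_evals_passed,
            "inference path": self.inference_path_chosen,
            "active repo": self.repo_active,
        }
        return [name for name, ok in checks.items() if not ok] + self.blockers

    def ready_for_production(self):
        return not self.missing()

decision = BaseModelDecision(model="Qwen3-8B", task_fit_confirmed=True, license_cleared=True)
print(decision.ready_for_production())  # still blocked
print(decision.missing())
```

The payoff is that the open items are enumerable in review, so "we forgot to check the license" stops being a discoverable-in-production surprise.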
Policy and funding: what the West should do
Lambert’s policy point is blunt. Small grants and pilot programs are useful but insufficient. If the aim is to keep open, hands-on research available for students, startups, and independent labs, funding must support real training runs, inference stacks, and stable release lines that people can rely on for years.
I do not expect open models to permanently surpass proprietary ones on every frontier metric. Open will trail, sometimes catch up, then fall behind again. That pattern does not invalidate open’s role. Open models lower cost, protect privacy for local deployments, and give researchers a shared starting point. The question now is one of scale: are Western funding programs large enough to sustain the cadence that builders trust?
Where this likely goes next
Training know-how is spreading and non-frontier training runs are cheaper than a year ago. That will keep feeding releases from many labs across regions. If Chinese groups continue to publish useful, well-documented weights, adoption will keep tilting their way. The West has the talent and infrastructure to match but must commit to more consistent, long-term release programs and to funding the operational costs of inference and data work.
For teams choosing a base at the end of 2025, the pragmatic move is this: start with Qwen, run real task-level evaluations, and only switch if a focused model wins convincingly on your workload or if licensing rules it out. Keep an eye on releases and community activity, because that is the signal of where people are building.
Notes on sources and data vintage
- Adoption and download charts in Lambert’s deck run through October 2025 and cite The ATOM Project for the underlying CSVs.
- Performance panels are sourced to ArtificialAnalysis with a data vintage of August 2024 and should be taken as directional rather than current month snapshots.
If you want task-specific guides: see my SWE-bench model comparison for code work and my GPT-5-Codex API guide for practical scaffolding when you combine closed APIs with open weights. Related reads include GLM 4.6 vs Claude Sonnet 4.5 for cost and capability context and model-specific reviews for audio and on-device voice if you run local inference.
Credit to Nathan Lambert for The Curve dataset pulls and framing. If you need raw CSVs, the slides cite The ATOM Project and ArtificialAnalysis as data owners. Contact info is listed on the deck for follow-up.
If you want a short, practical checklist to share with an engineering lead, paste this into your project doc: choose Qwen as default, validate license, run a 2-week task holdout comparing Qwen to one focused model, and decide inference path before production. That will save time and reduce surprises.
For now the center of gravity in open models sits with Qwen. That does not mean Qwen is always the right pick, but it does mean you should start there and only move off it for documented, task-level reasons.
Related internal posts you may find useful: