
Every AI Model Released in February 2026

February 2026 is done. Here is every model worth knowing about from the past month, organized by lab.

Google DeepMind

Gemini 3.1 Pro dropped on February 19 in preview. It is a natively multimodal reasoning model built for tasks where a straight answer is not enough. The benchmark numbers are the headline. On ARC-AGI-2, it hit 77.1%, more than doubling Gemini 3 Pro’s 31.1%. GPQA Diamond came in at 94.3%, the highest score ever reported on that benchmark. SWE-Bench Verified landed at 80.6%. Pricing is $2.00 per million input tokens and $12.00 per million output tokens, with a 1,048,576 input token limit. It is available in the Gemini app, NotebookLM, Google AI Studio, Vertex AI, and Android Studio. General availability is listed as coming soon, and Gemini 3 Pro Preview is being discontinued on March 9.
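At those rates, per-request cost is simple arithmetic. A quick sketch of the math, using made-up token counts purely for illustration:

```python
# Cost estimate at the listed preview rates for Gemini 3.1 Pro:
# $2.00 per million input tokens, $12.00 per million output tokens.
# The token counts below are invented illustration values.

INPUT_RATE = 2.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 12.00 / 1_000_000  # dollars per output token

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the published preview rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 200k-token document in, a 5k-token summary out.
print(round(call_cost(200_000, 5_000), 2))  # → 0.46
```

So even a near-full-context call stays in the single-digit-dollar range at these rates, with output tokens dominating for generation-heavy workloads.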

Alongside Gemini 3.1 Pro, Google DeepMind also released Gemini 3 Deep Think and Nano Banana 2 this month. Gemini 3 Deep Think slots in as the extended reasoning variant, and Nano Banana 2 continues the lightweight efficiency line.

[Image: Gemini 3.1 Pro vs Gemini 3 Pro ARC-AGI-2 benchmark comparison]

Anthropic

Anthropic released both Claude Opus 4.6 and Claude Sonnet 4.6 this month. Opus 4.6 is the one to reach for when you need deep research or large document reasoning. It has a million token context window and its knowledge work Elo sits 144 points above GPT-5.2. On ARC-AGI-2 it hit 68.8%, up from 37.6% for its predecessor. On BrowseComp for agentic search it came in at 84.0%, beating GPT-5.2 Pro at 77.9%. Pricing holds at $5 per million input tokens and $25 per million output tokens, same as Opus 4.5.

Sonnet 4.6 delivers near-Opus performance at Sonnet pricing, and in head-to-head code tests it was preferred 70% of the time. For more detail on how Opus 4.6 stacks up against OpenAI’s coding flagship, the Claude Opus 4.6 vs GPT-5.3-Codex breakdown covers that in full.

OpenAI

GPT-5.3 Codex and GPT-5.3 Codex Spark both landed this month. GPT-5.3 Codex runs 25% faster than GPT-5.2-Codex and uses 48% fewer tokens for the same results, which works out to a 2.6x effective throughput gain. It is also the first model to be classified as high capability for cybersecurity, hitting 77.6% on CTF tasks. Codex Spark is the speed-optimized variant, built for lower latency. If you want to know whether the speed numbers translate to real-world gains, the GPT-5.3-Codex-Spark post goes into that.

The naming situation is worth a brief acknowledgment. GPT-5.3 Codex, GPT-5.3 Codex Spark, Codex Max before that. Model companies have been notoriously bad at naming. GPT-1 through GPT-4 was clean. Then came GPT-4o, GPT-4.1, GPT-4.1 mini, o4-mini, and now we are somewhere in the GPT-5.3 Codex family with multiple variants. At this point it is mostly letters and numbers in a sequence that no one fully tracks. The models themselves could probably do a better job of naming things.

xAI

Grok 4.20 dropped this month with a parallel agents architecture. The model can run multiple agent threads simultaneously, which is the architectural differentiator xAI is pushing with this release. It is a meaningful structural move rather than a straight benchmark increment.

Zhipu AI

GLM-5 is the latest from Zhipu AI, continuing the GLM family’s push at the frontier. Chinese labs are releasing at a pace that keeps the open-source and semi-open leaderboard competitive. For context on where the Chinese labs sit overall in the rankings, the 2026 LLM rankings post has the full picture.

Alibaba

Qwen 3.5 from Alibaba continues to close the gap on proprietary frontier models. Open source will probably always run a couple of months behind closed source. Proprietary labs can take open-source models, apply internal techniques, and release something better. The real value of Qwen 3.5 and models like it is cost and privacy. It is not going to leapfrog Gemini 3.1 Pro or Claude Opus 4.6 outright, but it does not need to. It gives teams a capable model they can run without sending data to a third party or paying frontier pricing.

ByteDance and Kuaishou

On the video generation side, ByteDance released Seedance 2.0 and Kuaishou released Kling 3.0, both pushing quality and consistency forward. These are not text reasoning models, so they sit in a different evaluation category entirely, but they are a notable part of the overall February output from Chinese labs. Video generation is very much its own track right now.

MiniMax

MiniMax released M2.5 and M2.5 Lightning this month. M2.5 Lightning follows the same pattern as Codex Spark, a speed-optimized variant of the main model built for lower latency at some capability tradeoff. MiniMax has been consistent about releasing paired versions this way, giving developers a choice between peak capability and throughput.

The Release Cadence Is Not Slowing Down

February produced notable releases from at least eight labs across two continents. That is not unusual anymore. Both OpenAI and Anthropic are now using their current models to build the next versions. That shortens iteration cycles in a way that is hard to fully account for when you are trying to pick a model for a long-term project. It is not a runaway feedback loop, but it does mean the pace of releases is structurally faster than it was a year ago.

The practical takeaway if you are building on any of these models: do not bolt your stack to a single one. The best option this week will not be the best option next month. That was true in January, it was true in February, and it will be true in March. Using something like OpenRouter so a model swap is a one-line change is worth the setup cost. That applies whether you are on Gemini 3.1 Pro today or anything else on this list.
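The one-line-swap idea is mostly about where the model name lives in your code. A minimal sketch, assuming an OpenAI-style chat payload; the model slugs below are placeholders for illustration, not confirmed OpenRouter identifiers:

```python
# Keep the model identifier in exactly one place, so switching
# providers or models is a one-line edit rather than a refactor.
# The slug strings here are illustrative, not verified catalog names.

MODEL = "google/gemini-3.1-pro"  # swap this single line to change models

def build_request(prompt: str, model: str = MODEL) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Summarize this document.")
print(req["model"])  # → google/gemini-3.1-pro
```

With a router in front, the same payload shape works regardless of which lab's model the slug points at, which is the whole point of not welding your stack to one vendor.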

For the full head-to-head breakdown of the two biggest flagship releases this month, the Claude Opus 4.6 vs GPT-5.3-Codex post has the benchmark detail. And if you want to understand where each lab sits in the broader rankings picture, the 2026 LLM rankings post is worth reading alongside this one.