Meta Delays Avocado After Weak Internal Evals While xAI Keeps Losing Ground

Meta delayed Avocado to at least May after internal evaluations showed it lagging the top models from Google, OpenAI, and Anthropic on reasoning, coding, and writing. That is the main point, and it matters more than the spin around it. Avocado reportedly beats Meta’s prior model and edges past Gemini 2.5 from March, but still falls short of Gemini 3.0 from November. If that is where the model lands, Meta is behind by a full cycle in a market where being late by months is a serious problem.

This is why I do not buy the idea that Meta is still right there at the frontier just because it has the money and has hired aggressively. Frontier status is not something you get by press release. It comes from shipping models that are either best in class or close enough that there is a strong case on price, speed, reliability, or product fit. From the reporting, Avocado does not sound like that model.

The most damaging part of this story is not even the delay. Delays happen. The ugly part is that Meta has reportedly discussed licensing Gemini from Google as a temporary bridge for Meta AI products. If you are building your entire public identity around leading AI, then considering a competitor’s model as a stopgap is a brutal signal. It suggests the internal model is not ready, not strong enough, or both.

The chart above is only a visual summary of the reported placement, not a benchmark sheet. Meta has not published exact numbers. Still, the qualitative positioning is enough to draw a useful conclusion. Avocado may be an improvement for Meta internally, but improvement relative to your own last model is not the standard if your stated goal is to compete with the labs setting the pace.

This also lands in a bad context for Meta because Llama 4 already hurt trust. Once a company gets a reputation for shaky benchmark presentation or overclaiming progress, every later delay looks worse. Internal memos about efficiency gains do not solve that. Neither do vague statements about trajectory. The market cares about what the released model can do, how much it costs, and whether it holds up under public use.

People also keep treating AI hiring like a sports free-agency period. Meta poaches a bunch of talent, so the assumption is that the scoreboard will flip. That is not how this works. Good researchers matter, but a model lab is a whole machine. Data quality matters. Infrastructure matters. Post-training matters. Evals matter. Product discipline matters. Iteration speed matters. You do not buy your way into being a frontier lab by collecting resumes.

Meta’s spending makes this worse, not better. The company is putting massive capital into AI, custom chips, and staffing. That would be easier to defend if the output looked dominant. When the result is a delayed model that still trails the leaders, the spending starts to look more like a reminder that money alone does not close the gap.

xAI looks weak too, although the evidence there is less cleanly documented than the Avocado story. The broad read is still not good. There have been more departures, progress on newer models looks thin, and the pricing does not help. Grok 4.1 Fast remains the useful part of the line because the cost makes sense. Beyond that, the value proposition gets much harder to defend. If Grok 4.20 beta costs around ten times more for barely any gain, that is not a serious upgrade. That is cost bloat.

That cost point matters because too many people talk about the AI race like it is a pure benchmark contest. It is not. A model can be decent and still be a bad choice if the price increase wipes out the value of the improvement. I have written before, in Cost Creep 2026, about how small quality gains get erased by pricing drift, and the same logic applies here. Cost per task completed matters more than marketing.
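To make that concrete, here is a minimal sketch of the cost-per-task math. Every number in it is invented for illustration, not actual Grok pricing or measured success rates:

```python
# Illustrative sketch of the "cost per task completed" argument.
# All prices and success rates below are made-up assumptions.

def cost_per_completed_task(price_per_call: float, success_rate: float) -> float:
    """Expected cost to get one successful completion, assuming
    independent retries until the task succeeds (geometric model:
    expected attempts = 1 / success_rate)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return price_per_call / success_rate

# Hypothetical cheap default: $0.002 per call, solves 70% of tasks.
cheap = cost_per_completed_task(0.002, 0.70)

# Hypothetical 10x-priced upgrade: $0.02 per call, solves 75% of tasks.
pricey = cost_per_completed_task(0.02, 0.75)

print(f"cheap model:  ${cheap:.4f} per completed task")
print(f"pricey model: ${pricey:.4f} per completed task")
print(f"upgrade costs {pricey / cheap:.1f}x more per completed task")
```

Under these made-up numbers, a five-point quality gain at ten times the call price still works out to roughly nine times the cost per completed task. That is the cost bloat pattern in one line of arithmetic.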

The wider lesson is that the AI race is not one giant pack moving together. Some labs are setting the pace. Some are offering strong low-cost defaults. Some are spending huge amounts and still failing to turn that into top-tier output. If you want a broader view of how crowded this release cycle has been, Every AI Model Released in February 2026 is useful context because it shows how little room there is for a late, merely decent model.

I am not saying Meta or xAI can never recover. Labs can improve. Releases can surprise people. But I am also not going to pretend they are leading because they want to be seen that way. Right now, the frontier is elsewhere. Meta’s own reported internal placement says that. The possibility of licensing Gemini says that even louder. On xAI, the combination of limited progress and bad pricing says the same thing from a different angle.

If you are choosing models today based on capability and economics, I would not prioritize either company. Meta has not shown that it can catch up on the metrics that matter most, and xAI has not shown that it can justify the cost of moving up its own product stack. A delayed model that still trails the leaders is still a trailing model. A costly model that barely improves on the cheaper version is still a bad deal.

That is my read on both stories. Meta is spending like a frontier lab without shipping like one. xAI still has one cost-performant default, but not much reason to move beyond it. At this point, if you care about capability, price, and iteration speed, the leaders are somewhere else.

