Meta just dropped Llama 3.2, and it’s got some interesting new models. Let’s break them down.
First up, we’ve got two vision-capable models: an 11B and a 90B. The 11B is essentially their 8B text model with vision capabilities added on top, while the 90B does the same for the 70B from Llama 3.1. They’re decent at visual tasks, but don’t expect GPT-4o-level performance. One notable weakness is generating web page code from images – they’re not great at that. But for smaller, open-source options, they’re not bad.
If you’re after top-tier vision capabilities, you might want to look at closed-source alternatives or check out Molmo 72B, which is killing it right now. I might do a deep dive on Molmo or Mistral’s Pixtral model soon, so let me know if you’re interested.
Now for the real showstoppers: the mini-models. We’re talking a 3B and a 1B version. I’ve heard mixed things about these, but from what I’ve seen, they’re pretty impressive for their size.
Matthew Berman ran the 1B model on Groq (not to be confused with xAI’s Grok), and it churned out over 2,000 tokens per second. It even nailed a working snake game on the first try – something GPT-4 struggled with at launch. And remember, GPT-4 is reportedly on the order of 1,700 times bigger.
I put both models through my 20 questions test. The 3B model was scary good, keeping perfect track of previous questions and narrowing down to a specific monkey species when I just had ‘monkey’ in mind. The 1B had some issues, occasionally hallucinating answers I hadn’t given. But hey, it’s a tiny model – some trade-offs are expected.
The real kicker is how light these models are. My decidedly average computer ran the 1B model like a champ and handled the 3B with decent speed. Meta joked about not knowing what a thermostat would do with a language model, but I’m genuinely excited to see what developers come up with.
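To put “light” in perspective, here’s a quick back-of-the-envelope sketch of the weight memory these models need at common precisions. It uses the parameter counts from Meta’s model cards (1.23B and 3.21B) and assumes weights dominate memory use, ignoring KV cache and runtime overhead:

```python
# Rough weight-memory estimate: parameter count × bytes per parameter.
# Assumes weights dominate memory use; ignores KV cache and runtime overhead.

def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for name, params in [("Llama 3.2 1B", 1.23), ("Llama 3.2 3B", 3.21)]:
    for bits in (16, 8, 4):  # fp16, int8, 4-bit quantization
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

Even at fp16 the 1B model’s weights come in under 3 GB, and a 4-bit quant of the 3B is around 1.6 GB – which is why an average machine handles them fine, and why on-device use is plausible at all.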
If you want to dig deeper into the SLM field, check out my post on small language models. It’s wild how fast this field is moving.
In my opinion, these new models are pretty cool. They’re not going to dethrone the big players, but they open up a lot of possibilities for lightweight AI applications. I’m particularly interested in seeing how the mini-models get used in edge computing and mobile devices.
What do you think about Llama 3.2? Are you planning to experiment with these models?