
Genie 2: Google DeepMind Builds AI That Makes Virtual Worlds From Images

Google DeepMind just released Genie 2, a model that can turn images into playable virtual worlds. It takes in any image and outputs a world you can explore for up to a minute with keyboard and mouse controls. The worlds look good too – they run at 720p resolution.

I’m most excited about what this means for training AI agents. The SIMA team at DeepMind has already tested this by having their agent follow basic instructions in worlds generated by Genie 2. They told it to go to doors of a particular color, and it did. This makes it possible to train agents in a far wider variety of environments than anyone could build by hand.

Genie 2 works with all kinds of inputs. You can feed it real photos, concept art, or synthetic images made by other AI models like Imagen 3. The model then creates consistent physics and interactions that let you move around and explore.

This is a big step forward from previous world models. Most earlier models can only handle specific types of scenes or offer very limited interaction. Genie 2 is flexible enough to work with pretty much any visual input while maintaining high quality and consistency.

Right now, the main limitation is that worlds last a minute at most. But even with that constraint, this technology could transform how we develop and test AI systems. Instead of manually building each training environment, we could generate a practically unlimited supply of variations automatically.

It’s worth noting that this builds on some trends we’ve seen recently in AI world models. Just last month, Midjourney announced their own world model capabilities, though with different technical approaches.

The team behind Genie 2 includes researchers from multiple DeepMind groups – the core Genie team, Generalist Agents, and SIMA. This kind of cross-team collaboration tends to produce the most interesting AI advances.

I expect we’ll see many more developments in this space as companies race to build better world models. The ability to automatically generate interactive 3D environments from 2D images is too valuable to ignore.