Created using Ideogram 2.0 Turbo with the prompt, "Close up photo of humanoid robot hand picking up a glass cup from wooden table. Soft natural lighting from window. Shot on RED camera with 85mm lens, shallow depth of field focused on fingertips touching glass. 8K resolution."

Helix: Figure’s New AI Model Takes Control of Humanoid Robots

Figure just announced Helix, their new AI model for controlling humanoid robots, marking a clear shift away from their OpenAI partnership. I think this is a smart move that puts them ahead in the robotics space.

Helix does something no other AI model has done before: it controls an entire humanoid upper body in real time from natural language commands. We’re talking precise, simultaneous control of fingers, wrists, head, and torso at 200 Hz. That’s seriously impressive.

The model has two key parts working together:
– System 2: A vision-language model running at 7-9 Hz that interprets the scene and understands what you want the robot to do
– System 1: A fast visuomotor policy running at 200 Hz that turns those high-level commands into smooth robot movements

This split architecture is clever. By having a slower system for understanding and planning, and a faster system for actual control, Helix avoids the usual tradeoff between being smart and being quick.
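To make the idea concrete, here’s a minimal sketch of what a two-rate loop like this could look like: a slow planner refreshes a shared latent command at roughly 8 Hz while a fast controller reads the latest latent and emits joint targets at 200 Hz. The class and function names here are my own illustrations, not Figure’s actual code.

```python
# Hypothetical two-rate "System 2 / System 1" loop: a slow planner writes a
# latent command, a fast controller consumes it. Not Figure's implementation.

import threading
import time
import random

class SharedLatent:
    """Thread-safe holder for the latest high-level command latent."""
    def __init__(self, dim=8):
        self._lock = threading.Lock()
        self._latent = [0.0] * dim

    def write(self, latent):
        with self._lock:
            self._latent = list(latent)

    def read(self):
        with self._lock:
            return list(self._latent)

def system2_planner(shared, stop, hz=8):
    """Slow loop (~7-9 Hz): interpret the instruction and refresh the latent."""
    while not stop.is_set():
        # Stand-in for a vision-language model pass over camera image + instruction.
        latent = [random.uniform(-1, 1) for _ in range(8)]
        shared.write(latent)
        time.sleep(1.0 / hz)

def system1_controller(shared, stop, hz=200):
    """Fast loop (200 Hz): turn the latest latent into joint targets."""
    while not stop.is_set():
        latent = shared.read()
        # Stand-in for the fast policy producing joint commands.
        joint_targets = [0.1 * x for x in latent]
        # send_to_robot(joint_targets)  # hardware interface would go here
        time.sleep(1.0 / hz)

if __name__ == "__main__":
    shared, stop = SharedLatent(), threading.Event()
    threads = [
        threading.Thread(target=system2_planner, args=(shared, stop)),
        threading.Thread(target=system1_controller, args=(shared, stop)),
    ]
    for t in threads:
        t.start()
    time.sleep(2)  # run the demo for two seconds
    stop.set()
    for t in threads:
        t.join()
```

The key design point the sketch tries to capture is decoupling: the fast loop never waits on the slow model, it just keeps acting on the most recent command it has.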

What really stands out is how Helix handles new situations. In testing, Figure’s robots could pick up thousands of objects they’d never seen before, just by being asked in plain English. They even demonstrated two robots working together to put away groceries – each running the same Helix model but understanding their different roles through natural language instructions.

The training approach is efficient too. Helix learned all this from roughly 500 hours of demonstration data. That’s tiny compared to other AI systems, which often need 10-20x more. And unlike most robot learning pipelines, Helix uses a single set of neural network weights for every behavior, with no task-specific fine-tuning.

This matters because teaching robots new skills has always been painfully slow and expensive. You either needed PhD roboticists spending hours programming specific movements, or thousands of human demonstrations. Neither approach scales well to the huge variety of tasks we want robots to do in homes and workplaces.

By connecting language models to robot control, Helix points to a future where robots can learn new behaviors simply by being told what to do. The model translates our words directly into precise actions, bridging the gap between human concepts and robot movements.
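To show the shape of that interface, here’s a hypothetical sketch of a language-conditioned policy call: the only task-specific input is the plain-English instruction, and the output is a vector of joint targets. All names and dimensions are illustrative assumptions, not Figure’s API.

```python
# Illustrative interface for a language-conditioned policy (assumed names/shapes).

from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    camera_image: List[float]     # flattened pixels, placeholder
    joint_positions: List[float]  # current proprioception

def language_conditioned_policy(obs: Observation, instruction: str) -> List[float]:
    """Map (observation, instruction) -> target joint positions.

    A real model would encode the image and text and decode continuous
    actions; this stub just returns a zero action of the right shape.
    """
    return [0.0] * len(obs.joint_positions)

obs = Observation(camera_image=[0.0] * 128, joint_positions=[0.0] * 35)
action = language_conditioned_policy(obs, "Pick up the glass cup from the table")
print(len(action), "joint targets")
```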

Figure built this to run on standard embedded GPUs too, so it’s ready for real-world use right now. While other companies show impressive demos that need specialized hardware, Figure focused on making something practical.

The company plans to scale up Helix significantly, which should make it even more capable. Based on what they’ve achieved already with relatively modest training, I’m excited to see what happens when they push this approach further.

This is exactly the kind of practical innovation robotics needs – systems that can generalize to new situations while maintaining precise control. Figure has shown you don’t need massive training datasets or complex multi-stage learning to get robots to understand and act on natural language commands.