Two Minute Papers · Tech
NVIDIA's New AI Changed Robotics Forever
TL;DR
NVIDIA's Sonic system uses a 42-million-parameter AI trained on 100 million motion frames to let robots mimic human movement via voice, video, text, or music.
Key Points
1. Sonic is a multimodal robot controller that accepts nearly any input. Users can command the robot via live video of their own movements, voice instructions, text, or even music, enabling tasks like lawn mowing, crawling, and expressive walking styles like "stealthy" or "injured."
2. The system trained on 100 million frames of human motion without any manual action labels. It learns raw movement patterns autonomously and can transition smoothly between tasks without unnatural pauses, making the training pipeline remarkably hands-off.
3. A "root trajectory spring model" prevents the robot from injuring itself during rapid commands. An exponential decay term dampens sudden inputs over time, stopping the robot from turning too fast or oscillating indefinitely, balancing safety with responsiveness.
4. Despite requiring 128 GPUs and 3 days to train, the final model runs on a smartphone with just 42 million parameters. All models are being released free and open to the public, making this accessible far beyond research labs.
5. The project is led by Jim Fan, who founded NVIDIA's humanoid robots lab just 2 years ago. The lab has produced rapid-fire breakthroughs, and the open-source release of Sonic represents a major step toward general-purpose robotic assistants.
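The "root trajectory spring model" in point 3 can be illustrated with a small sketch. The class below is a hypothetical toy version, not NVIDIA's actual implementation: a step change in the commanded root velocity is split off as an "impulse" that decays exponentially, so the controller pulls toward a softened target rather than snapping to the raw command. All names and parameter values (`stiffness`, `decay_rate`) are illustrative assumptions.

```python
import math

class RootTrajectorySpring:
    """Toy sketch of spring-style command smoothing with an
    exponential decay term that damps sudden inputs over time."""

    def __init__(self, stiffness=5.0, decay_rate=3.0):
        self.stiffness = stiffness    # how strongly we pull toward the command
        self.decay_rate = decay_rate  # how quickly a sudden input's jolt fades
        self.value = 0.0              # current smoothed command
        self.target = 0.0             # latest raw command
        self.impulse = 0.0            # residual of the most recent step change

    def command(self, target):
        # Record the abrupt part of the new command so it can be decayed
        # instead of being followed instantly.
        self.impulse = target - self.value
        self.target = target

    def step(self, dt):
        # Exponentially decay the jolt from the last command...
        self.impulse *= math.exp(-self.decay_rate * dt)
        # ...so the effective target eases toward the raw command.
        effective_target = self.target - self.impulse
        # First-order spring toward the softened target: no overshoot,
        # no indefinite oscillation.
        self.value += self.stiffness * (effective_target - self.value) * dt
        return self.value
```

Calling `command(1.0)` and then stepping at 50 Hz produces a smooth monotone ramp toward 1.0 instead of an instant turn, which is the safety behavior the summary describes.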