Deconstructing Nvidia’s Vera Rubin — The Successor To Blackwell That’s 10x More Efficient

TL;DR

Vera Rubin achieves 10x better performance-per-watt than Blackwell by redesigning every core chip and switching to 100% liquid cooling.

Key Points

1. New chips across the board: The Vera CPU delivers 2x the performance-per-watt of the Grace CPU; the Rubin GPU hits 50 petaflops (~2.5x improvement); per-GPU NVLink bandwidth doubles from 1.8TB/s to 3.6TB/s, for roughly 260TB/s across 72 GPUs.
2. HBM4 memory is the critical bottleneck: Each Rubin GPU uses 8 stacks of HBM4 from SK Hynix and Samsung. HBM4 is in short supply, and Nvidia says it is managing the risk through detailed supplier forecasts.
3. First fully liquid-cooled system: Vera Rubin compute trays have no hoses, cables, or fans; each rack draws ~220kW, but the closed-loop system eliminates evaporative cooling and so uses less water overall.
4. Massive but modular: One Rubin pod = 1,152 GPUs across 16 racks; 1.3 million components from 80+ suppliers across 20+ countries; servicing a compute tray took 2 hours on Blackwell and now takes 5 minutes.
5. Price goes up, cost-per-token goes down: Analyst estimates put Vera Rubin racks at $3.5–$4M each (up ~25% from Blackwell), but cost-per-token drops 10x, making the economics compelling for hyperscalers.
6. What's next (Kyber rack with Vera Rubin Ultra): A prototype rack with 288 GPUs (4x Rubin's GPU count); removing cabling cuts the accompanying weight increase by 50%. It is expected to ship in 2027 with higher compute density and lower latency.
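
Several of the figures above follow from one another by simple arithmetic. The short Python sketch below cross-checks them. The constant and function names are illustrative, and the pod power estimate assumes that all 16 racks in a pod draw the quoted ~220kW, which the summary does not state.

```python
# Figures quoted in the key points above (names are illustrative).
NVLINK_TBPS_PER_GPU = 3.6      # per-GPU NVLink bandwidth, TB/s
GPUS_PER_RACK = 72
RACKS_PER_POD = 16
RACK_POWER_KW = 220            # approximate draw per rack
KYBER_GPUS_PER_RACK = 288
SERVICE_MIN_BLACKWELL = 120    # 2 hours
SERVICE_MIN_RUBIN = 5


def rack_nvlink_tbps() -> float:
    """Aggregate NVLink bandwidth across one 72-GPU rack, in TB/s."""
    return NVLINK_TBPS_PER_GPU * GPUS_PER_RACK


def pod_gpus() -> int:
    """Total GPUs in a 16-rack pod."""
    return GPUS_PER_RACK * RACKS_PER_POD


def pod_power_mw() -> float:
    """Rough pod power in MW.

    Assumption: every rack in the pod draws the quoted ~220kW.
    """
    return RACK_POWER_KW * RACKS_PER_POD / 1000


def kyber_gpu_ratio() -> float:
    """GPUs per Kyber rack relative to a Vera Rubin rack."""
    return KYBER_GPUS_PER_RACK / GPUS_PER_RACK


def service_speedup() -> float:
    """How many times faster a compute tray is serviced."""
    return SERVICE_MIN_BLACKWELL / SERVICE_MIN_RUBIN


if __name__ == "__main__":
    print(f"NVLink per rack: {rack_nvlink_tbps():.1f} TB/s")
    print(f"GPUs per pod:    {pod_gpus()}")
    print(f"Pod power:       {pod_power_mw():.2f} MW (assumed uniform racks)")
    print(f"Kyber vs Rubin:  {kyber_gpu_ratio():.0f}x GPUs per rack")
    print(f"Service speedup: {service_speedup():.0f}x")
```

The per-rack NVLink total comes out just under the rounded 260TB/s quoted above, and the pod GPU count matches the 1,152 figure exactly.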
