Quit Yapping
Google just casually disrupted the open-source AI narrative…
5:16
Fireship·Tech

TL;DR

Google's Gemma 4 is released under a genuine Apache 2.0 license and is small enough to run on a consumer GPU, yet it matches much larger rivals thanks to per-layer embeddings.

Key Points

  • 1. Gemma 4 is the first truly free open-source model from a FAANG company. It is released under Apache 2.0 with no commercial restrictions, unlike Meta's Llama, whose license attaches conditions to commercial use, or OpenAI's OSS models, which are larger and less capable.
  • 2. The size advantage over comparable open models is staggering. Gemma 4's 31B-parameter version scores near Kimi K2.5 Thinking, yet it needs only a 20 GB download and runs at ~10 tokens/sec on a single RTX 4090. Kimi, by contrast, needs 600+ GB of storage, 256 GB of RAM, and multiple H100s.
  • 3. Google's secret is per-layer embeddings, not just quantization. Models labeled 'E2B' or 'E4B' use per-layer embeddings, which give each transformer layer its own token cheat sheet so that information is introduced exactly when it is needed. This dramatically shrinks the effective parameter count without losing intelligence.
  • 4. Google also quietly released TurboQuant, a novel quantization technique. It converts Cartesian coordinates to polar form, then uses the Johnson-Lindenstrauss transform to compress weights down to single sign bits (±1) while preserving the distances between data points. This cuts memory-bandwidth overhead beyond what the usual quantization trade-offs allow.
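
To make the sign-bit idea in point 4 concrete, here is a minimal sketch of a generic sign random projection, the textbook Johnson-Lindenstrauss-style trick of keeping only the sign of each random projection. It is not TurboQuant's actual algorithm and it skips the polar-coordinate step entirely. The function names (gaussian_matrix, sign_bits, estimated_angle, true_angle) and the sizes used are illustrative assumptions, not anything taken from the video or from Google's code.

```python
import math
import random


def gaussian_matrix(rows, cols, rng):
    """Random projection matrix with i.i.d. standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def sign_bits(matrix, vec):
    """Project vec onto each row and keep only the sign (+1 or -1)."""
    return [
        1 if sum(m * v for m, v in zip(row, vec)) >= 0 else -1
        for row in matrix
    ]


def estimated_angle(bits_a, bits_b):
    """The fraction of disagreeing sign bits, times pi, estimates the angle."""
    disagreements = sum(1 for a, b in zip(bits_a, bits_b) if a != b)
    return math.pi * disagreements / len(bits_a)


def true_angle(a, b):
    """Exact angle between two vectors, for comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b))))


# Illustrative sizes (assumptions): 64-dimensional vectors, 2048 sign bits each.
rng = random.Random(0)
dim, n_bits = 64, 2048
proj = gaussian_matrix(n_bits, dim, rng)

a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
b = [x + 0.5 * rng.gauss(0.0, 1.0) for x in a]  # a noisy copy of a

exact = true_angle(a, b)
approx = estimated_angle(sign_bits(proj, a), sign_bits(proj, b))
print(f"true angle {exact:.3f} rad, estimate from sign bits {approx:.3f} rad")
```

Each vector is stored as nothing but ±1 values, yet the angle between the two vectors can still be recovered approximately, and the estimate tightens as the number of bits grows. That is the sense in which a sign-bit code can preserve geometric relationships.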
