Fireship · Tech
Google just casually disrupted the open-source AI narrative…
TL;DR
Google's Gemma 4 is a truly Apache 2.0-licensed model small enough to run on a consumer GPU, matching much larger rivals thanks to per-layer embeddings.
Key Points
1. Gemma 4 is the first truly free open-source model from a major FAANG company. It is released under Apache 2.0 with no commercial restrictions, unlike Meta's Llama (whose license carries clauses restricting large-scale commercial use) or OpenAI's OSS models, which are larger and less capable.
2. The size advantage is staggering compared to comparable open models. Gemma 4's 31B-parameter version scores near Kimi K2.5 Thinking but requires only a 20 GB download and runs at ~10 tokens/sec on a single RTX 4090, versus a 600+ GB download, 256 GB of RAM, and multiple H100s for Kimi.
3. Google's secret is per-layer embeddings, not just quantization. Models labeled 'E2B' or 'E4B' use per-layer embeddings, giving each transformer layer its own token cheat sheet so information is introduced exactly when it is needed, dramatically shrinking effective parameter counts without losing intelligence.
4. Google also quietly released TurboQuant, a novel quantization technique. It converts Cartesian coordinates to polar form and uses the Johnson-Lindenstrauss transform to compress weights down to single sign bits (±1) while preserving distances between data points, reducing memory bandwidth overhead beyond standard quantization trade-offs.
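The figures in point 2 follow from simple bits-per-parameter arithmetic. A minimal sketch (the bits-per-parameter values are illustrative assumptions, not confirmed Gemma specs):

```python
# Back-of-envelope model-size arithmetic for the point-2 figures.
# bits_per_param values below are assumptions for illustration only.
def model_size_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

print(model_size_gb(31, 16))  # fp16 weights: 62.0 GB
print(model_size_gb(31, 5))   # ~5-bit quantization: 19.375 GB, near the 20 GB download
```

Working backwards, a 31B-parameter model fitting in a 20 GB download implies roughly 5 bits per weight, i.e. aggressive quantization on top of the architectural tricks.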
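The per-layer-embedding idea from point 3 can be sketched as follows: instead of one large input embedding table, each layer also gets its own small lookup whose vectors are injected into that layer's input. All dimensions and the projection step here are illustrative assumptions, not Gemma's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, d_ple, n_layers = 1000, 64, 8, 4

input_emb = rng.normal(size=(vocab, d_model))              # shared input table
per_layer_emb = rng.normal(size=(n_layers, vocab, d_ple))  # small "cheat sheet" per layer
up_proj = rng.normal(size=(n_layers, d_ple, d_model))      # project back to d_model

def forward(token_ids):
    h = input_emb[token_ids]                   # (seq, d_model)
    for layer in range(n_layers):
        ple = per_layer_emb[layer][token_ids]  # token info for THIS layer only
        h = h + ple @ up_proj[layer]           # inject it; attention/MLP omitted
    return h

out = forward(np.array([1, 2, 3]))
print(out.shape)  # (3, 64)
```

Because each per-layer table is tiny (d_ple ≪ d_model) and only needed at its own layer, those weights need not sit in GPU memory all at once, which is one way an "effective" parameter count (the E in E2B/E4B) can be far below the raw total.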
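The sign-bit compression in point 4 builds on a classic result: after a random (Johnson-Lindenstrauss-style) projection, keeping only the signs still preserves angles, so Hamming distance between bit codes tracks the original angular distance. This is a SimHash-style illustration of that underlying idea, not Google's actual TurboQuant implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 256, 4096             # original dim, number of random hyperplanes
R = rng.normal(size=(d, k))  # JL-style random projection matrix

def sign_bits(v):
    return np.sign(v @ R)    # compress to 1 bit (±1) per hyperplane

a = rng.normal(size=d)
b = a + 0.3 * rng.normal(size=d)  # a nearby vector

true_angle = np.arccos(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
hamming_frac = np.mean(sign_bits(a) != sign_bits(b))
est_angle = hamming_frac * np.pi  # fraction of disagreeing bits estimates the angle

print(round(true_angle, 2), round(est_angle, 2))  # the two should be close
```

The payoff is the memory-bandwidth claim: each weight vector shrinks from k floats to k single bits while pairwise geometry survives well enough for similarity computations.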