Two Minute Papers · Tech
DeepSeek V4 AI: Crushing The Competition
TL;DR
DeepSeek V4 matches frontier models at a fraction of the cost using three-layer KV-cache compression that cuts memory needs by 90%.
Key Points
1. DeepSeek V4 Pro matches billion-dollar frontier models while being open and free. It supports a 1 million token context window — roughly 1,500 pages of dense documentation — a feature previously exclusive to Google's Gemini flagship.
2. Three compression layers slash KV-cache memory by ~90%. Token-level compression summarizes text, Heavily Compressed Attention applies 128-to-1 structural compression, and Compressed Sparse Attention acts like an index — together reducing storage needs without major information loss.
3. The new Pro model uses 3x less compute than its predecessor; the Flash model uses 10x less. Despite heavy compression, recall benchmarks show the Pro version outperforms Google's Gemini 3.1 Pro at retrieving hidden facts in long contexts.
4. Pricing is 8–30x cheaper than Anthropic's Claude depending on discounts. The model can be self-hosted or accessed online, and the host notes that "intelligence will soon get too cheap to meter."
5. Key limitations include unimodal input (no images or audio), context-window degradation, and unexplained training stabilization techniques. The creators themselves admit they don't fully understand two techniques that stabilize training, though the paper is praised for its transparency.
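To see why a ~90% KV-cache reduction matters at a 1-million-token context, here is a back-of-envelope sizing sketch. The model shape (layer count, KV heads, head dimension) is an illustrative assumption, not taken from the DeepSeek V4 paper; the point is only the arithmetic of how cache size scales with sequence length and what a 90% cut buys.

```python
# Illustrative KV-cache sizing. All model-shape parameters below are
# assumptions chosen to resemble a large open model; none are from
# the DeepSeek V4 paper.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Uncompressed KV-cache size: one K and one V tensor per layer,
    each of shape (n_kv_heads, seq_len, head_dim), in fp16/bf16."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical shape: 60 layers, 8 KV heads, head dim 128, 1M tokens.
baseline = kv_cache_bytes(seq_len=1_000_000, n_layers=60,
                          n_kv_heads=8, head_dim=128)

# The ~90% combined reduction the summary attributes to the three layers.
compressed = baseline * (1 - 0.90)

print(f"baseline:   {baseline / 1e9:.1f} GB")    # ~245.8 GB
print(f"compressed: {compressed / 1e9:.1f} GB")  # ~24.6 GB
```

Under these assumed numbers, the uncompressed cache alone would exceed the memory of several datacenter GPUs, while the compressed cache fits on one — which is consistent with the summary's claim that compression is what makes the long context practical.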