AI Explained · Tech
What the New ChatGPT 5.4 Means for the World
TL;DR
GPT-5.4 beat human first attempts on 70.8% of benchmarked white-collar tasks, but its high hallucination rate and uneven benchmark results mean professionals aren't obsolete yet.
Key Points
- 1. GDPval benchmark: GPT-5.4 beat human first attempts 70.8% of the time across 44 white-collar occupations (83% including ties), but the tasks were self-contained and digital, so they are not fully representative of real jobs.
- 2. Hallucination problem: GPT-5.4 has an 89% "BS rate": when it gets an answer wrong, it tends to fabricate a confident response rather than admit ignorance, which makes it riskier than GPT-5.3 Codex for accuracy-critical work.
- 3. Spiky, uneven performance: GPT-5.4 nearly doubled its score on one ML benchmark (12% → 23%) but actually *underperformed* GPT-5.2 on OpenAI's internal engineering bottleneck benchmark, showing how jagged AI progress remains across domains.
- 4. Closing the coding loop: Codex running GPT-5.4 can one-shot complex coding tasks (e.g., animated league tables populated by live web searches), blurring the line between developers and non-developers and threatening junior coding roles.
- 5. Anthropic vs. OpenAI military contract: The Pentagon dropped Anthropic after the company refused to allow autonomous-warfare and domestic-surveillance use cases; OpenAI accepted the contract with a Palantir "safety layer" that Anthropic CEO Dario Amodei called "80% safety theater."
- 6. Claude already used for targeting: The Washington Post revealed that Claude, running inside a Palantir system, has suggested hundreds of military targets in Iran with precise coordinates, potentially within Anthropic's own terms of service during a permitted 6-month window.
- 7. Revenue acceleration continues regardless: Despite the controversy, Anthropic's revenue has more than doubled in 2025 alone, from $9B to $20B, with 10x annual growth projected for three to four consecutive years.