What the New ChatGPT 5.4 Means for the World

TL;DR

GPT-5.4 beat human first attempts on 70.8% of benchmarked white-collar tasks, but its high hallucination rate and uneven benchmark results mean professionals aren't obsolete yet.

Key Points

1. GDPval benchmark: GPT-5.4 beat human first attempts 70.8% of the time across 44 white-collar occupations (83% including ties), but the tasks were self-contained and digital, so they are not fully representative of real jobs.
2. Hallucination problem: GPT-5.4 has an 89% "BS rate": when it gets answers wrong, it fabricates confident responses rather than admitting ignorance, making it riskier than GPT-5.3 Codex on accuracy-critical work.
3. Spiky, uneven performance: GPT-5.4 nearly doubled its score on one ML benchmark (12% → 23%) but actually *underperformed* GPT-5.2 on OpenAI's internal engineering bottleneck benchmark, highlighting how AI progress remains jagged across domains.
4. Closing the coding loop: GPT-5.4's Codex can one-shot complex coding tasks (e.g., animated league tables with live web searches), blurring the line between developers and non-developers and threatening junior coding roles.
5. Anthropic vs. OpenAI military contract: The Pentagon dropped Anthropic after it refused to allow autonomous warfare and domestic surveillance use cases; OpenAI accepted with a Palantir "safety layer" that Anthropic CEO Dario Amodei called "80% safety theater."
6. Claude already used for targeting: The Washington Post revealed that Claude, running inside a Palantir system, has suggested hundreds of military targets in Iran with precise coordinates, potentially within Anthropic's own terms of service during a permitted 6-month window.
7. Revenue acceleration continues regardless: Despite the controversy, Anthropic's revenue has more than doubled in 2025 alone, from $9B to $20B, with 10x annual growth projected for three to four consecutive years.
