Quit Yapping
I don't really like GPT-5.5…
27:09
Watch on YouTube ↗
T
Theo - t3.gg·Tech

I don't really like GPT-5.5…

TL;DR

GPT-5.5 is the smartest model yet but frustrates with lazy intent-following, context lock-in issues, and a steep 2x price hike.

Key Points

  • 1.GPT-5.5 comes with a significant price hike. At $5/million tokens in and $30/million tokens out, it's 2x the cost of GPT-5.4 and ~20% more than Claude Opus 4.7, though higher token efficiency partially offsets this.
  • 2.The model achieves state-of-the-art benchmark results. It scored 82.7% on Terminal Bench (up from 75.1%), 73% on expert SWE-bench (up from 68.5%), and tops the Artificial Analysis Intelligence Index, trained on Nvidia GB200 NVL72 systems.
  • 3.Token efficiency is dramatically improved across reasoning tiers. GPT-5.5 x-high used 75M tokens versus 5.4's ~130M+; the medium tier used only 22M tokens while performing near 5.4 x-high intelligence levels.
  • 4.The model's core problem is lazy intent-following. It honors requests 'just barely,' acts like a developer closing Jira tickets rather than fully solving problems, and requires more explicit upfront prompting than any previous model.
  • 5.Context lock-in is a major practical frustration. Once incorrect information enters the context window, the model cannot be prompted out of it — bad habits persist even when told to stop, forcing frequent new thread creation.
  • 6.GPT-5.5 Pro is genuinely impressive for hard reasoning tasks. The reviewer used it to solve three Defcon cipher puzzles that had been unsolved for 5–10 years, and it made meaningful progress on the long-running Noa iMessage challenge.
  • 7.The 3D and frontend coding capabilities show promise but flaws. In a Fish Slop game test, 5.5 outperformed 5.4 and Opus visually and functionally, but repeatedly failed to fully honor 3D intent and defaulted to card-heavy UI patterns.
  • 8.The reviewer recommends treating GPT-5.5 as a fundamentally different model. Old prompts and workflows won't transfer well; success requires more detailed upfront context, frequent new threads, and pre-researching resources before starting tasks.

Life's too short for long videos.

Summarize any YouTube video in seconds.

Quit Yapping — Try it Free →