Quit Yapping
How does Claude Code *actually* work?
39:24 · Theo - t3.gg · Tech


TL;DR

A 'harness' is the tools and environment wrapping an LLM, and it dramatically affects code quality — Opus jumped from 77% to 93% accuracy inside Cursor's harness.

Key Points

1. A harness is the set of tools and environment in which an AI agent operates. It handles tool execution, chat history management, and model re-requests after each tool call — not the model itself.
2. LLMs can only generate text; tool calling is how they take real actions. Special syntax in the model's response triggers the harness to pause, execute a command locally, append the result to chat history, and re-request the model to continue.
3. Context management is central to harness design. Files like CLAUDE.md pre-load information into the model's context, reducing redundant tool calls — without it, the model must search and read files itself every session.
4. Large context windows make models less accurate, not more. Sonnet's accuracy drops to ~50% of baseline once context exceeds 50,000–100,000 tokens, which is why stuffing entire codebases (e.g., via Repomix) is counterproductive.
5. A functional harness can be built in under 200 lines of Python with just three tools. Read file, list files, and edit file are sufficient; adding a single bash tool reduces the implementation to ~75 lines.
6. Tool descriptions in the system prompt directly control model behavior. Changing a tool's description to 'deprecated — use bash instead' caused Gemini to switch exclusively to bash, while Claude required stronger wording.
7. Cursor's harness outperforms others because dedicated engineers manually fine-tune system prompts and tool descriptions per model. Matt Mayer's benchmark showed Opus improving from 77% in Claude Code to 93% in Cursor — the only variable was the harness.
8. Models can be deceived — the harness can lie about what tools do. A 'read file' tool can return fabricated content; a 'bash' tool can secretly call another model, because the LLM only knows what's in its context.
9. T3 Chat is a UI layer, not a harness — it wraps existing harnesses like Claude Code and Codex CLI. It provides no tools of its own; selecting Claude routes through the Claude Code harness installed locally on the user's machine.
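To make points 1, 2, 5, and 6 concrete, here is a minimal sketch of the kind of single-bash-tool harness the video describes. This is not Claude Code's actual implementation: the `TOOL name {json}` wire format, the `call_model` callback, and the tool table are all hypothetical stand-ins for a real LLM API, but the loop — call the model, execute the tool call it emits, append the result to history, re-request — is the core pattern.

```python
import json
import subprocess

# Hypothetical tool table: name -> (description, handler).
# The descriptions are what the model "sees" in the system prompt;
# per point 6, rewording them is enough to steer tool choice.
TOOLS = {
    "bash": (
        "Run a shell command and return its output.",
        lambda args: subprocess.run(
            args["command"], shell=True, capture_output=True, text=True
        ).stdout,
    ),
}

def system_prompt() -> str:
    """Render the tool descriptions into the system prompt."""
    lines = ["You are a coding agent. Available tools:"]
    for name, (desc, _) in TOOLS.items():
        lines.append(f"- {name}: {desc}")
    return "\n".join(lines)

def parse_tool_call(text: str):
    """Assumed wire format: a line like `TOOL bash {"command": "ls"}`."""
    for line in text.splitlines():
        if line.startswith("TOOL "):
            _, name, raw = line.split(" ", 2)
            return name, json.loads(raw)
    return None  # plain text: the model is answering, not acting

def run_agent(call_model, user_message: str, max_steps: int = 10):
    """The harness loop (points 1 and 2): the model only generates
    text; the harness executes tools locally and re-requests."""
    history = [
        {"role": "system", "content": system_prompt()},
        {"role": "user", "content": user_message},
    ]
    for _ in range(max_steps):
        reply = call_model(history)
        history.append({"role": "assistant", "content": reply})
        tool_call = parse_tool_call(reply)
        if tool_call is None:
            return reply                      # final answer
        name, args = tool_call
        result = TOOLS[name][1](args)         # harness runs it, not the LLM
        history.append({"role": "tool", "content": result})
    return "step limit reached"
```

Note that point 8 falls out of this structure for free: `run_agent` could append anything it likes as the `tool` message, and the model has no way to know whether the result is real — its entire world is the `history` list.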
