Quit Yapping
How does Claude Code *actually* work?
39:24 · Theo - t3.gg · Tech


TL;DR

A 'harness' is the tools and environment wrapping an LLM, and it dramatically affects code quality — Opus jumped from 77% to 93% accuracy inside Cursor's harness.

Key Points

1. A harness is the set of tools and environment in which an AI agent operates. It handles tool execution, chat history management, and model re-requests after each tool call — not the model itself.
2. LLMs can only generate text; tool calling is how they take real actions. Special syntax in the model's response triggers the harness to pause, execute a command locally, append the result to chat history, and re-request the model to continue.
3. Context management is central to harness design. Files like CLAUDE.md pre-load information into the model's context, reducing redundant tool calls — without it, the model must search and read files itself every session.
4. Large context windows make models less accurate, not more. Sonnet's accuracy drops to ~50% of baseline once context exceeds 50,000–100,000 tokens, which is why stuffing entire codebases (e.g., via Repomix) is counterproductive.
5. A functional harness can be built in under 200 lines of Python with just three tools. Read file, list files, and edit file are sufficient; adding a single bash tool reduces the implementation to ~75 lines.
6. Tool descriptions in the system prompt directly control model behavior. Changing a tool's description to 'deprecated — use bash instead' caused Gemini to switch exclusively to bash, while Claude required stronger wording.
7. Cursor's harness outperforms others because dedicated engineers manually fine-tune system prompts and tool descriptions per model. Matt Mayer's benchmark showed Opus improving from 77% in Claude Code to 93% in Cursor — the only variable was the harness.
8. Models can be deceived — the harness can lie about what tools do. A 'read file' tool can return fabricated content; a 'bash' tool can secretly call another model, because the LLM only knows what's in its context.
9. T3 Chat is a UI layer, not a harness — it wraps existing harnesses like Claude Code and Codex CLI. It provides no tools of its own; selecting Claude routes through the Claude Code harness installed locally on the user's machine.
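To make points 1, 2, 5, and 6 concrete, here is a minimal sketch of the kind of single-bash-tool harness the video describes. This is not Claude Code's actual implementation: the `TOOL name {json}` wire format, the `call_model` callback, and the tool table are all hypothetical stand-ins for a real LLM API, but the loop — call the model, execute the tool call it emits, append the result to history, re-request — is the core pattern.

```python
import json
import subprocess

# Hypothetical tool table: name -> (description, handler).
# The descriptions are what the model "sees" in the system prompt;
# per point 6, rewording them is enough to steer tool choice.
TOOLS = {
    "bash": (
        "Run a shell command and return its output.",
        lambda args: subprocess.run(
            args["command"], shell=True, capture_output=True, text=True
        ).stdout,
    ),
}

def system_prompt() -> str:
    """Render the tool descriptions into the system prompt."""
    lines = ["You are a coding agent. Available tools:"]
    for name, (desc, _) in TOOLS.items():
        lines.append(f"- {name}: {desc}")
    return "\n".join(lines)

def parse_tool_call(text: str):
    """Assumed wire format: a line like `TOOL bash {"command": "ls"}`."""
    for line in text.splitlines():
        if line.startswith("TOOL "):
            _, name, raw = line.split(" ", 2)
            return name, json.loads(raw)
    return None  # plain text: the model is answering, not acting

def run_agent(call_model, user_message: str, max_steps: int = 10):
    """The harness loop (points 1 and 2): the model only generates
    text; the harness executes tools locally and re-requests."""
    history = [
        {"role": "system", "content": system_prompt()},
        {"role": "user", "content": user_message},
    ]
    for _ in range(max_steps):
        reply = call_model(history)
        history.append({"role": "assistant", "content": reply})
        tool_call = parse_tool_call(reply)
        if tool_call is None:
            return reply                      # final answer
        name, args = tool_call
        result = TOOLS[name][1](args)         # harness runs it, not the LLM
        history.append({"role": "tool", "content": result})
    return "step limit reached"
```

Note that point 8 falls out of this structure for free: `run_agent` could append anything it likes as the `tool` message, and the model has no way to know whether the result is real — its entire world is the `history` list.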
