I exploited Copilot and burned $46,000 (it cost $40)

TL;DR

GitHub Copilot's message-based billing lets a $40 plan consume $46,000 of inference because agentic AI runs hundreds of API calls per message.

Key Points

1. Copilot's message-based billing is fundamentally broken for agentic AI. When models were simple, one message equaled one API call. Agentic Copilot can chain hundreds or thousands of tool calls per message, so the cost of a single message ranges from $0.01 to more than $30.
2. The creator ran 50 simultaneous Copilot sessions on an unsolvable cryptography puzzle to maximize burn. Using a prompt.md file containing a deliberately altered (and therefore unsolvable) cipher, he staggered the 50 sessions to bypass rate limits and averaged $10 per message across the 60 messages tested.
3. A single GPT-4.5 message running for 16 hours consumed 111 million input tokens and 1.6 million output tokens. Even with caching, that one message cost approximately $62, so 1,500 such messages on a $40 plan would come to roughly $93,000 of inference.
4. The creator estimates his experimentation cost Microsoft over $46,000 in total, against a comped $40/month plan. At under 5% of his 1,500-message quota he had already exceeded $550; his goal was $40,000, and he is confident he surpassed it.
5. GitHub's pricing change starting June 1st shifts from message counts to token-based AI credits. Model multipliers will also increase dramatically: GPT-4 goes from 1x to 6x and Opus from 15x to 27x, making the current window the last chance to exploit the old model.
6. T3 Chat faced the same message-billing crisis, with individual users costing $200+ within days. After T3 Chat added Claude Sonnet, that one model cost 10x more than all other models combined despite handling only a third of traffic, forcing a cap of 100 premium messages and nearly bankrupting the business.
7. Repomix, a tool that compresses codebases into XML for pasting into chat apps, is singled out as a major abuse vector. The creator estimates it cost his business approximately $500,000 by inflating the input token count of each message under T3 Chat's flat per-message pricing.
8. The billing change is not a rug pull but a correction to a loophole left open too long. Every other AI coding tool (Cursor, Claude Code, Codex) had already switched to rate-limit or token-based billing when agents launched; GitHub simply delayed while compute costs mounted and new signups were disabled.
9. The unsolvable cipher is what made the exploit work: a puzzle with no solution keeps the model running indefinitely. GPT-4.5 ran 81 minutes on puzzle one and 157+ minutes on puzzle two; the modified, unsolvable versions pushed runs past 16 hours, maximizing the tokens generated by a single Copilot message.
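
The arithmetic behind points 3 and 4 can be sketched in a few lines of Python. This is a rough illustration, not the creator's method: the function names are hypothetical, and the only inputs are figures quoted above (a $40 plan, a 1,500-message quota, about $10 per message on average and about $62 for the 16-hour run). It assumes every message in the quota costs the same, which real usage will not.

```python
PLAN_PRICE_USD = 40.0    # monthly plan price quoted in the summary
MESSAGE_QUOTA = 1_500    # messages included in the plan, per the summary


def flat_price_per_message(plan_price=PLAN_PRICE_USD, quota=MESSAGE_QUOTA):
    """What the subscriber effectively pays per message under message-based billing."""
    return plan_price / quota


def quota_inference_cost(cost_per_message, quota=MESSAGE_QUOTA):
    """Provider-side cost if every message in the quota cost the same (an assumption)."""
    return cost_per_message * quota


def subsidy_ratio(cost_per_message, plan_price=PLAN_PRICE_USD, quota=MESSAGE_QUOTA):
    """How many times the plan price the provider would spend on a fully used quota."""
    return quota_inference_cost(cost_per_message, quota) / plan_price


if __name__ == "__main__":
    print(f"paid per message: ${flat_price_per_message():.3f}")
    # Per-message costs below are the approximate figures from the summary.
    for label, cost in [("average (~$10)", 10.0), ("16-hour run (~$62)", 62.0)]:
        total = quota_inference_cost(cost)
        print(f"{label}: ${total:,.0f} per full quota, "
              f"{subsidy_ratio(cost):,.0f}x the plan price")
    # The 60 tested messages at ~$10 each, as a share of the quota.
    print(f"60 messages: ${60 * 10.0:,.0f}, {60 / MESSAGE_QUOTA:.0%} of quota")
```

With the quoted figures, the subscriber pays under 3 cents per message while the worst-case message costs the provider about $62, so a fully used quota at that rate comes to roughly $93,000 of inference.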