MCP burns tokens because every connected server injects full tool JSON schemas into the model context at session start — often 32,000–82,000 tokens before you send a single prompt. Fix it by running fewer servers, enabling MCP Tool Search in Claude Code, using dynamic tool registration, and choosing servers that return compact summaries instead of raw API dumps.
This is one of the most common developer complaints about MCP on Hacker News in 2026, and the fixes are architectural, not "use a smaller model."
Where the Tokens Go
On connect, the client sends the model:
- Tool names and descriptions
- Parameter schemas (sometimes enormous)
- Resource listings
- Prompt templates
A data vendor MCP with 30 tools can cost 50K+ tokens in schema alone (LangAlpha and similar projects document this pattern). Four servers of that size would fill an entire 200K window before you write any code.
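To see which server is the heavy one, a rough per-server estimate is usually enough. The sketch below serializes tool definitions and divides by an assumed 4 characters per token. That ratio is a heuristic, not a real tokenizer, and the `get_quote` tool is a made-up example shaped like an MCP tool entry (`name`, `description`, `inputSchema`).

```python
import json

# Assumption: ~4 characters per token is a rough average for English/JSON text.
# Real tokenizers differ, so use this only to compare servers with each other.
CHARS_PER_TOKEN = 4


def estimate_tokens(tool_definitions: list[dict]) -> int:
    """Roughly estimate how many tokens a list of tool definitions occupies."""
    serialized = json.dumps(tool_definitions, separators=(",", ":"))
    return len(serialized) // CHARS_PER_TOKEN


# Hypothetical tool definition, for illustration only.
example_tools = [
    {
        "name": "get_quote",
        "description": "Return the latest price for a ticker symbol.",
        "inputSchema": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    }
]

if __name__ == "__main__":
    print(f"~{estimate_tokens(example_tools)} tokens for {len(example_tools)} tool(s)")
```

Running the same estimate over each server's real tool list gives a relative ranking, which is what matters when deciding what to disable.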
Fix 1 — Cap Active Servers (3–5 Rule)
Production agent builders report measurably better tool selection with at most 3–5 MCP servers enabled per session. Disable everything not needed for the current task.
claude mcp list                   # audit what is connected
claude mcp remove <server-name>   # drop an entry the current task does not need
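If you keep server definitions in a JSON config, the cap can also be checked mechanically. This is a minimal sketch that assumes a top-level `mcpServers` map (the shape used by project-level `.mcp.json` files; check your own setup). The server names and commands are placeholders.

```python
import json

MAX_ACTIVE_SERVERS = 5  # upper end of the 3-5 guideline above


def servers_over_cap(config_text: str, cap: int = MAX_ACTIVE_SERVERS) -> list[str]:
    """Return the server names beyond the cap, in config order."""
    config = json.loads(config_text)
    names = list(config.get("mcpServers", {}))
    return names[cap:]


# Hypothetical config with six servers; names and commands are placeholders.
sample_config = json.dumps(
    {
        "mcpServers": {
            name: {"command": "npx", "args": [name]}
            for name in ["github", "postgres", "sentry", "jira", "slack", "figma"]
        }
    }
)

if __name__ == "__main__":
    print("over the cap:", servers_over_cap(sample_config))
```

The check only counts entries. Deciding which servers the current task actually needs is still a manual call.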
Fix 2 — MCP Tool Search (Claude Code)
Claude Code's Tool Search loads relevant tool definitions on demand instead of front-loading every schema. Enable it when running more than a handful of tools — check claude mcp --help and current docs for flags.
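The idea behind on-demand loading can be illustrated with a small keyword search over a tool catalog. This is a conceptual sketch only: it is not how Claude Code implements Tool Search, and `ToolDef`, `search_tools`, and the catalog entries are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDef:
    name: str
    description: str


def search_tools(query: str, catalog: list[ToolDef], top_k: int = 3) -> list[ToolDef]:
    """Return up to top_k tools whose name or description shares words with the query."""
    query_words = set(query.lower().split())

    def score(tool: ToolDef) -> int:
        text = f"{tool.name} {tool.description}".lower().replace("_", " ")
        return len(query_words & set(text.split()))

    ranked = sorted(catalog, key=score, reverse=True)
    return [tool for tool in ranked[:top_k] if score(tool) > 0]


# Hypothetical catalog; only matching definitions would be placed in context.
catalog = [
    ToolDef("create_issue", "Create a new issue in a repository"),
    ToolDef("list_invoices", "List invoices for a customer"),
    ToolDef("search_issues", "Search issues by keyword"),
]

if __name__ == "__main__":
    for tool in search_tools("find open issues", catalog):
        print(tool.name)
```

Only the definitions that match the request reach the model. The rest of the catalog stays out of the context window until something asks for it.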
Fix 3 — Dynamic Tool Registration
Jan 2026 MCP extensions let servers add and remove tools mid-session via "abilities": the agent calls load_ability only when it needs a given group of tools, so unused groups stay out of the context. Custom servers can group tools into collections instead of exposing one flat list.
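A grouped registry of this kind can be sketched in plain Python. The `ToolRegistry` class and the ability names below are hypothetical and do not use any MCP SDK. A real server would also need to tell the client that its tool list changed (the MCP specification defines a `notifications/tools/list_changed` notification for that).

```python
class ToolRegistry:
    """Hypothetical registry that exposes tool groups ("abilities") on demand."""

    def __init__(self, abilities: dict[str, list[str]]):
        self._abilities = abilities      # ability name -> tool names
        self._loaded: set[str] = set()   # abilities currently exposed

    def visible_tools(self) -> list[str]:
        """Tools the model currently sees: the loader plus any loaded groups."""
        tools = ["load_ability"]
        for ability in sorted(self._loaded):
            tools.extend(self._abilities[ability])
        return tools

    def load_ability(self, name: str) -> list[str]:
        """Expose one group of tools and return the names that were added."""
        if name not in self._abilities:
            raise KeyError(f"unknown ability: {name}")
        self._loaded.add(name)
        return self._abilities[name]

    def unload_ability(self, name: str) -> None:
        """Hide a group again once the task no longer needs it."""
        self._loaded.discard(name)


if __name__ == "__main__":
    registry = ToolRegistry(
        {"billing": ["list_invoices", "refund_payment"],
         "issues": ["create_issue", "search_issues"]}
    )
    print(registry.visible_tools())   # only the loader at session start
    registry.load_ability("issues")
    print(registry.visible_tools())   # loader plus the issues group
```

At session start the model sees a single small tool. Each group's schemas are paid for only in sessions that actually load it.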
Fix 4 — Compact Server Responses
MCP authors should return:
- Summary line in tool result (what happened)
- Pointer to fetch details (resource URI, file path, paginated query)
Agents should not receive 10,000-row JSON blobs in context. Pipe large output to files and index locally (the pattern behind "98% context savings" tools on HN).
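The summary-plus-pointer pattern might look like the following inside a tool handler. This is a sketch under assumptions: `compact_result`, the `result.json` file name, and the returned keys are illustrative choices, not part of any MCP SDK.

```python
import json
import tempfile
from pathlib import Path


def compact_result(rows: list[dict], out_dir: Path, preview: int = 3) -> dict:
    """Write the full rows to disk and return only a summary plus a pointer."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "result.json"
    out_path.write_text(json.dumps(rows), encoding="utf-8")
    return {
        "summary": f"{len(rows)} rows written",   # what happened
        "details_path": str(out_path),            # where to fetch the rest
        "preview": rows[:preview],                # a few rows for orientation
    }


if __name__ == "__main__":
    rows = [{"id": i, "status": "ok"} for i in range(10_000)]
    with tempfile.TemporaryDirectory() as tmp:
        result = compact_result(rows, Path(tmp))
        print(result["summary"], "->", result["details_path"])
```

The model receives a few lines instead of 10,000 rows, and the agent can read or query the file if it needs more.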
Fix 5 — Skills + Code Instead of Chatty Tools
For repetitive multi-step MCP calls, generate a typed script or skill once — the agent runs code locally and only sends the final answer to the model. See Claude Code Skills guide.
Fix 6 — Audit Weekly
Track session cost before/after MCP changes. Our token optimization post documents routing Opus vs Sonnet — combine model routing with MCP diet for maximum savings.
MCP vs CLI Wrappers
Built-in Bash tools can pipe gh, curl, and kubectl through subprocess hooks that index output without stuffing context. Third-party MCP responses often bypass those hooks — prefer CLI wrappers for read-heavy ops when no unique MCP capability is needed.
What Is MCP Token Bloat?
MCP token bloat refers to the excessive consumption of the model's context window by tool definitions, parameter schemas, and resource listings that MCP servers inject at session start. Instead of loading only relevant tools, the client puts every schema into the prompt. At the 32,000–82,000 tokens cited above, that is roughly 16–41% of a 200K window gone before any actual work begins, and more with schema-heavy servers.
How Does MCP Token Bloat Work?
When you connect an MCP server, the client sends the model a complete manifest: tool names, descriptions, JSON schemas for parameters, resource listings, and prompt templates. For a server with 30 tools, each schema can run to hundreds of lines. The model must hold all of this in its context window, leaving less room for your instructions and data. The bloat compounds with every additional server.
Best Practices for Fixing MCP Token Bloat
- Limit active servers to 3–5 per session.
- Enable Tool Search in Claude Code to load schemas on demand.
- Use dynamic registration (MCP abilities) to add tools only when needed.
- Return compact summaries from tools instead of raw data dumps.
- Replace chatty MCP calls with local scripts (skills) for repetitive tasks.
- Audit token usage weekly using built-in counters or third-party tools.
How Much Does MCP Token Bloat Cost?
Token costs vary by model. Assuming Claude Opus at $15 per million input tokens, a single MCP server consuming 50K tokens costs $0.75 per session for the tool definitions alone (50,000 × $15 / 1,000,000). With four servers, that's $3 per session before any real work. At one session a day that is about $90 a month; at three sessions a day it is about $270. Applying the fixes above can reduce token waste by 60–90%.
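The arithmetic is easy to rerun with your own numbers. The sketch below uses the price and per-server overhead quoted above and assumes the definitions are billed once per session. In practice the figure also depends on how many requests a session makes and on prompt caching, so treat it as a rough baseline.

```python
PRICE_PER_MILLION_INPUT_TOKENS = 15.00  # USD, the Opus figure quoted above
TOKENS_PER_SERVER = 50_000              # schema overhead per server, as above


def schema_cost(servers: int, sessions: int = 1) -> float:
    """USD spent on tool definitions, assuming one billing pass per session."""
    tokens = servers * TOKENS_PER_SERVER * sessions
    return tokens * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


if __name__ == "__main__":
    print(f"1 server, 1 session:    ${schema_cost(1):.2f}")
    print(f"4 servers, 1 session:   ${schema_cost(4):.2f}")
    print(f"4 servers, 90 sessions: ${schema_cost(4, sessions=90):.2f}")
```

With these inputs the three lines come to $0.75, $3.00, and $270.00. Swap in your model's price and a measured token count to get your own baseline.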
Is MCP Token Bloat Worth Fixing in 2026?
Absolutely. With context windows still finite (200K tokens typical) and token pricing non-trivial, every wasted token degrades model performance and increases cost. Developers who implement the fixes above report faster response times, better tool selection accuracy, and 40–70% lower token bills. The effort to optimize is minimal compared to the savings.
Keep Reading
- LLM Token Optimization in 2026 — model routing, caching, MCP audit
- Claude Code Complete Setup Guide — install, CLAUDE.md, MCP, skills
- Claude Code vs Cursor: Token Cost (2026) — dollar math on identical tasks
- AI Model Sprint — June 2026 — frontier model benchmarks