The Hidden Cost of MCP Servers (And When They're Worth It)

Mario Giancini
@MarioGiancini
Read Time: 6 min

I've spent the last couple of weeks rebuilding a "digital self" system with Cursor and Claude Code, something I've wanted to do for a while but could only realize with the latest AI models and tools. It includes commands for daily briefs, journaling, lead tracking, content aggregation from my "digital exhaust", and project management. Along the way, I fell into the same trap many developers hit: assuming more tools equals more capability. Specifically, MCP servers.

MCP servers are powerful. They're also expensive in ways that aren't immediately obvious.

The Context Tax You're Already Paying

Every MCP server you configure loads its tool definitions at session startup. Not when you use them—before you type anything. This is the hidden tax:

| Configuration | Token Cost | % of 200k Context |
| --- | --- | --- |
| Built-in tools only | ~10-11k | ~5% |
| Single MCP (Playwright) | +4-8k | +2-4% |
| Moderate setup (gsuite) | ~5-10k | ~2.5-5% |
| Heavy setup (5+ servers) | 55-100k+ | 27-50%+ |

I've seen many developers report 66,000+ tokens consumed before the conversation even starts, just from MCP tool definitions.

That's a third of your context window gone before you've asked a single question.

The "Dumb Zone" Problem

But startup cost is only half the story. Even with lean tool definitions, how you use the remaining context matters.

Dex Horthy's recent talk "No Vibes Allowed" introduces a concept he calls the "dumb zone"—roughly 40% into your context window, model performance degrades measurably. Not at capacity. At 40%.

Or as YK Sugi puts it in his Claude Code Masterclass: "AI context is like milk—best served fresh and condensed."

The more you fill up the context window, the worse outcomes you'll get. This explains why even light MCP setups can produce slop when conversations drag on. You're not running out of context—you're operating in diminishing returns territory.

Worse: corrections compound the problem. Telling Claude it's wrong repeatedly creates what Dex calls "trajectory poison"—the model sees a pattern of "I did something wrong, human corrected me" and learns to expect failure. The next most likely token becomes another mistake.

The fix isn't more context. It's intentional compaction:

  • Start fresh contexts instead of correcting endlessly
  • Use sub-agents to explore, then compress findings into markdown (see the sketch after this list)
  • Keep your main agent operating in the "smart zone" (under 40%)
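
That sub-agent pattern is worth a concrete sketch. Claude Code lets you define custom agents as markdown files under .claude/agents/; here's a minimal, hypothetical "explorer" agent (the name and wording are mine, not a standard):

```markdown
---
name: explorer
description: Investigates a codebase question and returns a compressed summary.
tools: Read, Grep, Glob
---

Explore whatever files you need to answer the question you are given, but
your final reply must stay under 300 words: key files, data flow, and
gotchas only. The context you burn while exploring is discarded when you
finish; only this compressed summary returns to the main agent.
```

The main agent delegates the messy, high-token exploration and gets back a condensed result, which is exactly what keeps it in the smart zone.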

This reframes /clear and /compact from cleanup steps between tasks into active tools for maintaining output quality mid-session.

The Conditional Loading Problem

The intuitive solution is to load tools only when needed, but this doesn't exist natively yet. There's an open issue requesting this feature, but today you have solid workarounds:

The simplest approach: run /mcp at session start, disable the servers you don't need, done. The tool definitions vanish from context immediately. When you need Playwright or another MCP server later, run /mcp again and enable it. It's manual, but it takes five seconds and gives you full control.

For a more automated setup:

  1. Project-scoped .mcp.json — Tools only load in specific workspaces. Put Playwright in your test repo, calendar tools in your personal project. Zero overhead everywhere else (see the sketch after this list).
  2. disallowedTools setting — Block specific tools permanently (requires a restart to toggle).
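
As a sketch of option 1, a project-scoped .mcp.json checked into a test repo might look like this (assuming the published @playwright/mcp package; your server's command will vary):

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

Only sessions started in that repo pay the 4-8k token cost; every other project stays clean. Something like claude mcp add playwright --scope project -- npx @playwright/mcp@latest registers it from the CLI instead.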

The /mcp command (added in v2.0.10) is underrated. It's the fastest way to reclaim context budget without restarting your session or restructuring your config.

The 80/20 of Claude Code Tooling

After building dozens of custom commands and evaluating multiple MCP options, here's what actually delivers value:

What works (20% effort, 80% results):

  • A lean CLAUDE.md file (~2-3k tokens)—but beware: static docs get stale fast
  • Custom slash commands for repeatable workflows
  • /clear between unrelated tasks (and mid-task when context exceeds 40%)
  • CLI scripts that Claude can invoke via Bash
  • On-demand context generation (a /research command that compresses truth from code; see the sketch after this list)
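
That last item is less exotic than it sounds. A /research command is just a markdown prompt file; here's a hypothetical .claude/commands/research.md (the steps and output path are illustrative):

```markdown
Research how $ARGUMENTS works in this codebase.

1. Use Grep and Glob to locate the relevant modules.
2. Read only the files that matter; never load whole directories.
3. Write a condensed summary (key files, data flow, gotchas) to
   docs/research/$ARGUMENTS.md.
4. Reply with the summary path and a three-sentence overview, nothing more.
```

Invoking /research auth does the expensive exploration once and leaves behind a small, fresh artifact you can pull into later sessions, instead of re-deriving the same context every time.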

What often doesn't justify the overhead:

  • MCP servers for single-use-case tools
  • Meta-tooling (tools that generate other tools)
  • Sub-agents as fake personas (but sub-agents done right—explore, compress, return—are powerful)
  • "Expertise" skills loaded permanently into context
  • Bloated onboarding docs that become "lore" instead of truth

The insight that changed my approach: Claude Code is already designed to call shell commands. A 50-line TypeScript script in a tools/ or scripts/ directory that Claude invokes via Bash has zero context overhead until the moment it runs. An MCP server doing the same thing costs 4-10k tokens just sitting there.
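
Here's what that looks like in practice: a hypothetical tools/next-events.ts (the file name, data shape, and paths are made up for illustration) that Claude can run via Bash with npx tsx:

```typescript
// tools/next-events.ts: a hypothetical zero-context-overhead CLI tool.
// Claude never sees this code; it costs nothing until the moment it runs
// `npx tsx tools/next-events.ts 5` and reads only the compact output.
import { readFileSync } from "node:fs";

interface CalendarEvent {
  title: string;
  start: string; // ISO 8601 timestamp
}

// A real version would call the Google Calendar API; a local JSON cache
// keeps this sketch self-contained and runnable.
const events: CalendarEvent[] = JSON.parse(
  readFileSync("data/events.json", "utf8"),
);

const limit = Number(process.argv[2] ?? 5);
const now = Date.now();

const upcoming = events
  .filter((e) => Date.parse(e.start) > now)
  .sort((a, b) => Date.parse(a.start) - Date.parse(b.start))
  .slice(0, limit);

// Emit a compact, model-friendly summary; small outputs help keep the
// main agent in the smart zone.
for (const e of upcoming) {
  console.log(`${e.start}  ${e.title}`);
}
```

One sentence in CLAUDE.md ("for calendar queries, run npx tsx tools/next-events.ts <count>") is the entire standing cost, versus 4-10k tokens of always-loaded MCP schemas.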

When MCP Servers Are Worth It

MCP isn't always wrong. It's the right choice when:

  1. The tool requires persistent state — Browser sessions (Playwright), database connections
  2. You need bidirectional communication — Real-time updates, streaming responses
  3. The overhead is project-scoped — Only loads in relevant workspaces
  4. It's simple and used frequently — Daily tools can justify always-on cost

For my Google Calendar integration across 4+ accounts, a custom script hitting the Google APIs made more sense because I could shape the behavior exactly the way I wanted. It also saved a ton of tokens.

The Decision Framework

Before adding an MCP server, ask:

  1. Could a CLI script do this? If yes, prefer the script.
  2. Will I use this tool most sessions? If no, consider project-scoping.
  3. Does it require persistent state? If no, you probably don't need MCP.
  4. What's my current context budget? Check with /context.

The best Claude Code setup isn't the one with the most capabilities; the models and tools are already highly capable on their own. It's the one where every token of context overhead earns its keep.

The Bigger Picture

Tool selection is table stakes. The real leverage is context discipline.

AI can't and shouldn't replace thinking. It can only amplify the thinking you've done, or the lack of it.

A bad prompt becomes a bad plan. A bad plan becomes a hundred lines of bad code. The upstream clarity you bring through lean tools, intentional compaction, and staying in the smart zone determines everything downstream.

Don't outsource the thinking. Shape the context.

