The Hidden Cost of MCP Servers (And When They're Worth It)

Mario Giancini
@MarioGiancini
Read Time: 6 min

I've spent the last couple of weeks rebuilding a "digital self" system with Cursor and Claude Code, something I've wanted to do for a while but could only realize with the latest AI models and tools. It includes commands for daily briefs, journaling, lead tracking, content aggregation from my "digital exhaust", and project management. Along the way, I fell into the same trap many developers hit: assuming more tools equals more capability. Specifically, MCP servers.

MCP servers are powerful. They're also expensive in ways that aren't immediately obvious.

The Context Tax You're Already Paying

Every MCP server you configure loads its tool definitions at session startup. Not when you use them—before you type anything. This is the hidden tax:

| Configuration | Token Cost | % of 200k Context |
| --- | --- | --- |
| Built-in tools only | ~10-11k | ~5% |
| Single MCP (Playwright) | +4-8k | +2-4% |
| Moderate setup (gsuite) | ~5-10k | ~2.5-5% |
| Heavy setup (5+ servers) | 55-100k+ | 27-50%+ |

I've seen many developers report 66,000+ tokens consumed before the conversation even starts, just from MCP tool definitions.

That's a third of your context window gone before you've asked a single question.

The "Dumb Zone" Problem

But startup cost is only half the story. Even with lean tool definitions, how you use the remaining context matters.

Dex Horthy's recent talk "No Vibes Allowed" introduces a concept he calls the "dumb zone"—roughly 40% into your context window, model performance degrades measurably. Not at capacity. At 40%.

Or as YK Sugi puts it in his Claude Code Masterclass: "AI context is like milk—best served fresh and condensed."

The more you fill up the context window, the worse outcomes you'll get. This explains why even light MCP setups can produce slop when conversations drag on. You're not running out of context—you're operating in diminishing returns territory.

Worse: corrections compound the problem. Telling Claude it's wrong repeatedly creates what Dex calls "trajectory poison"—the model sees a pattern of "I did something wrong, human corrected me" and learns to expect failure. The next most likely token becomes another mistake.

The fix isn't more context. It's intentional compaction:

  • Start fresh contexts instead of correcting endlessly
  • Use sub-agents to explore, then compress findings into markdown (see the sketch after this list)
  • Keep your main agent operating in the "smart zone" (under 40%)
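
That sub-agent pattern is worth a concrete sketch. Claude Code lets you define custom agents as markdown files under .claude/agents/; here's a minimal, hypothetical "explorer" agent (the name and wording are mine, not a standard):

```markdown
---
name: explorer
description: Investigates a codebase question and returns a compressed summary.
tools: Read, Grep, Glob
---

Explore whatever files you need to answer the question you are given, but
your final reply must stay under 300 words: key files, data flow, and
gotchas only. The context you burn while exploring is discarded when you
finish; only this compressed summary returns to the main agent.
```

The main agent delegates the messy, high-token exploration and gets back a condensed result, which is exactly what keeps it in the smart zone.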

This reframes /clear and /compact from cleanup steps between tasks into active tools for maintaining output quality mid-session.

The Conditional Loading Problem

The intuitive solution is to load tools only when needed, but this doesn't exist natively yet. There's an open issue requesting this feature, but today you have solid workarounds:

The simplest approach: run /mcp at session start, disable the servers you don't need, done. The tool definitions vanish from context immediately. When you need Playwright or another MCP server later, run /mcp again and enable it. It's manual, but it takes five seconds and gives you full control.

For a more automated setup:

  1. Project-scoped .mcp.json — Tools only load in specific workspaces. Put Playwright in your test repo, calendar tools in your personal project. Zero overhead everywhere else (see the sketch after this list).
  2. disallowedTools setting — Block specific tools permanently (requires a restart to toggle).
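
As a sketch of option 1, a project-scoped .mcp.json checked into a test repo might look like this (assuming the published @playwright/mcp package; your server's command will vary):

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

Only sessions started in that repo pay the 4-8k token cost; every other project stays clean. Something like claude mcp add playwright --scope project -- npx @playwright/mcp@latest registers it from the CLI instead.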

The /mcp command (added in v2.0.10) is underrated. It's the fastest way to reclaim context budget without restarting your session or restructuring your config.

The 80/20 of Claude Code Tooling

After building dozens of custom commands and evaluating multiple MCP options, here's what actually delivers value:

What works (20% effort, 80% results):

  • A lean CLAUDE.md file (~2-3k tokens)—but beware: static docs get stale fast
  • Custom slash commands for repeatable workflows
  • /clear between unrelated tasks (and mid-task when context exceeds 40%)
  • CLI scripts that Claude can invoke via Bash
  • On-demand context generation (a /research command that compresses truth from code; see the sketch after this list)
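
That last item is less exotic than it sounds. A /research command is just a markdown prompt file; here's a hypothetical .claude/commands/research.md (the steps and output path are illustrative):

```markdown
Research how $ARGUMENTS works in this codebase.

1. Use Grep and Glob to locate the relevant modules.
2. Read only the files that matter; never load whole directories.
3. Write a condensed summary (key files, data flow, gotchas) to
   docs/research/$ARGUMENTS.md.
4. Reply with the summary path and a three-sentence overview, nothing more.
```

Invoking /research auth does the expensive exploration once and leaves behind a small, fresh artifact you can pull into later sessions, instead of re-deriving the same context every time.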

What often doesn't justify the overhead:

  • MCP servers for single-use-case tools
  • Meta-tooling (tools that generate other tools)
  • Sub-agents as fake personas (but sub-agents done right—explore, compress, return—are powerful)
  • "Expertise" skills loaded permanently into context
  • Bloated onboarding docs that become "lore" instead of truth

The insight that changed my approach: Claude Code is already designed to call shell commands. A 50-line TypeScript script in a tools/ or scripts/ directory that Claude invokes via Bash has zero context overhead until the moment it runs. An MCP server doing the same thing costs 4-10k tokens just sitting there.
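
Here's what that looks like in practice: a hypothetical tools/next-events.ts (the file name, data shape, and paths are made up for illustration) that Claude can run via Bash with npx tsx:

```typescript
// tools/next-events.ts: a hypothetical zero-context-overhead CLI tool.
// Claude never sees this code; it costs nothing until the moment it runs
// `npx tsx tools/next-events.ts 5` and reads only the compact output.
import { readFileSync } from "node:fs";

interface CalendarEvent {
  title: string;
  start: string; // ISO 8601 timestamp
}

// A real version would call the Google Calendar API; a local JSON cache
// keeps this sketch self-contained and runnable.
const events: CalendarEvent[] = JSON.parse(
  readFileSync("data/events.json", "utf8"),
);

const limit = Number(process.argv[2] ?? 5);
const now = Date.now();

const upcoming = events
  .filter((e) => Date.parse(e.start) > now)
  .sort((a, b) => Date.parse(a.start) - Date.parse(b.start))
  .slice(0, limit);

// Emit a compact, model-friendly summary; small outputs help keep the
// main agent in the smart zone.
for (const e of upcoming) {
  console.log(`${e.start}  ${e.title}`);
}
```

One sentence in CLAUDE.md ("for calendar queries, run npx tsx tools/next-events.ts <count>") is the entire standing cost, versus 4-10k tokens of always-loaded MCP schemas.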

When MCP Servers Are Worth It

MCP isn't always wrong. It's the right choice when:

  1. The tool requires persistent state — Browser sessions (Playwright), database connections
  2. You need bidirectional communication — Real-time updates, streaming responses
  3. The overhead is project-scoped — Only loads in relevant workspaces
  4. It's simple and used frequently — Daily tools can justify always-on cost

For my Google Calendar integration across 4+ accounts, a custom script hitting the Google APIs made more sense because I could shape the behavior exactly the way I wanted. It also saved a ton of tokens.

The Decision Framework

Before adding an MCP server, ask:

  1. Could a CLI script do this? If yes, prefer the script.
  2. Will I use this tool most sessions? If no, consider project-scoping.
  3. Does it require persistent state? If no, you probably don't need MCP.
  4. What's my current context budget? Check with /context.

The best Claude Code setup isn't the one with the most capabilities; the models and tools are already highly capable on their own. It's the one where every token of context overhead earns its keep.

The Bigger Picture

Tool selection is table stakes. The real leverage is context discipline.

AI can't and shouldn't replace thinking. It can only amplify the thinking you've done, or the lack of it.

A bad prompt becomes a bad plan. A bad plan becomes a hundred lines of bad code. The upstream clarity you bring through lean tools, intentional compaction, and staying in the smart zone determines everything downstream.

Don't outsource the thinking. Shape the context.

