Every MCP tool definition gets loaded into the LLM’s context window at the start of a conversation. This is not free. Each tool costs tokens, and tokens cost money, latency, and attention.

Most developers do not think about this because the cost is invisible. You add an MCP server, it works, and you move on. But the cumulative weight of tool definitions is one of the largest hidden costs in AI agent workflows.

The math

A typical MCP tool definition includes a name, a description, and an input schema. Together these come to roughly 300 tokens per tool on average, which is the figure used in the estimates below.

Now multiply. A typical MCP setup for a power user:

MCP servers and their tool counts:
─────────────────────────────────
  GitHub           28 tools
  Slack            15 tools
  PostgreSQL        8 tools
  Filesystem       11 tools
  Brave Search      3 tools
  Sentry            9 tools
  Linear           22 tools
  Notion           14 tools
  Docker            7 tools
  Kubernetes       19 tools
─────────────────────────────────
  Total:          136 tools

At 300 tokens per tool (a conservative average), that is 40,800 tokens of tool definitions loaded into every conversation. Before you have typed a single word.
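The arithmetic above can be reproduced with a short script. This is a minimal sketch: the tool counts are copied from the list, the 300-token average is the article's estimate rather than a measured value, and the 200k context window is an assumed size used only to express the overhead as a percentage.

```python
# Tool counts per server, copied from the list above.
TOOL_COUNTS = {
    "GitHub": 28, "Slack": 15, "PostgreSQL": 8, "Filesystem": 11,
    "Brave Search": 3, "Sentry": 9, "Linear": 22, "Notion": 14,
    "Docker": 7, "Kubernetes": 19,
}
TOKENS_PER_TOOL = 300      # the article's average; real schemas vary widely
CONTEXT_WINDOW = 200_000   # assumed context window size, in tokens


def startup_overhead(tool_counts, tokens_per_tool=TOKENS_PER_TOOL):
    """Return (total tools, total tokens) loaded before the first message."""
    total_tools = sum(tool_counts.values())
    return total_tools, total_tools * tokens_per_tool


tools, tokens = startup_overhead(TOOL_COUNTS)
print(f"{tools} tools, {tokens:,} tokens, "
      f"{tokens / CONTEXT_WINDOW:.1%} of the window")
# prints: 136 tools, 40,800 tokens, 20.4% of the window
```

Swapping in your own server list and a measured per-tool average gives a quick estimate for your setup.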

Figure: Context tokens consumed by MCP servers before the user types anything (tokens loaded at conversation start)
─────────────────────────────────────────────
  GitHub            28 tools    8,400 tokens
  Linear            22 tools    6,600 tokens
  Kubernetes        19 tools    5,700 tokens
  Slack             15 tools    4,500 tokens
  Notion            14 tools    4,200 tokens
  Filesystem        11 tools    3,300 tokens
  4 other servers   27 tools    8,100 tokens
─────────────────────────────────────────────
  Total:           136 tools  ~40,800 tokens (~20% of a 200k context window)

A typical MCP-heavy setup consumes ~40k tokens before the user says a word. At $3 / 1M input tokens on a frontier model, that’s about $0.12 of overhead per conversation, or roughly $135 / person / month at 50 conversations a day over 22 workdays.

The dollar cost

Token pricing varies by model, but let us use Claude Sonnet as a representative example. At $3 per million input tokens:

40,800 tokens × $3/1M = $0.1224 (about $0.12) per conversation

At 50 conversations/day:
  $0.1224 × 50 = $6.12/day
  $6.12 × 22 workdays = $134.64/month

For a team of 10:
  $134.64 × 10 = $1,346.40/month

Just on tool definitions. Before any actual work.

This is the floor. Every message in a multi-turn conversation re-sends the tool definitions as part of the context. A 20-message conversation with 40K tokens of tools sends those 40K tokens 20 times. The real cost is higher.
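To see how quickly the floor rises, the same calculation can be parameterized by the number of requests per conversation. This is a rough sketch, assuming every request is billed at the full input price and carries the complete 40,800 tokens of tool definitions; the function and parameter names are illustrative only.

```python
INPUT_PRICE_PER_MTOK = 3.00   # USD per million input tokens (article's example)
TOOL_TOKENS = 40_800          # tool-definition tokens sent with each request


def tool_overhead_cost(conversations_per_day, workdays=22, team_size=1,
                       requests_per_conversation=1):
    """Monthly USD spent on tool definitions alone."""
    per_request = TOOL_TOKENS * INPUT_PRICE_PER_MTOK / 1_000_000
    per_conversation = per_request * requests_per_conversation
    return per_conversation * conversations_per_day * workdays * team_size


single = tool_overhead_cost(50)                                 # one person, floor
team = tool_overhead_cost(50, team_size=10)                     # team of 10, floor
multi_turn = tool_overhead_cost(50, requests_per_conversation=20)

print(f"one person, 1 request/conversation:   ${single:,.2f}/month")
print(f"team of 10, 1 request/conversation:   ${team:,.2f}/month")
print(f"one person, 20 requests/conversation: ${multi_turn:,.2f}/month")
```

Under these assumptions, a single person running 20-message conversations spends about $2,692.80 a month on tool definitions, twenty times the floor.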

The latency cost

More input tokens means more time to first token. The LLM has to process every tool definition before it starts generating a response. The difference between 5K tokens of context and 45K tokens of context is noticeable - typically 200-500ms of additional latency on the first response.

For interactive use this is annoying. For agent loops where the LLM makes dozens of sequential tool calls, it compounds. Each step in the loop pays the latency tax on the full tool set.
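A back-of-the-envelope sketch of how that compounds, assuming (as a simplification) that each step in the loop pays the same 200-500ms of added latency quoted above; the 30-step loop is a hypothetical example.

```python
ADDED_LATENCY_MS = (200, 500)  # the article's range, applied per step (assumption)


def loop_latency_overhead_ms(steps, added_ms=ADDED_LATENCY_MS):
    """Return (low, high) total added latency in milliseconds for a loop."""
    low, high = added_ms
    return steps * low, steps * high


low_ms, high_ms = loop_latency_overhead_ms(30)
print(f"30 steps: {low_ms / 1000:.0f}-{high_ms / 1000:.0f} s of added latency")
# prints: 30 steps: 6-15 s of added latency
```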

The attention cost

This is the cost nobody talks about, and it might be the most important one.

LLMs have finite attention. The more tools in context, the worse the model is at selecting the right one. Research on LLM tool selection shows degradation starting around 20-30 tools, with significant drops in accuracy above 50. The model starts hallucinating tool names, confusing similar tools, and selecting tools that partially match instead of the correct one.

This connects to the broader one-MCP-server argument. The research is clear: more context does not always mean better performance. Past a threshold, additional tool definitions actively degrade the quality of the agent’s decisions.

Why this happens

The root cause is the MCP client model: all configured servers load all their tools on startup. There is no concept of lazy loading, conditional activation, or relevance filtering at the protocol level.

From the spec’s perspective, this makes sense. The protocol defines how servers expose tools, not how clients manage them. But from an operational perspective, it means every MCP server you add increases the baseline cost and decreases the baseline quality of every conversation.

The activate-on-demand pattern

The fix is to not load tools until you need them. Instead of connecting to every MCP server at startup, start with a minimal set and activate servers during the conversation based on what the task requires.

This is the core idea behind mcp.hosting. Your client starts with three meta-tools: discover (list available servers), activate (connect to a server and load its tools), and deactivate (disconnect and remove its tools from context).

A typical conversation flow:

  1. Start with 3 tools in context (discover, activate, deactivate) = ~600 tokens
  2. User asks about a GitHub issue. LLM activates the GitHub server. Now 31 tools in context.
  3. User asks to update a Linear ticket. LLM activates Linear. Now 53 tools.
  4. GitHub work is done. LLM deactivates GitHub. Back to 25 tools.

Peak context: 53 tools (~16K tokens). Without mcp.hosting: 136 tools (~41K tokens). The user never configures more than one server locally. The LLM handles activation based on what the conversation needs.
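The flow above can be modeled in a few lines to check the tool counts. This is a toy simulation, not the mcp.hosting implementation or its API: the `ToolContext` class and its methods are hypothetical, and the per-tool token figures are the article's estimates (about 200 tokens for each meta-tool, 300 for each server tool).

```python
TOKENS_PER_TOOL = 300        # article's average for server tools
META_TOOLS = ("discover", "activate", "deactivate")
META_TOKENS_PER_TOOL = 200   # ~600 tokens for the three meta-tools


class ToolContext:
    """Toy model of which tools are in context at a given moment."""

    def __init__(self, catalog):
        self.catalog = dict(catalog)   # server name -> number of tools
        self.active = set()            # servers whose tools are loaded
        self.peak_tools = self.tool_count()

    def _server_tools(self):
        return sum(self.catalog[name] for name in self.active)

    def tool_count(self):
        return len(META_TOOLS) + self._server_tools()

    def token_count(self):
        return (len(META_TOOLS) * META_TOKENS_PER_TOOL
                + self._server_tools() * TOKENS_PER_TOOL)

    def activate(self, server):
        self.active.add(server)
        self.peak_tools = max(self.peak_tools, self.tool_count())
        return self.tool_count()

    def deactivate(self, server):
        self.active.discard(server)
        return self.tool_count()


ctx = ToolContext({"GitHub": 28, "Linear": 22})
history = [
    ctx.tool_count(),          # step 1: meta-tools only
    ctx.activate("GitHub"),    # step 2
    ctx.activate("Linear"),    # step 3
    ctx.deactivate("GitHub"),  # step 4
]
print(history)         # prints: [3, 31, 53, 25]
print(ctx.peak_tools)  # prints: 53
```

With these estimates, the peak of 53 tools corresponds to 600 + 50 × 300 = 15,600 tokens, in line with the ~16K figure.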

Measuring the impact

You cannot optimize what you cannot measure. Most developers have no idea how many tokens their tool definitions consume because the cost is hidden in the API bill.

Your LLM provider’s usage dashboard is the right place to start. Anthropic and OpenAI both expose input and output token breakdowns per request; once you can see that 60% of your input tokens are tool schemas, the motivation to optimize becomes concrete.
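If the dashboard does not break out tool schemas separately, a rough local estimate is a reasonable starting point. The sketch below serializes tool definitions to JSON and applies a crude four-characters-per-token heuristic; that ratio is an assumption, not a real tokenizer, and the example tool and the 68,000-token request are hypothetical. The `name`, `description`, and `inputSchema` fields follow the shape of an MCP tool definition.

```python
import json
import math

CHARS_PER_TOKEN = 4  # crude heuristic (assumption); use a real tokenizer for accuracy


def estimate_tool_tokens(tool_definitions):
    """Approximate the token count of a list of tool definitions."""
    serialized = json.dumps(tool_definitions, separators=(",", ":"))
    return math.ceil(len(serialized) / CHARS_PER_TOKEN)


def tool_share(tool_tokens, input_tokens):
    """Fraction of a request's input tokens spent on tool definitions."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    return tool_tokens / input_tokens


# Hypothetical tool definition, for illustration only.
example_tools = [{
    "name": "get_issue",
    "description": "Fetch a single issue by number.",
    "inputSchema": {
        "type": "object",
        "properties": {"number": {"type": "integer"}},
        "required": ["number"],
    },
}]

estimated = estimate_tool_tokens(example_tools)
print(f"estimated tokens for example tools: {estimated}")

# Suppose the provider reports 68,000 input tokens for one request.
share = tool_share(40_800, 68_000)
print(f"tool definitions: {share:.0%} of input tokens")  # prints 60%
```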

Practical recommendations

  1. Audit your tool count. Run discover or check your client’s tool list. If you have more than 30 tools loaded at startup, you are paying a meaningful tax.
  2. Write tight schemas. A tool description does not need to be a paragraph. A parameter description does not need to repeat the parameter name. Every token you trim from a tool definition is multiplied by every message in every conversation.
  3. Use activate-on-demand. mcp.hosting implements this pattern. Start lean, activate what you need, deactivate when you are done.
  4. Split large servers. If you have an MCP server with 30+ tools, consider splitting it into focused servers that can be activated independently. A “github-issues” server and a “github-repos” server is better than one monolithic GitHub server when you only need issue tools.
  5. Measure your spend. Use your provider’s usage dashboard to understand where your tokens are going. Tool definitions are often the largest single category of input tokens for agent-heavy workflows.
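As an illustration of recommendation 2, the sketch below compares a verbose and a tight version of the same tool definition by serialized size. Both definitions are hypothetical, and character count is used only as a proxy for tokens.

```python
import json

# Hypothetical verbose definition: descriptions restate names and pad the text.
verbose_tool = {
    "name": "create_issue",
    "description": (
        "This tool allows you to create a new issue. Use this tool whenever "
        "the user would like to create an issue in a repository."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string",
                      "description": "The title of the issue to be created."},
            "body": {"type": "string",
                     "description": "The body of the issue to be created."},
        },
        "required": ["title"],
    },
}

# Tight version: same name, same parameters, no redundant wording.
tight_tool = {
    "name": "create_issue",
    "description": "Create a repository issue.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "body": {"type": "string", "description": "Markdown."},
        },
        "required": ["title"],
    },
}


def definition_size(tool):
    """Serialized size in characters, a rough proxy for token count."""
    return len(json.dumps(tool, separators=(",", ":")))


before, after = definition_size(verbose_tool), definition_size(tight_tool)
print(f"verbose: {before} chars, tight: {after} chars, "
      f"saved: {1 - after / before:.0%}")
```

The saving applies to every request that carries the definition, so small trims add up across a whole server.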

The MCP ecosystem is growing fast. The number of available servers doubles roughly every quarter. Without active management, your tool context will grow faster than your actual usage, and you will pay more for worse results. Manage it now, before it manages your budget.


Jeff Yaw, Yaw Labs. Follow along at tokenlimit.news for weekly notes on AI infrastructure.