There are now hundreds of MCP servers on npm and PyPI, and almost none of them have been tested against the actual specification. So we built @yawlabs/mcp-compliance - an 88-test suite that grades any MCP server (HTTP or stdio) against the 2025-11-25 spec in under 30 seconds.

npx -y @yawlabs/mcp-compliance test <url-or-stdio-command>

It is open source and free. Here is what we found running it across the reference implementations and a chunk of the public ecosystem.

The test suite

88 tests across 8 categories:

| Category | Tests | What it covers |
| --- | --- | --- |
| Transport | 16 | HTTP POST, content types (JSON/SSE), 202 Accepted on notifications, streaming |
| Lifecycle | 21 | initialize, protocol version, capabilities, MCP-Session-Id, ping, cancellation, progress |
| Tools | 4 | tools/list, schema shape, invocation, unknown-tool handling |
| Resources | 5 | resources/list, reading, templates, URI validation, subscribe |
| Prompts | 3 | prompts/list, prompts/get, pagination |
| Errors | 10 | JSON-RPC error codes, malformed input, missing params, unknown methods |
| Schema | 6 | Tool names, inputSchema shape, resource URIs, prompt argument names |
| Security | 23 | Auth & transport, input validation, tool-description injection, information disclosure, SSRF |

Required tests are 70% of the score, optional 30%. Letter grades: A (90+), B (75+), C (60+), D (40+), F (<40). Capability-gated tests (tools, resources, prompts, subscribe, logging, completions) only run if the server declares the capability - no false failures for features the server never claimed.
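
To make the weighting concrete, here is a minimal sketch of one plausible reading of that rule: the score is 70 times the required pass rate plus 30 times the optional pass rate. This is not the suite's actual scoring code. The function names and the choice to give full credit to an empty bucket (for example, when every optional test was capability-gated out) are assumptions for illustration.

```python
def score(results):
    """Return a 0-100 score from (required, passed) pairs for tests that ran.

    Capability-gated tests that were skipped are simply absent from `results`.
    """
    results = list(results)

    def pass_rate(required):
        bucket = [passed for req, passed in results if req is required]
        # Assumption: an empty bucket earns full credit rather than zero.
        return sum(bucket) / len(bucket) if bucket else 1.0

    return 70 * pass_rate(True) + 30 * pass_rate(False)


def grade(points):
    """Map a score to the letter grades listed above."""
    for letter, floor in (("A", 90), ("B", 75), ("C", 60), ("D", 40)):
        if points >= floor:
            return letter
    return "F"
```

Under this reading, a server that passes every required test but no optional ones lands at 70, a C.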

The reference servers are clean

We pointed the suite at every @modelcontextprotocol/server-* package on npm. All five grade A (98–100%), zero required-test failures. The TypeScript SDK does its job: transport, lifecycle, and tool-schema basics work out of the box.

The interesting question is what happens once you leave the reference set.

What most servers get wrong

1. Error handling is the biggest gap

The spec says servers return JSON-RPC errors with proper codes for unknown methods and malformed requests. In practice, this is where most non-reference servers fall apart:

The SDKs could close most of this by returning -32601 for any unrecognized method by default, instead of leaving it to each server author.
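
As an illustration of what that default could look like, here is a minimal dispatcher sketch. The error codes are the standard JSON-RPC 2.0 ones; `handle_message` and the `handlers` mapping are hypothetical names, not part of any MCP SDK, and the sketch only covers incoming requests and notifications.

```python
import json

# Standard JSON-RPC 2.0 error codes.
PARSE_ERROR = -32700
INVALID_REQUEST = -32600
METHOD_NOT_FOUND = -32601
INVALID_PARAMS = -32602


def _error(req_id, code, message):
    return {"jsonrpc": "2.0", "id": req_id,
            "error": {"code": code, "message": message}}


def handle_message(raw, handlers):
    """Dispatch one JSON-RPC message; return a response dict or None."""
    try:
        msg = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return _error(None, PARSE_ERROR, "Parse error")

    if (not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0"
            or not isinstance(msg.get("method"), str)):
        req_id = msg.get("id") if isinstance(msg, dict) else None
        return _error(req_id, INVALID_REQUEST, "Invalid Request")

    # A message without an id is a notification and never gets a response.
    is_notification = "id" not in msg
    handler = handlers.get(msg["method"])
    if handler is None:
        # The default the post argues for: unknown method -> -32601.
        if is_notification:
            return None
        return _error(msg["id"], METHOD_NOT_FOUND, "Method not found")

    try:
        result = handler(msg.get("params") or {})
    except (KeyError, TypeError, ValueError):
        # Generic message on purpose: no internal details in the error.
        if is_notification:
            return None
        return _error(msg["id"], INVALID_PARAMS, "Invalid params")

    if is_notification:
        return None
    return {"jsonrpc": "2.0", "id": msg["id"], "result": result}
```

With the fallback living in the dispatcher, a server author who registers only `ping` still answers every other method with a well-formed error instead of a hang or an HTTP 500.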

2. Session affinity breaks behind load balancers

The spec defines MCP-Session-Id for session tracking but does not say what happens when a proxy or load balancer sits between client and server. Every stateful MCP deployment behind a load balancer needs either sticky routing or shared session state, and every operator is figuring it out independently.

A shared key-value store keyed by MCP-Session-Id with a short TTL works fine. The point is that it should be documented once, not reinvented by every team running MCP servers in production.
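
A rough sketch of that pattern follows. The in-memory dict stands in for a shared store that every replica can reach (a networked key-value service in a real deployment); the `SessionStore` class, its method names, the 300-second default, and the sliding-expiry behavior are all assumptions for illustration.

```python
import time


class SessionStore:
    """Session state keyed by MCP-Session-Id with a sliding TTL.

    The dict is a stand-in: in production this would be a store shared by
    all replicas, which would normally enforce the TTL itself.
    """

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._data = {}

    def put(self, session_id, state):
        self._data[session_id] = (self._clock() + self._ttl, state)

    def get(self, session_id):
        """Return the session state, or None if unknown or expired."""
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, state = entry
        if self._clock() >= expires_at:
            del self._data[session_id]
            return None
        # Sliding expiry: each successful read extends the session.
        self._data[session_id] = (self._clock() + self._ttl, state)
        return state
```

Any replica that receives a request can then look up the MCP-Session-Id header value in the store, so the load balancer no longer has to pin a client to one instance.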

3. SSE streaming through proxies is fragile

The Streamable HTTP transport works in direct connections. Put a reverse proxy in the middle and:

A heartbeat every ~15 seconds keeps connections alive through intermediate infrastructure. The spec mentions heartbeats but does not recommend a frequency.
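
One way to implement such a heartbeat is an SSE comment line (a line starting with a colon, which clients ignore) sent whenever the stream has been idle for the interval. The sketch below assumes messages arrive on an `asyncio.Queue` and that `None` marks the end of the stream; `sse_events` and both conventions are hypothetical, not taken from any SDK.

```python
import asyncio
import json

# SSE comment line: ignored by clients, but counts as traffic for proxies.
HEARTBEAT = ": keepalive\n\n"


async def sse_events(queue, interval=15.0):
    """Yield SSE frames from `queue`, plus a heartbeat after each idle interval."""
    while True:
        try:
            message = await asyncio.wait_for(queue.get(), timeout=interval)
        except asyncio.TimeoutError:
            # Nothing to send for `interval` seconds: keep the connection warm.
            yield HEARTBEAT
            continue
        if message is None:  # assumed end-of-stream sentinel
            return
        # json.dumps emits no raw newlines, so one data: line is enough.
        yield f"data: {json.dumps(message)}\n\n"
```

Because the timer restarts after every real message, a busy stream sends no heartbeats at all and an idle one sends one per interval.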

4. Security is the long tail

The 23 security tests cover four areas: auth & transport, input validation, tool integrity, and information disclosure. Recurring failure modes:

5. SSRF is a real risk anywhere a URL is user-supplied

Any tool or platform that accepts a user-controlled URL (the fetch pattern, server routers, gateways, mcp.hosting itself) needs:

Checking only at configuration time leaves a DNS-rebinding window: an attacker points their domain at a public IP during validation, then swaps to a metadata IP before the first real request. The spec does not cover this yet - it probably should.
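
A sketch of what a request-time check could look like: resolve the hostname immediately before connecting, reject the request if any resolved address is not publicly routable, and then connect to one of the vetted addresses rather than resolving again. The function names and the injectable `resolver` parameter are assumptions for illustration, and this is a starting point rather than a complete SSRF defense (redirects, for instance, need the same check on every hop).

```python
import ipaddress
import socket


def is_blocked_ip(text):
    """True for loopback, private, link-local (metadata), multicast, etc."""
    ip = ipaddress.ip_address(text.split("%")[0])  # drop any IPv6 zone id
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it cannot dodge the check.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (not ip.is_global) or ip.is_multicast


def resolve_vetted_ips(host, port, resolver=socket.getaddrinfo):
    """Resolve `host` now and return its IPs, or raise if any is blocked.

    Call this per request, then connect to a returned IP directly; a second
    lookup would reopen the rebinding window.
    """
    infos = resolver(host, port, type=socket.SOCK_STREAM)
    ips = sorted({info[4][0] for info in infos})
    if not ips:
        raise ValueError(f"{host} did not resolve")
    blocked = [ip for ip in ips if is_blocked_ip(ip)]
    if blocked:
        raise ValueError(f"{host} resolves to non-public address(es): {blocked}")
    return ips
```

Rejecting the whole request when any single record is blocked is deliberate: a response that mixes a public and an internal address is a common rebinding setup.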

What we would suggest for the spec

  1. Document a recommended SSE heartbeat interval (15 seconds works in practice).
  2. Add an informational section on proxy/load balancer configuration - header forwarding, buffering, timeouts, backpressure.
  3. Recommend that SDKs return proper JSON-RPC errors for unknown methods by default.
  4. Standardize an optional discovery endpoint (/.well-known/mcp) so clients can check server metadata without a full initialize handshake.
  5. Add security guidance - token handling in errors, input size limits, tool-description sanitization, SSRF protection.

We have draft spec PRs for several of these in docs/spec-prs/.

Try it on your server

One command, no signup, ~30 seconds:

npx -y @yawlabs/mcp-compliance test https://your-server-url
# or stdio
npx -y @yawlabs/mcp-compliance test npx -y @your-org/your-mcp-server

To publish a result and get a README badge:

npx -y @yawlabs/mcp-compliance badge https://your-server-url

That prints a markdown badge snippet you can drop into your README.

If you also want to stop hand-editing MCP JSON configs across Claude Code, Cursor, and Claude Desktop - that is what mcp.hosting does with @yawlabs/mcph. Different post.


Jeff Yaw, Yaw Labs. Follow along at tokenlimit.news for weekly notes on AI infrastructure.