Earlier this month, the Model Context Protocol was donated to the Linux Foundation. That moves MCP out of "interesting Anthropic spec" and into the same category as Kubernetes, OpenTelemetry, and Cloud Native Computing Foundation projects — industry-governed, vendor-neutral, here for the long haul.

The protocol is ready for serious infrastructure. The ecosystem isn't quite there yet.

Every week someone publishes a new MCP server on npm. It probably works. It might follow the spec. It might handle lifecycle transitions correctly. It might prevent prompt injection through a malicious resource URI. It might not. You find out by installing it and hoping.

That's not a trust layer. That's a guess.

88 Tests, 8 Categories, 30 Seconds

We built @yawlabs/mcp-compliance to answer the question mechanically. Paste a URL or command, get a letter grade in under a minute, drop a badge in your README.

The suite runs 88 tests grouped into 8 categories, each against the current MCP specification (2025-11-25). Required tests count for more than optional ones. Transport-gated tests only run on the transport your server actually declares. Capability-driven execution skips tests for features your server says it doesn't implement.

CategoryTestsWhat it checks
Transport16stdio framing, HTTP headers, SSE event format, session ID handling, Origin header enforcement
Lifecycle21initialize / initialized sequence, capability negotiation, shutdown behavior, reconnect semantics
Tools4tools/list, tools/call, input validation, list-changed notifications
Resources5Resource discovery, URI templates, subscribe/unsubscribe, update notifications
Prompts3Prompt listing, argument completion, prompt retrieval
Errors10JSON-RPC error codes, malformed request handling, unknown method behavior
Schema6JSON Schema validation for tool inputs, resource descriptors, prompt arguments
Security23Prompt injection in resources, tool description safety, URI scheme restrictions, authentication header handling

Security is the largest category deliberately. Half the reason a trust layer is necessary is that MCP servers ship tool descriptions and resource content directly into your LLM context — exactly the injection surface you can't afford to get wrong.

Try It Now

Paste a URL at mcp.hosting/compliance:

https://your-mcp-server.example.com/mcp

The tester opens a connection, runs through the 88 applicable tests (transport-gated, capability-gated), and returns a report:

Grade: B+ (86/100) Transport 16/16 ✓ PASS Lifecycle 19/21 ⚠ 2 optional failures Tools 4/4 ✓ PASS Resources 5/5 ✓ PASS Prompts 3/3 ✓ PASS Errors 9/10 ⚠ 1 required failure Schema 6/6 ✓ PASS Security 22/23 ⚠ 1 optional failure Required failures: ✗ errors-04 JSON-RPC error code for unknown method must be -32601 Received: -32600 (invalid request) Optional failures: ⚠ lifecycle-15 Server did not echo client's protocolVersion ⚠ lifecycle-17 initialized notification not acknowledged ⚠ security-22 Tool description contains suspicious instruction phrasing

Or run it locally against an npm package or a stdio binary:

npx @yawlabs/mcp-compliance test -- npx -y @yawlabs/tailscale-mcp

Drop a Badge in Your README

If your server grades A or B, put the badge in your README. It's a small trust signal that compounds — when an agent-builder is choosing between three MCP servers for the same job, the one with a compliance grade wins.

![MCP Compliance](https://mcp.hosting/badge/your-server-id.svg)

The badge updates automatically when you re-run compliance. If you regress, the badge reflects it. If you fix the regression, it's green again within a minute.

The Methodology Is Open

The testing rubric is published under CC BY 4.0 at github.com/YawLabs/mcp-compliance. Every test has a stable rule ID, documented severity (required vs. optional), scoring weight, and spec reference. The grade thresholds are published. The machine-readable rule catalog (mcp-compliance-rules.json) is part of the repo.

This is deliberate. A trust layer that's opaque is a second black box on top of the first one. If you don't like how we weight security tests vs. lifecycle tests, fork the rubric. If you think a rule is wrong, open an issue. The methodology is part of the public commons, not a competitive moat.

Capability-Driven, Transport-Gated

One thing worth explaining because it trips people up: not every MCP server implements every feature. Some don't do resources. Some are stdio-only and never touch HTTP. Grading a stdio-only server on HTTP-session-ID rules would be unfair.

So the tester asks the server what it supports (via the initialize response's capabilities object) and only runs tests for features the server declares. Transport-specific tests only run on the transport in use. If your server declares it doesn't implement resources, the 5 resource tests are skipped — not failed.

That means two servers can both get an A without implementing the same feature set. The grade reflects "does this server correctly implement what it claims to implement," not "does this server implement everything in the spec."

What This Is For

Three things:

For MCP server authors: it's a CI check. Run @yawlabs/mcp-compliance in your GitHub Actions pipeline, set a minimum grade threshold, fail the build if your server regresses. Same protection your code tests give you, applied to your spec compliance.

For MCP server consumers: it's a signal. An A grade from an open methodology, backed by a real test run, is a stronger trust primitive than a 5-star rating or a "verified" checkmark. You can inspect the report, see what passed and what didn't, and make your own call.

For the ecosystem: it's a coordination mechanism. When every MCP server has a comparable grade from the same rubric, server authors compete on measurable quality instead of marketing claims. That's how TLS interop got good (SSL Labs grades), how accessibility got better (axe/Lighthouse scores), how performance budgets became a thing (Web Vitals).

Where This Connects to mcp.hosting

The team that built the compliance suite also built mcp.hosting — a cloud config manager for people who actually use MCP servers day-to-day. Every server added to an mcp.hosting account is automatically graded, and the grade is visible in the dashboard before you install.

If you use Claude Code, Claude Desktop, Cursor, VS Code, or any MCP client with multiple servers, it's worth a look. One config, every client synced, smart routing to keep your context window from drowning in tool descriptions. Free for up to 3 servers.

But you don't need mcp.hosting to use the compliance tester. It's a standalone npm package, free and open source, and the methodology is documented enough that you could build your own tester against the same rubric if you wanted to. The point is the ecosystem needs a grade, not that any one tool needs to be the grader.

Run It on Your Server

npx @yawlabs/mcp-compliance test https://your-server.example.com/mcp

GitHub · npm · Compliance page

Published by Yaw Labs.

Related Articles

Interested in AI tools and developer workflows? Token Limit News is our weekly newsletter.