MCP in Production -- the MCP server book by Jeff Yaw (Yaw Labs), PDF + EPUB

Fourteen MCP servers in production at Yaw Labs. tailscale-mcp taught us that the LLM-vs-human identity question has no good default. aws-mcp taught us that a tool list can grow to 40,000 tokens before anyone notices. npmjs-mcp taught us that "auth" is four different problems wearing the same coat. lemonsqueezy-mcp taught us that errors a model can act on are a different art form from errors a human reads.

This book is what we wrote down between server #2 and server #14, when the same surprises kept landing in different forms and we got tired of solving them from scratch.

Twelve chapters on what the spec doesn't tell you. The protocol is the easy part; the hard part is the one you only learn by running these things in production for six months and watching what breaks.

Free with a Token Limit News subscription. Your download links appear right here -- they stay live for about an hour. Unsubscribe anytime.

PDF + EPUB. Free updates as the spec evolves. Read a sample chapter.

What this MCP server book teaches you to fix

Each of these is a thing one of the @yawlabs/* servers has actually hit -- not a hypothetical. The book gives you the schema change, throw discipline, or hosting decision that catches the next one before it reaches a customer.

  • The 40,000-token tool list. Your aws-mcp ships 200 tools, the model's tool selection degrades, and the trace shows it grabbing the wrong one. This is the longest chapter in the book for a reason. Chapter 5.
  • Auth that doesn't fit the LLM-vs-human pattern. Your upstream API expects a human at a browser; the agent isn't a human. The four common patterns and what to do when none of them fits. Chapter 4.
  • Tools that fight each other instead of composing. Output-shape-as-input-shape. Pagination that survives a non-deterministic caller. List-then-detail done right. Cross-server composition without the joins falling apart. Chapter 6.
  • Errors the model can't act on. A 500 with a stack trace is a useless signal to a model. Throw discipline + trigger phrases + transient-vs-terminal retry, with a six-axis grading rubric. Chapter 7.
  • E2E tests that pass on Tuesday and fail on Thursday. The harness pattern that makes testing a probabilistic consumer tolerable, instead of accepting that "non-deterministic" means "untested." Chapter 8.
  • The idle server that's burning more than the active one. Six hosting options, an honest comparison of managed MCP platforms, container packaging, reproducible builds off your laptop. Chapter 9.
  • The security review you didn't know was coming. The threat model, the mitigations, the questions a competent reviewer will ask, and a checklist you can run yourself before they do. Chapter 10.
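To make the "errors the model can't act on" item concrete, here is a minimal sketch of the transient-versus-terminal idea. It is not taken from the book: the names ModelFacingError, toModelFacingError, and toToolResult are hypothetical, and the status-code mapping is an assumption. The only part that comes from the MCP spec is the tool-result shape (isError plus a text content item).

```typescript
// Hypothetical error shape for illustration; not from the book or any SDK.
type Retry = "transient" | "terminal";

interface ModelFacingError {
  retry: Retry;
  message: string; // plain-language text the model reads
  nextStep: string; // what the caller should do next
}

// Assumed mapping from an upstream HTTP status to an actionable error.
function toModelFacingError(status: number, detail: string): ModelFacingError {
  if (status === 429 || status >= 500) {
    return {
      retry: "transient",
      message: `Upstream service failed temporarily (HTTP ${status}).`,
      nextStep: "Retry the same call after a short wait.",
    };
  }
  if (status === 401 || status === 403) {
    return {
      retry: "terminal",
      message: `Not authorized (HTTP ${status}).`,
      nextStep: "Do not retry. Ask the user to check credentials.",
    };
  }
  return {
    retry: "terminal",
    message: `Request rejected (HTTP ${status}): ${detail}`,
    nextStep: "Do not retry unchanged. Fix the arguments and call again.",
  };
}

// Render the error as an MCP tool result: isError plus one text item.
function toToolResult(err: ModelFacingError) {
  return {
    isError: true,
    content: [{ type: "text" as const, text: `${err.message} ${err.nextStep}` }],
  };
}
```

The point of the sketch is that the stack trace never reaches the model; it gets one sentence saying what happened and one saying whether to retry.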

Chapter-by-chapter: how to build production MCP servers

Part 1 - Foundations

  • Chapter 1. Why MCP exists - the three primitives, the two transports, what MCP solved that Plugins and function-calling didn't. Free preview.
  • Chapter 2. Anatomy of a server - stdio adapter, HTTP service, in-process embedded; the four architectural shapes a real server takes.
  • Chapter 3. Building your first production-grade server - worked example, npm init through npm publish. Every chapter from here refers back.

Part 2 - Surface

  • Chapter 4. Auth, secrets, and identity flow - the four common patterns, the LLM-vs-human identity problem, what to do when your upstream API doesn't fit.
  • Chapter 5. Schema design, or how to not poison the model - the longest chapter. Naming, parameter modeling, the tool-list size problem (40,000 tokens is not OK).
  • Chapter 6. Tools that compose - output-shape-as-input-shape, pagination, list-then-detail, idempotency, cross-server composition.
  • Chapter 7. Error handling that the model can act on - throw discipline, trigger phrases, transient vs terminal retry, a six-axis grading rubric.
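As a rough illustration of the tool-list size problem named in Chapter 5, the sketch below estimates what a tools/list payload costs in tokens. It is an assumption-laden approximation, not the book's method: the four-characters-per-token ratio is a common rule of thumb rather than a real tokenizer, and estimateTokens and heaviestTools are hypothetical helper names. The descriptor fields (name, description, inputSchema) follow the MCP tool definition.

```typescript
// Shaped like one entry in a tools/list response.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: object;
}

// Assumed heuristic, not a tokenizer: roughly 4 characters per token.
const CHARS_PER_TOKEN = 4;

// Estimate the token cost of sending these tool definitions to a model.
function estimateTokens(tools: ToolDescriptor[]): number {
  const serialized = JSON.stringify(tools);
  return Math.ceil(serialized.length / CHARS_PER_TOKEN);
}

// Rank tools by estimated cost, largest first, to see where the budget goes.
function heaviestTools(
  tools: ToolDescriptor[],
  top: number,
): { name: string; tokens: number }[] {
  return tools
    .map((t) => ({ name: t.name, tokens: estimateTokens([t]) }))
    .sort((a, b) => b.tokens - a.tokens)
    .slice(0, top);
}
```

Running heaviestTools against a server's real tool list is a quick way to find out whether a few verbose descriptions or the sheer tool count is driving the total.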

Part 3 - Lifecycle

  • Chapter 8. Testing a probabilistic consumer - unit, integration, and end-to-end testing patterns when the consumer is non-deterministic; the harness pattern that makes E2E tolerable.
  • Chapter 9. Hosting, scaling, and not going broke on idle - six realistic options, an honest comparison of managed MCP platforms, container packaging, reproducible builds off your laptop.
  • Chapter 10. Security review survival - the threat model, the mitigations, the questions a competent reviewer will ask, a checklist you can run yourself.

Part 4 - In practice

  • Chapter 11. Case studies from the @yawlabs portfolio - tailscale-mcp (the first one), npmjs-mcp (auth-shaped), aws-mcp (schema-shaped), lemonsqueezy-mcp (errors and money). Architecture, surprises, what a v2 would look like.
  • Chapter 12. What's next, and what to bet on - in-flight spec changes, ecosystem gaps, and the bets I'd make today.

Who this MCP book is for

You ship code for a living. You have read the MCP spec and shipped at least one server. You know what tools/list and notifications/initialized are without looking them up. You want to know why your aws-mcp tool list is 40,000 tokens and what to do about it.

You're somewhere between mid and senior on the IC ladder, or a tech lead deciding how to invest your team's MCP work.

Not for: spec walkthroughs (modelcontextprotocol.io does that better), "what is MCP" introductions, or vendor-neutral tool surveys.

Companion books in the Yaw Labs Production Series

MCP in Production is Volume I of the Yaw Labs Production Series - the builder's view of Model Context Protocol. Volume II, Claude Code in Production, is the operator's view: running the agent that calls these servers. Volume III, Semantic Search in Production, is the retrieval substrate the agent reaches into. Volume IV, A2A in Production, is what happens when one agent becomes a fleet.

MCP in Production: FAQ

Does this MCP book cover Anthropic's MCP spec, or only Claude-specific behavior?

The spec. MCP is a multi-client protocol and the book treats it that way -- the server you ship runs against Claude, Cursor, Cline, and anything else that speaks the protocol. Client-specific quirks (Claude's tool-list size sensitivity, Cursor's transport preferences) are noted at the point where they constrain a server-side decision.

What if the MCP spec changes?

Each chapter pins the spec version it was written against, and updates ship as the spec moves. The protocol has been moving steadily; the disciplines (schema design, throw discipline, the four auth patterns, the testing harness) survive minor-version churn. When a load-bearing change lands, the affected chapter gets a revision and you get the update.

Do I need to know Claude Code or Cursor first?

No. This book is the server-side view -- you're shipping the tools, not operating the agent that calls them. Volume II, Claude Code in Production, covers the operator's perspective if you want both halves.

Do I need to ship a public MCP server to benefit?

No. Local and internal MCP servers are the larger use case -- the LLM-vs-human auth question, schema design, throw discipline, and testing patterns apply identically whether the server runs on your laptop, in your VPC, or as a published @yawlabs/* package. The hosting chapter covers all three deployment shapes.

How do the companion-repo invites work?

There are no invites -- the companion repo is public, so just clone it. Starter code, exercises, and worked solutions for each hands-on chapter are available at that chapter's chapter-N-final tag. No GitHub account or access request needed.

Get MCP in Production

Twelve chapters. PDF + EPUB. Free updates. Free with a Token Limit News signup.

Companion volumes: Claude Code in Production, Semantic Search in Production, and A2A in Production. Built on the same patterns as the Yaw MCP CLI (@yawlabs/mcp).