A2A in Production

The moment a single-agent system becomes a two-agent system is where the engineering changes shape, not just the deployment topology. The temptation is to think of it as a small step. We have one agent. We're adding another. They'll talk via HTTP. Done.

The temptation is wrong. A two-agent system isn't a slightly-bigger one-agent system; it's the smallest possible distributed system. The fallacies of distributed computing -- the network is reliable, latency is zero, the topology doesn't change -- apply the same way they always have. The protocol is the easy part. Auth across hops, scope across agents, provenance through the telephone game, partial failure, observability that doesn't fall apart at the agent boundary -- these are the parts that cost you a quarter to discover and a year to retrofit.

Twelve chapters on multi-agent systems as distributed systems first and prompt engineering second. Discipline-first, opinionated, and full of war stories. Early access -- all twelve chapters are drafted and readable end to end as of v1.0.0-early.2; the early-access label remains in place through v1.0 while reader feedback shapes revisions.

Your GitHub username -- you'll be auto-invited to the private companion repo with starter code and worked solutions for every chapter's Try-this section. Optional. Skip if you don't have a GitHub account or want to add it later -- email contact@yaw.sh with your order ID and GitHub username and we'll send the invite.

Early access. PDF + EPUB. Free updates as chapters land. Secure checkout. Read Chapter 1 free.

What you'll fix

Every one of these is a failure mode that didn't exist in the single-agent system and shows up the moment you add a second agent. The book gives you the architectural pattern, not the plug-and-play library.

  • The auth question that has no good default. Specialist runs as the user? As itself with a user-context claim? As a service principal? Each answer leaks something different in production. Chapter 5.
  • Memory that quietly bleeds across tenants. Agent A wrote it for user X; agent B retrieves it while serving user Y. The schema didn't encode the choice; the leak only shows up after launch. Chapter 8.
  • The telephone game. Agent C reads agent B's summary of agent A's output and acts on a distorted version of the original. Writing provenance into every event is the fix; provenance you didn't record at write time is unrecoverable after the fact. Chapter 6.
  • The cost amplification you underestimated by an order of magnitude. One user request becomes N agent calls becomes M model calls. Each hop has its own context, its own tools, its own response to integrate. Chapter 11.
  • The trace that spans three agents and tells you nothing. Context didn't propagate; every multi-agent debug session begins by reconstructing what happened from logs that don't share trace IDs. Chapter 10.
  • The cascading timeout meltdown. One specialist hangs; the request budget doesn't propagate; circuit breakers don't exist; the fleet falls over instead of degrading. Chapter 11.
  • The "specialists" that are really one agent that thinks it's three. Heavily overlapping capabilities, the same memory store, the same tools, the same auth context. You're paying distributed-systems costs to solve a single-agent problem. Chapter 4.
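Two of these failure modes, the cost amplification and the cascading timeout, share one mechanical fix: make the per-request budget an explicit value that travels with every hop instead of an assumption each agent makes locally. A minimal sketch of the idea, with illustrative names and numbers (this is not the book's code, just the shape of the pattern):

```python
import time
from dataclasses import dataclass

@dataclass
class HopBudget:
    """Per-request budget that propagates across agent hops."""
    deadline: float        # absolute wall-clock deadline (epoch seconds)
    max_model_calls: int   # remaining model-call allowance for the whole request

    def remaining_seconds(self) -> float:
        return self.deadline - time.time()

    def child(self, reserve_seconds: float, calls: int) -> "HopBudget":
        """Carve out a sub-budget for a downstream specialist.
        The parent keeps reserve_seconds for integrating the result."""
        if self.remaining_seconds() <= reserve_seconds:
            raise TimeoutError("no budget left to delegate")
        if calls > self.max_model_calls:
            raise ValueError("delegating more calls than the request has left")
        self.max_model_calls -= calls
        return HopBudget(self.deadline - reserve_seconds, calls)

# One user request: 30 seconds total, 20 model calls total.
root = HopBudget(deadline=time.time() + 30.0, max_model_calls=20)
# A supervisor hands a specialist 8 calls and keeps 5s to integrate the result.
specialist = root.child(reserve_seconds=5.0, calls=8)
```

When the budget object is absent, N agent calls times M model calls is discovered on the invoice; when it is present, the hang and the blowup both fail fast at the hop that exceeded its share.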

What's in the twelve chapters

Status: chapters marked [DRAFT] are finished and readable today; chapters marked [ADAPTED] are rewritten from the Agent Memory in Production seed material into A2A shape. All twelve chapters are drafted as of v1.0.0-early.2. The early-access label remains in place through v1.0 while reader feedback shapes revisions; buyers get every update at no extra cost.

Part 1 - Foundations

  • Chapter 1. Why A2A is its own discipline [DRAFT] - what "A2A in production" means and why the multi-agent shape changes the engineering, not just the deployment topology. Why "just call another agent's API" is the half-answer. Why more-tools-on-one-agent is the right answer for most cases and A2A is the right answer for a specific, identifiable subset. The discipline gap from local to distributed.
  • Chapter 2. The protocol landscape [DRAFT] - A2A as a protocol vs. A2A as a pattern. The Google-authored A2A spec; the alternatives (LangGraph hand-offs, OpenAI Swarm patterns, raw HTTP). Why MCP isn't A2A and why conflating them produces architectures that don't work. How to pick a protocol you can leave in place for two years.

Part 2 - Orchestration

  • Chapter 3. Router and supervisor patterns [DRAFT] - the two foundational orchestration shapes. Router: one agent dispatches to a specialist; the routing decision is the load-bearing surface. Supervisor: one agent owns the task, breaks it into subtasks, integrates results. Trade-offs, where each pattern earns its keep, and when to compose them.
  • Chapter 4. Swarm and blackboard patterns (and when not to use A2A) [DRAFT] - parallel agents with an aggregator; agents reading and writing a shared workspace with no central orchestrator. The chapter ends with the most under-discussed A2A topic: the cases where one bigger agent with more tools beats any multi-agent shape.

Part 3 - Trust between agents

  • Chapter 5. Auth across agent boundaries [DRAFT] - how user identity and service identity propagate through a fleet. OIDC end to end. Service-to-service auth between agents. "Specialist runs as the user" vs "specialist runs as itself with a user-context claim". Token scoping across hops. The leak modes you don't see until production.
  • Chapter 6. Write-permission, provenance, and the telephone game [DRAFT] - who's allowed to write what, on whose behalf, and how the trail survives the agent hop. Provenance as a first-class concern. The telephone-game failure mode where agent C reads agent B's summary of agent A's output and acts on a distorted version of the original.
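One way to picture Chapter 5's "specialist runs as itself with a user-context claim": the token's subject is the agent's own service identity, and the user rides along in an actor-style delegation claim. The claim names below follow the OAuth token-exchange convention; treat the exact layout as an illustration, not the book's prescribed schema:

```python
import time

# What a specialist's delegated token payload might look like.
claims = {
    "sub": "svc:search-specialist",   # the agent's own service identity
    "act": {"sub": "user:alice"},     # who the work is ultimately for
    "aud": "agent:memory-store",      # which agent may accept this token
    "scope": "memory:read",           # narrowed per hop, never widened
    "exp": int(time.time()) + 60,     # short-lived: one hop, not one session
}

def authorize(claims: dict, expected_aud: str, needed_scope: str) -> str:
    """Receiving agent: verify audience, expiry, and scope, then resolve
    the effective user for data-access decisions."""
    if claims["aud"] != expected_aud:
        raise PermissionError("token was minted for a different agent")
    if claims["exp"] < time.time():
        raise PermissionError("token expired")
    if needed_scope not in claims["scope"].split():
        raise PermissionError("scope does not cover this operation")
    # Data scoping keys off the user, not the service principal.
    return claims["act"]["sub"]
```

The leak modes come from whichever half of this pair you omit: drop the actor claim and the specialist reads as itself across all users; drop the service subject and the audit trail can't say which agent acted.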

Part 4 - State across agents

  • Chapter 7. Memory taxonomies in multi-agent systems [ADAPTED] - the four kinds of memory worth distinguishing (episodic, semantic, working, procedural) with the multi-agent dimensions added: which agent owns it, which agents may read it, what happens when agent B's working memory becomes agent C's episodic memory.
  • Chapter 8. Cross-agent and cross-tenant scoping [ADAPTED] - scoping in a fleet. Per-agent / per-user / per-tenant / cross-agent. The leakage modes that get worse in A2A. Architectural patterns that prevent them. Privacy and deletion-on-request across multiple agents. The audit-trail requirement.
  • Chapter 9. Federated memory architectures [ADAPTED] - comparative deep dive on memory architectures running in production today: Anthropic's memory tool, Claude Code's MEMORY.md, MemGPT/Letta, Mem0, DIY-on-Postgres. Each one viewed through the federation question: how does it handle a second agent reading or writing the same store?

Part 5 - Operations

  • Chapter 10. Observability across agent boundaries [DRAFT] - a single user request crossing three agents should produce one trace, not three. OpenTelemetry context propagation across A2A calls. The per-hop event log. Replay tooling that reconstructs a multi-agent conversation. Debugging a fleet at 11pm.
  • Chapter 11. Cost, latency, and partial failure [DRAFT] - per-request budget caps that propagate across hops. Circuit breakers between specialists. Timeout-and-fallback. Disagreement resolution. Designing the failure modes you can live with.

Part 6 - The future

  • Chapter 12. What's next [DRAFT] - the long-context-vs-delegation question (when does talking to another agent beat just doing it yourself with a million-token window?). The autonomy spectrum. Whether A2A becomes a platform primitive or stays a per-app concern. The bets I'd make starting a new multi-agent system today.

Who it's for

You have shipped at least one agent to production. You know what an LLM call looks like, you've integrated tools, you've handled at least one production incident with an agent in it. You're now standing at the moment where one agent is becoming two -- and you can feel that the second agent isn't a small step. You're somewhere between mid and senior on the IC ladder, or a tech lead who needs to make architectural calls about whether to add an agent or add a tool.

You don't need to know the A2A spec by heart. The book references it (and the alternatives) where they constrain decisions. You do need to know what a distributed-systems fallacy looks like -- "the network is reliable" should ring a bell. If it doesn't, get a copy of Designing Data-Intensive Applications first; this book treats the distributed-systems half of multi-agent systems as load-bearing prior knowledge.

Not for: tutorials on building your first agent (Vol II covers operating an agent; the provider docs cover the API basics), protocol specifications (the A2A and MCP specs are documents in their own right), vendor comparison spreadsheets (the orchestration tooling moves quarterly), or pure prompt-engineering treatments (prompts matter; they aren't where multi-agent systems break in production).

Companion volumes

A2A in Production is Volume IV of the Yaw Labs Production Series. Volume I, MCP in Production, is the protocol-and-server perspective on Model Context Protocol. Volume II, Claude Code in Production, is the operator's perspective on running an agent. Volume III, Semantic Search in Production, is the substrate the agent reaches into when it needs to find something. Volume IV is what happens when one agent becomes a fleet -- the discipline of multi-agent systems as their own engineering problem.

What's in the box

  • The book in PDF and EPUB, updated as new chapters land.
  • Free updates through v1.0 and beyond - every new chapter, every revision, no extra charge. The early-access price is the only price you pay.
  • Auto-invite to the private YawLabs/a2a-in-production-companion repo - starter code, exercises, and worked solutions at module-N-final tags, filled in tag by tag as chapters land.

FAQ

Why is this early access?

v1.0.0-early.1 launched with four chapters readable: Chapter 1 plus the three adapted state chapters in Part 4. v1.0.0-early.2 fills in the remaining eight (the protocol landscape, the orchestration shapes, the trust-between-agents chapters, observability, cost-latency-and-failure, and the closing forecasting chapter), so the book is now readable end to end. The early-access label remains in place through v1.0 -- the chapters are drafts, the companion repo is still being filled in tag by tag, and reader feedback is shaping revisions. The early-access price reflects that posture; you get every revision through v1.0 and beyond at no extra cost. If something in the book is wrong, unclear, or missing, file an issue against the companion repo -- early-access readers shape the priority more than the original outline does.

Do I need to have read Volumes I, II, or III?

No. Each volume in the Production Series stands on its own. Volume IV is about multi-agent systems; if you're not building MCP servers, running Claude Code, or shipping semantic search, you don't need the other three to make sense of this one. The chapters that draw on prior-volume material (e.g. observability building on Vol II's event-log work) summarize what they need at the point of use.

Is this a Google A2A spec book?

No. The Google-authored A2A spec is one protocol option among several, and Chapter 2 covers it alongside LangGraph hand-offs, OpenAI Swarm-style routing, and raw HTTP between agents. The book is opinionated about which one to pick for which shape of problem, but the disciplines (auth, scope, provenance, observability, partial failure) apply across protocol choices. If you swap A2A for LangGraph next year, the chapters still apply.

Is there a print edition?

Not yet, and especially not during early access -- the book is changing as chapters land, and a print edition only makes sense once v1.0 is feature-complete. The digital version is the canonical living one and keeps getting updates either way.

How do the companion-repo invites work?

You enter your GitHub username at checkout. The order webhook fires an invite to that user, adding you as a collaborator to the private companion repo. You should see an email from GitHub with the accept-invitation link within a few minutes. If you don't get the invite within an hour, email contact@yaw.sh with the order ID and the GitHub username you want invited.

Buy A2A in Production (early access)

Twelve chapters drafted and readable end to end. PDF + EPUB. Free updates through v1.0 and beyond. $39 one-time, secure checkout.

Your GitHub username -- you'll be auto-invited to the private companion repo with starter code and worked solutions for every chapter's Try-this section. Optional. Skip if you don't have a GitHub account or want to add it later -- email contact@yaw.sh with your order ID and GitHub username and we'll send the invite.

Companion volumes: MCP in Production, Claude Code in Production, and Semantic Search in Production. Together they cover the agentic-tooling stack: protocol, operator, retrieval substrate, and now the multi-agent fleet.