What "A2A in production" means, why the multi-agent shape changes the engineering, not just the deployment topology, and why the most under-discussed A2A topic is the cases where you shouldn't use A2A at all.
The moment a single-agent system becomes a two-agent system is where the engineering changes shape, not just the deployment topology. The temptation is to think of it as a small step. We have one agent. We're adding another. They'll talk via HTTP. Done. The temptation is wrong, and the rest of this chapter is the unpacking of why.
This book is about the discipline that takes you from "two agents that successfully exchange JSON" to "a fleet of agents that hand off work correctly under load and failure." Twelve chapters; six parts; the same posture as the other volumes in the Yaw Labs Production Series -- the gap between I built it and I run it in production is where the chapters live.
The half-answer is technically true. Two agents can exchange JSON over HTTP. The protocol can be A2A, or LangGraph hand-offs, or OpenAI Swarm-style routing, or raw fetch with a shared schema. Pick one; the wire works.
The wire is not where the problems live. The problems live in everything that isn't the wire:
Each of those problems gets a chapter, or part of one, in this book. None of them is the protocol. The protocol is the easy part.
The two-agent system isn't a slightly-bigger one-agent system; it's the smallest possible distributed system. And the moment your system is distributed, you inherit a body of literature that's older than LLMs by decades.
The fallacies of distributed computing -- L. Peter Deutsch's list from the 1990s -- are not new. The network is not reliable. Latency is not zero. Bandwidth is not infinite. The network is not secure. Topology does not stay put. Every fallacy applies to a fleet of agents the same way it applies to a fleet of microservices. The agents do different work than microservices do, but the failure modes of the system that connects them are the same shape.
A specific consequence: a lot of A2A advice on the internet treats agent fleets as if they were prompt-engineering problems with a bigger prompt. That advice is missing the distributed-systems half of the problem entirely. When agent B times out waiting for agent A, the answer is not a better prompt for B; the answer is a circuit breaker, a fallback strategy, an observability surface that tells you which side timed out, and budget enforcement that prevents the timeout from cascading into a meltdown. None of those are prompt engineering.
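The circuit-breaker half of that answer fits in a few lines. This is a minimal sketch, not a production implementation: the class name and thresholds are hypothetical, and a real fleet would reach for a maintained resilience library rather than hand-roll one.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: open after N consecutive failures,
    serve the fallback while open, and retry after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        # While open, skip the remote call entirely until cooldown expires.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        return result
```

The point of the pattern for fleets: once agent A's circuit opens, agent B stops burning its own latency budget waiting on a peer that is already known to be down, and the fallback (a cached answer, a degraded answer, an honest "partial result") is a design decision rather than an accident.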
This book treats fleets as distributed systems first and prompt-engineering second. The distributed-systems disciplines -- timeout, retry, circuit-break, idempotency, trace-context propagation, scope enforcement, audit trail -- show up in every chapter. The prompt-engineering disciplines (when they matter) are layered on top.
The most under-discussed A2A topic is the cases where you shouldn't use A2A at all.
Most "agent" problems are better solved by giving one agent more capability than by adding a second agent that handles a slice. The reason is that every distributed-systems cost compounds at the agent boundary: every hop is one more place for context to drop, one more auth surface, one more failure mode, one more thing to observe. If a single agent with the right tools can do the work, it almost certainly produces a better system than a fleet of specialists.
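The compounding has simple arithmetic behind it: if each hop preserves context and succeeds with probability p, an n-hop pipeline succeeds with probability p^n (assuming independent hops, which is optimistic). A back-of-envelope sketch:

```python
def end_to_end_success(p_per_hop: float, hops: int) -> float:
    """Probability that every hop succeeds, assuming independent hops."""
    return p_per_hop ** hops


# A 95%-reliable handoff looks fine in isolation...
one_hop = end_to_end_success(0.95, 1)    # 0.95
# ...but a four-specialist pipeline quietly degrades ~1 in 5 requests.
four_hops = end_to_end_success(0.95, 4)  # ~0.815
```

The single-agent version of the same work has one hop's worth of this risk; the fleet version pays it at every boundary, which is why the boundary has to buy something real.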
The cases where one agent with more tools wins:
A useful heuristic, paraphrased from a conversation about microservice decomposition: the smallest valuable unit of agent decomposition is the one whose interface you can specify completely without referring to the implementation. If you can't say what agent B does without saying how agent A would call it, agent B isn't a separable unit; it's the second half of agent A's prompt with extra latency.
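The heuristic can be made concrete. Below is a hypothetical contract for a research agent, written without any reference to its caller; all names are illustrative, not from the A2A spec. If you can't fill in a file like this without describing how the orchestrator phrases its request, the split isn't earning its hop.

```python
from dataclasses import dataclass, field
from typing import Protocol

# Hypothetical contract for a research agent. Nothing here mentions
# the orchestrator or how a caller words its prompt: the interface is
# complete without the implementation, which is the heuristic's test.


@dataclass(frozen=True)
class ResearchRequest:
    question: str
    max_sources: int = 5


@dataclass(frozen=True)
class ResearchResult:
    summary: str
    sources: list = field(default_factory=list)  # URLs or citation ids


class ResearchAgent(Protocol):
    def research(self, req: ResearchRequest) -> ResearchResult: ...
```

Any implementation satisfying the protocol is swappable behind the boundary; that swappability is what makes the unit separable rather than the second half of someone else's prompt.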
The cases where multi-agent genuinely earns its keep are smaller than the field's enthusiasm suggests, but they exist, and they share recognizable shapes.
Specialist context the orchestrator can't carry. A research agent needs domain context (which papers to weight, which sources to trust) that's specific to research; a writing agent needs different domain context for prose. If holding both contexts in one agent's prompt would either bloat the prompt unmanageably or force the agent to context-switch in ways that degrade output quality, separating into specialists earns the hop.
Tool surface that doesn't fit. A single agent with 50 tools degrades on tool selection -- the tool-list-size problem covered in MCP in Production (Vol I ch 5). Splitting the tools across specialists, with a router that picks the specialist, can recover quality the single-agent version was losing.
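The shape of that split can be sketched as a dispatch table: the router picks a specialist, and each specialist sees a short tool list instead of all fifty. In practice the routing decision is a classifier or an LLM call; the keyword table here is a hypothetical stand-in, as are the specialist and tool names.

```python
# Hypothetical specialists, each with a small tool surface.
SPECIALISTS = {
    "billing": ["refund", "invoice_lookup", "plan_change"],
    "research": ["web_search", "paper_fetch", "summarize"],
}

# Stand-in routing table; a real router is a classifier or an LLM call.
ROUTES = {
    "refund": "billing", "invoice": "billing",
    "paper": "research", "search": "research",
}


def route(query: str) -> str:
    """Pick the specialist whose tool surface fits the query."""
    q = query.lower()
    for keyword, specialist in ROUTES.items():
        if keyword in q:
            return specialist
    return "research"  # default specialist
```

Whatever the routing mechanism, the property being bought is the same: no single model ever selects from the full tool list.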
Different failure modes for different parts. If the research subtask can fail and you want the writing subtask to proceed with a partial answer, you need them to be separately retriable. In one agent, they share fate. In two agents, they don't.
Different scaling curves. If the research subtask is bursty (occasional 10x spikes) and the writing subtask is steady, you can scale them independently as separate agents. In one agent, you size for the worst case.
Different identities or trust boundaries. If part of the work needs to run as the user and part needs to run as a service principal, the separation is structural -- you can't do both in one agent identity. The fleet pattern reflects the trust topology you actually need.
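The separate-failure-modes case above can be sketched as a pipeline in which the research step retries independently and the writing step proceeds with an honest placeholder when research stays down. Function names are hypothetical; the point is the fate separation, not the API.

```python
def run_pipeline(research_fn, write_fn, retries: int = 2):
    """Research and writing fail separately: research retries on its
    own, and writing runs exactly once, with whatever survived."""
    findings = None
    for _ in range(retries + 1):
        try:
            findings = research_fn()
            break
        except Exception:
            continue  # retry research without re-running writing
    if findings is None:
        findings = "[research unavailable; answering from prior context]"
    return write_fn(findings)
```

In the one-agent version, a research failure aborts the whole turn; here the writing step's contract is "always produce something, flag what's missing," and that contract is only expressible because the two subtasks stopped sharing fate.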
The pattern across the cases that earn A2A: the multi-agent shape is reflecting something real in the problem. Different contexts, different tools, different failure boundaries, different scaling, different identities. When the shape of the problem matches the shape of the architecture, A2A is the right answer. When it doesn't, you're paying distributed-systems costs to solve a single-agent problem.
This book is the gap between I built two agents that successfully exchange JSON and I run a fleet of agents in production that handle real load, real users, and real failure. It assumes:
This book is not:
Part 1 (chapters 1-2) is foundations: this chapter, plus the protocol landscape (chapter 2) -- A2A spec, alternatives, why MCP isn't A2A, what each protocol commits you to.
Part 2 (chapters 3-4) is orchestration patterns: router and supervisor (chapter 3), swarm and blackboard, plus the cases where you shouldn't use A2A (chapter 4).
Part 3 (chapters 5-6) is trust between agents: auth across agent boundaries (chapter 5), write-permission and provenance and the telephone-game (chapter 6).
Part 4 (chapters 7-9) is state across agents: memory taxonomies in multi-agent systems (chapter 7), cross-agent and cross-tenant scoping (chapter 8), federated memory architectures (chapter 9). These three chapters are adaptations of material from Agent Memory in Production, the volume that briefly held the Vol IV slot before being decommissioned 2026-05-03; the multi-agent-shaped material seeded this part.
Part 5 (chapters 10-11) is operations: observability across agent boundaries (chapter 10), cost and latency and partial failure (chapter 11). The distributed-systems disciplines applied to fleets specifically.
Part 6 (chapter 12) is the future: the long-context-vs-delegation question, the autonomy spectrum, the bets I'd make starting a new fleet today.
This book ships as early access. v1.0.0-early.1 launched with the foundational chapter and the three adapted state chapters; v1.0.0-early.2 fills in the remaining eight, so all twelve chapters are drafted and readable end to end as of this revision. The early-access label remains in place through v1.0 -- the chapters are drafts, the companion repo (https://github.com/YawLabs/a2a-in-production-companion) is still being filled in tag by tag, and reader feedback is shaping revisions. Buyers get free updates through v1.0 and beyond -- the same companion repo, the same access, every revision as it lands. The early-access label is on the front matter, the README, the sales page, and the welcome email; the roadmap and the chapter -> companion-module map live in OUTLINE.md.
If something in the book is wrong, unclear, or missing, file an issue against the companion repo -- early-access readers shape the priority of what gets drafted, sharpened, and exercised next more than the original outline does.
Chapter 2 is the protocol landscape: the A2A spec, what it commits you to, the alternatives (LangGraph hand-offs, OpenAI Swarm-style routing, raw HTTP between agents), and why MCP -- which sits next to A2A in vocabulary and gets confused with it -- isn't the same problem.
If you only take one thing from this chapter, take this. Multi-agent systems are distributed systems first and prompt-engineering systems second. Every chapter in this book treats them that way, and the discipline that separates the fleets that ship from the fleets that fall over is the willingness to apply distributed-systems thinking to a problem the field is still talking about as if it were a prompting problem.
Read the rest of the book
Eleven more chapters -- the protocol landscape, router and supervisor patterns, swarm and blackboard, auth across agent boundaries, write-permission and provenance, memory taxonomies, cross-agent scoping, federated memory architectures, observability across hops, cost and latency under partial failure, and the future. Early access. $39 -- PDF + EPUB + free updates + companion repo.
Published by Yaw Labs.