What "A2A in production" means, why the multi-agent shape changes the engineering, not just the deployment topology, and why the most under-discussed A2A topic is the cases where you shouldn't use A2A at all.
The moment a single-agent system becomes a two-agent system is where the engineering changes shape, not just the deployment topology. The temptation is to think of it as a small step. We have one agent. We're adding another. They'll talk via HTTP. Done. The temptation is wrong, and the rest of this chapter is the unpacking of why.
This book is about the discipline that takes you from "two agents that successfully exchange JSON" to "a fleet of agents that hand off work correctly under load and failure." Twelve chapters; six parts; the same posture as the other volumes in the Yaw Labs Production Series -- the gap between I built it and I run it in production is where the chapters live.
The half-answer is technically true. Two agents can exchange JSON over HTTP. The protocol can be A2A, or LangGraph hand-offs, or OpenAI Swarm-style routing, or raw fetch with a shared schema. Pick one; the wire works.
The wire is not where the problems live. The problems live in everything that isn't the wire:
Each of those problems gets a chapter, or part of one, in this book. None of them is the protocol. The protocol is the easy part.
The two-agent system isn't a slightly-bigger one-agent system; it's the smallest possible distributed system. And the moment your system is distributed, you inherit a body of literature that's older than LLMs by decades.
The fallacies of distributed computing -- L. Peter Deutsch's list from the 1990s -- are not new. The network is not reliable. Latency is not zero. Bandwidth is not infinite. The network is not secure. Topology does not stay put. Every fallacy applies to a fleet of agents the same way it applies to a fleet of microservices. The agents do different work than microservices do, but the failure modes of the system that connects them are the same shape.
A specific consequence: a lot of A2A advice on the internet treats agent fleets as if they were prompt-engineering problems with a bigger prompt. That advice is missing the distributed-systems half of the problem entirely. When agent B times out waiting for agent A, the answer is not a better prompt for B; the answer is a circuit breaker, a fallback strategy, an observability surface that tells you which side timed out, and budget enforcement that prevents the timeout from cascading into a meltdown. None of those are prompt engineering.
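The circuit-breaker half of that answer fits in a few lines. This is a minimal sketch, not a production implementation: the class name and thresholds are hypothetical, and a real fleet would reach for a maintained resilience library rather than hand-roll one.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: open after N consecutive failures,
    serve the fallback while open, and retry after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_s=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        # While open, skip the remote call entirely until cooldown expires.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                return fallback()
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        return result
```

The point of the pattern for fleets: once agent A's circuit opens, agent B stops burning its own latency budget waiting on a peer that is already known to be down, and the fallback (a cached answer, a degraded answer, an honest "partial result") is a design decision rather than an accident.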
This book treats fleets as distributed systems first and prompt-engineering second. The distributed-systems disciplines -- timeout, retry, circuit-break, idempotency, trace-context propagation, scope enforcement, audit trail -- show up in every chapter. The prompt-engineering disciplines (when they matter) are layered on top.
The most under-discussed A2A topic is the cases where you shouldn't use A2A at all.
Most "agent" problems are better solved by giving one agent more capability than by adding a second agent that handles a slice. The reason is that every distributed-systems cost compounds at the agent boundary: every hop is one more place for context to drop, one more auth surface, one more failure mode, one more thing to observe. If a single agent with the right tools can do the work, it almost certainly produces a better system than a fleet of specialists.
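The compounding has simple arithmetic behind it: if each hop preserves context and succeeds with probability p, an n-hop pipeline succeeds with probability p^n (assuming independent hops, which is optimistic). A back-of-envelope sketch:

```python
def end_to_end_success(p_per_hop: float, hops: int) -> float:
    """Probability that every hop succeeds, assuming independent hops."""
    return p_per_hop ** hops


# A 95%-reliable handoff looks fine in isolation...
one_hop = end_to_end_success(0.95, 1)    # 0.95
# ...but a four-specialist pipeline quietly degrades ~1 in 5 requests.
four_hops = end_to_end_success(0.95, 4)  # ~0.815
```

The single-agent version of the same work has one hop's worth of this risk; the fleet version pays it at every boundary, which is why the boundary has to buy something real.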
The cases where one agent with more tools wins:
A useful heuristic, paraphrased from a conversation about microservice decomposition: the smallest valuable unit of agent decomposition is the one whose interface you can specify completely without referring to the implementation. If you can't say what agent B does without saying how agent A would call it, agent B isn't a separable unit; it's the second half of agent A's prompt with extra latency.
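The heuristic can be made concrete. Below is a hypothetical contract for a research agent, written without any reference to its caller; all names are illustrative, not from the A2A spec. If you can't fill in a file like this without describing how the orchestrator phrases its request, the split isn't earning its hop.

```python
from dataclasses import dataclass, field
from typing import Protocol

# Hypothetical contract for a research agent. Nothing here mentions
# the orchestrator or how a caller words its prompt: the interface is
# complete without the implementation, which is the heuristic's test.


@dataclass(frozen=True)
class ResearchRequest:
    question: str
    max_sources: int = 5


@dataclass(frozen=True)
class ResearchResult:
    summary: str
    sources: list = field(default_factory=list)  # URLs or citation ids


class ResearchAgent(Protocol):
    def research(self, req: ResearchRequest) -> ResearchResult: ...
```

Any implementation satisfying the protocol is swappable behind the boundary; that swappability is what makes the unit separable rather than the second half of someone else's prompt.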
The cases where multi-agent genuinely earns its keep are smaller than the field's enthusiasm suggests, but they exist, and they share recognizable shapes.
Specialist context the orchestrator can't carry. A research agent needs domain context (which papers to weight, which sources to trust) that's specific to research; a writing agent needs different domain context for prose. If holding both contexts in one agent's prompt would either bloat the prompt unmanageably or force the agent to context-switch in ways that degrade output quality, separating into specialists earns the hop.
Tool surface that doesn't fit. A single agent with 50 tools degrades on tool selection -- the tool-list-size problem covered in MCP in Production (Vol I ch 5). Splitting the tools across specialists, with a router that picks the specialist, can recover quality the single-agent version was losing.
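The shape of that split can be sketched as a dispatch table: the router picks a specialist, and each specialist sees a short tool list instead of all fifty. In practice the routing decision is a classifier or an LLM call; the keyword table here is a hypothetical stand-in, as are the specialist and tool names.

```python
# Hypothetical specialists, each with a small tool surface.
SPECIALISTS = {
    "billing": ["refund", "invoice_lookup", "plan_change"],
    "research": ["web_search", "paper_fetch", "summarize"],
}

# Stand-in routing table; a real router is a classifier or an LLM call.
ROUTES = {
    "refund": "billing", "invoice": "billing",
    "paper": "research", "search": "research",
}


def route(query: str) -> str:
    """Pick the specialist whose tool surface fits the query."""
    q = query.lower()
    for keyword, specialist in ROUTES.items():
        if keyword in q:
            return specialist
    return "research"  # default specialist
```

Whatever the routing mechanism, the property being bought is the same: no single model ever selects from the full tool list.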
Different failure modes for different parts. If the research subtask can fail and you want the writing subtask to proceed with a partial answer, you need them to be separately retriable. In one agent, they share fate. In two agents, they don't.
Different scaling curves. If the research subtask is bursty (occasional 10x spikes) and the writing subtask is steady, you can scale them independently as separate agents. In one agent, you size for the worst case.
Different identities or trust boundaries. If part of the work needs to run as the user and part needs to run as a service principal, the separation is structural -- you can't do both in one agent identity. The fleet pattern reflects the trust topology you actually need.
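The separate-failure-modes case above can be sketched as a pipeline in which the research step retries independently and the writing step proceeds with an honest placeholder when research stays down. Function names are hypothetical; the point is the fate separation, not the API.

```python
def run_pipeline(research_fn, write_fn, retries: int = 2):
    """Research and writing fail separately: research retries on its
    own, and writing runs exactly once, with whatever survived."""
    findings = None
    for _ in range(retries + 1):
        try:
            findings = research_fn()
            break
        except Exception:
            continue  # retry research without re-running writing
    if findings is None:
        findings = "[research unavailable; answering from prior context]"
    return write_fn(findings)
```

In the one-agent version, a research failure aborts the whole turn; here the writing step's contract is "always produce something, flag what's missing," and that contract is only expressible because the two subtasks stopped sharing fate.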
The pattern across the cases that earn A2A: the multi-agent shape is reflecting something real in the problem. Different contexts, different tools, different failure boundaries, different scaling, different identities. When the shape of the problem matches the shape of the architecture, A2A is the right answer. When it doesn't, you're paying distributed-systems costs to solve a single-agent problem.
This book is the gap between I built two agents that successfully exchange JSON and I run a fleet of agents in production that handle real load, real users, and real failure. It assumes:
This book is not:
Part 1 (chapters 1-2) is foundations: this chapter, plus the protocol landscape (chapter 2) -- A2A spec, alternatives, why MCP isn't A2A, what each protocol commits you to.
Part 2 (chapters 3-4) is orchestration patterns: router and supervisor (chapter 3), swarm and blackboard, plus the cases where you shouldn't use A2A (chapter 4).
Part 3 (chapters 5-6) is trust between agents: auth across agent boundaries (chapter 5), write-permission and provenance and the telephone-game (chapter 6).
Part 4 (chapters 7-9) is state across agents: memory taxonomies in multi-agent systems (chapter 7), cross-agent and cross-tenant scoping (chapter 8), federated memory architectures (chapter 9). These three chapters are adaptations of material from Agent Memory in Production, the volume that briefly held the Vol IV slot before being decommissioned 2026-05-03; the multi-agent-shaped material seeded this part.
Part 5 (chapters 10-11) is operations: observability across agent boundaries (chapter 10), cost and latency and partial failure (chapter 11). The distributed-systems disciplines applied to fleets specifically.
Part 6 (chapter 12) is the future: the long-context-vs-delegation question, the autonomy spectrum, the bets I'd make starting a new fleet today.
This book ships as early access. v1.0.0-early.1 launched with the foundational chapter and the three adapted state chapters; v1.0.0-early.2 fills in the remaining eight, so all twelve chapters are drafted and readable end to end as of this revision. The early-access label remains in place through v1.0 -- the chapters are drafts, the companion repo (https://github.com/YawLabs/a2a-in-production-companion) is still being filled in tag by tag, and reader feedback is shaping revisions. Buyers get free updates through v1.0 and beyond -- the same companion repo, the same access, every revision as it lands. The early-access label is on the front matter, the README, the sales page, and the welcome email; the roadmap and the chapter -> companion-module map live in OUTLINE.md.
If something in the book is wrong, unclear, or missing, file an issue against the companion repo -- early-access readers shape the priority of what gets drafted, sharpened, and exercised next more than the original outline does.
Chapter 2 is the protocol landscape: the A2A spec, what it commits you to, the alternatives (LangGraph hand-offs, OpenAI Swarm-style routing, raw HTTP between agents), and why MCP -- which sits next to A2A in vocabulary and gets confused with it -- isn't the same problem.
If you only take one thing from this chapter, take this. Multi-agent systems are distributed systems first and prompt-engineering systems second. Every chapter in this book treats them that way, and the discipline that separates the fleets that ship from the fleets that fall over is the willingness to apply distributed-systems thinking to a problem the field is still talking about as if it were a prompting problem.
Read the rest of the book
Eleven more chapters -- the protocol landscape, router and supervisor patterns, swarm and blackboard, auth across agent boundaries, write-permission and provenance, memory taxonomies, cross-agent scoping, federated memory architectures, observability across hops, cost and latency under partial failure, and the future. Early access. $39 -- PDF + EPUB + free updates + companion repo.
Published by Yaw Labs.