Multi-Agent Systems: A Practical Guide (2026)

The first agent you ship is software. The second is infrastructure. The moment you add a second agent that talks to the first, you've built a distributed system -- and every fallacy of distributed computing applies, plus a few new ones the AI layer adds on top.

This guide covers when a fleet is the right answer (it usually isn't), the four orchestration patterns when it is, the distributed-systems fallacies that apply doubly to agents, and the failure modes that show up only after launch. The full version is the early-access A2A in Production book.

When NOT to use multi-agent

Most "we should add another agent" instincts are wrong. The right shape for most problems is one bigger agent with more tools -- not because multi-agent is bad, but because:

Context is fragmented. Each agent has its own context window. Information that the host saw doesn't propagate unless you build the propagation.
Cost amplifies. One user request becomes N agent calls, and each agent call becomes M model calls. The token bill grows multiplicatively, not additively.
Debugging gets harder. A trace that spans three agents tells you nothing useful unless you explicitly propagate trace and correlation IDs across every hop.
Auth gets harder. Identity, scope, and provenance across hops are real engineering problems with no good defaults.
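
To make the cost point concrete, here is a back-of-the-envelope sketch. The numbers (agent calls per request, model calls per agent, tokens per model call) are invented for illustration and are not measurements; the only point is that the total is a product, so every factor you add scales the whole bill.

```python
def tokens_per_request(agent_calls: int, model_calls_per_agent: int,
                       tokens_per_model_call: int) -> int:
    """Total tokens for one user request when each agent call fans out into model calls."""
    return agent_calls * model_calls_per_agent * tokens_per_model_call


# Hypothetical numbers, chosen only to show the shape of the growth.
single_agent = tokens_per_request(agent_calls=1, model_calls_per_agent=4,
                                  tokens_per_model_call=3_000)
fleet = tokens_per_request(agent_calls=3, model_calls_per_agent=4,
                           tokens_per_model_call=3_000)

print(single_agent)  # 12000
print(fleet)         # 36000
```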

The honest test: can a single agent with the union of all the tools you'd give the fleet handle this? If yes, don't build a fleet.

When multi-agent is the right answer

The patterns where multi-agent wins:

Specialists need different tool sets that conflict. One agent has read-only DB access; another has write access. Different specialists, different auth scopes.
Specialists need different memories. A customer-support agent should not see the engineer's internal docs; an internal agent should not have customer-facing memory.
Parallel independent work. Three things need to happen simultaneously and the model's serial execution is the bottleneck.
Cross-organizational composition. Your agent talking to a customer's agent is multi-agent by definition.
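
The first case above (conflicting tool sets and auth scopes) might look like the following minimal sketch. The scope strings, tool names, and the `AgentIdentity` type are hypothetical, made up for this example; a real system would check scopes in the tool server or gateway rather than in caller code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    name: str
    scopes: frozenset  # auth scopes this agent holds


# Hypothetical tool registry: tool name -> scope required to call it.
TOOL_SCOPES = {
    "run_select": "db:read",
    "run_update": "db:write",
}


def call_tool(agent: AgentIdentity, tool: str) -> str:
    """Refuse the call unless the agent holds the scope the tool requires."""
    required = TOOL_SCOPES[tool]
    if required not in agent.scopes:
        raise PermissionError(f"{agent.name} lacks scope {required!r} for {tool}")
    return f"{agent.name} ran {tool}"


analyst = AgentIdentity("analyst", frozenset({"db:read"}))
migrator = AgentIdentity("migrator", frozenset({"db:read", "db:write"}))

print(call_tool(analyst, "run_select"))   # analyst ran run_select
print(call_tool(migrator, "run_update"))  # migrator ran run_update
```

Calling `call_tool(analyst, "run_update")` raises `PermissionError`, which is the behavior a single agent holding the union of both tool sets cannot give you.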

The four orchestration patterns

Router

One agent receives the request and dispatches to a specialist. The router is the load-bearing piece -- if it routes wrong, the specialist does the wrong work confidently. The pattern earns its keep when specialists are genuinely different (different tools, different auth, different memory).
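
A minimal sketch of the router shape, under loud assumptions: the specialists are plain functions standing in for agents, and the keyword table stands in for whatever classifier (most likely a model call) a real router would use. All names here are hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def billing_agent(request: str) -> str:
    return f"[billing] handling: {request}"


def tech_agent(request: str) -> str:
    return f"[tech] handling: {request}"


def fallback_agent(request: str) -> str:
    # An explicit fallback is cheaper than a confident wrong route.
    return f"[fallback] asking for clarification: {request}"


# Hypothetical routing table; keyword matching is a stand-in for a real classifier.
ROUTES: Dict[str, Handler] = {
    "invoice": billing_agent,
    "refund": billing_agent,
    "error": tech_agent,
    "crash": tech_agent,
}


def route(request: str) -> Handler:
    """Pick a specialist; fall back when nothing matches."""
    text = request.lower()
    for keyword, specialist in ROUTES.items():
        if keyword in text:
            return specialist
    return fallback_agent


def handle(request: str) -> str:
    return route(request)(request)


print(handle("My invoice is wrong"))  # [billing] handling: My invoice is wrong
print(handle("Hello there"))          # [fallback] asking for clarification: Hello there
```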

Supervisor

One agent owns the task end-to-end and decomposes it into subtasks for specialists. The supervisor sees the whole arc; the specialists each see their slice. The supervisor integrates results. Right for tasks that need plan-then-execute.
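
As a rough sketch of plan-then-execute: the supervisor below decomposes a task, hands each slice to a specialist, and integrates the results. The `plan` function is a hard-coded stand-in for a planning model call, and the specialist names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical specialists; each sees only its own subtask string.
def research(subtask: str) -> str:
    return f"notes on {subtask}"


def write(subtask: str) -> str:
    return f"draft of {subtask}"


SPECIALISTS: Dict[str, Callable[[str], str]] = {"research": research, "write": write}


def plan(task: str) -> List[Tuple[str, str]]:
    """Stand-in for a planning model call: returns (specialist, subtask) pairs."""
    return [
        ("research", f"background for {task}"),
        ("write", f"summary of {task}"),
    ]


def supervise(task: str) -> str:
    """Own the task end to end: decompose, delegate, integrate."""
    results = []
    for specialist_name, subtask in plan(task):
        results.append(SPECIALISTS[specialist_name](subtask))
    return " | ".join(results)


print(supervise("Q3 report"))
# notes on background for Q3 report | draft of summary of Q3 report
```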

Swarm

Multiple agents work in parallel; an aggregator combines results. No central orchestrator; each specialist is independent. Right for embarrassingly parallel work where one agent's result doesn't depend on another's.
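
One way to sketch this with the standard library is `asyncio.gather` plus a plain aggregator function. The specialist here is a hypothetical coroutine whose `sleep` stands in for model latency; nothing in it depends on another specialist's result.

```python
import asyncio
from typing import List


async def specialist(name: str, item: str) -> str:
    """Hypothetical independent worker; the sleep stands in for model latency."""
    await asyncio.sleep(0.01)
    return f"{name}:{item.upper()}"


def aggregate(results: List[str]) -> str:
    """Combine results; sorting makes the output independent of completion order."""
    return ", ".join(sorted(results))


async def swarm(items: List[str]) -> str:
    # Start every specialist at once; none waits on another.
    calls = [specialist(f"agent{i}", item) for i, item in enumerate(items)]
    results = await asyncio.gather(*calls)
    return aggregate(list(results))


result = asyncio.run(swarm(["a", "b", "c"]))
print(result)  # agent0:A, agent1:B, agent2:C
```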

Blackboard

Agents read from and write to a shared workspace; the next agent picks up where the previous left off. No central orchestrator; coordination is via shared state. Right for collaborative workflows where the shape of the work emerges from the interaction.
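
A toy sketch of the blackboard idea, assuming an in-memory dict as the shared workspace and two hypothetical agents. Each agent decides for itself whether the board holds what it needs; the loop only gives every agent a turn and stops when nobody can make progress.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Blackboard:
    entries: Dict[str, str] = field(default_factory=dict)


def outliner(board: Blackboard) -> bool:
    """Acts only when a topic exists and no outline has been written yet."""
    if "topic" in board.entries and "outline" not in board.entries:
        board.entries["outline"] = f"outline for {board.entries['topic']}"
        return True
    return False


def drafter(board: Blackboard) -> bool:
    """Acts only once an outline is on the board."""
    if "outline" in board.entries and "draft" not in board.entries:
        board.entries["draft"] = f"draft from {board.entries['outline']}"
        return True
    return False


# Deliberately listed out of workflow order: the board state, not the list,
# decides who can act.
AGENTS = [drafter, outliner]


def run(board: Blackboard, max_rounds: int = 10) -> Blackboard:
    for _ in range(max_rounds):
        progressed = [agent(board) for agent in AGENTS]
        if not any(progressed):
            break  # nobody could act; the work has settled
    return board


board = run(Blackboard({"topic": "pricing page"}))
print(board.entries["draft"])  # draft from outline for pricing page
```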

The distributed-systems fallacies that apply doubly

The network is reliable. No -- and now the agent on the other end is also non-deterministic. Retries are not idempotent by default: the second call might produce different output, or repeat a side effect the first call already completed.
Latency is zero. Agent calls add wall-clock time. A workflow with five sequential hops is at least five model latencies stacked.
Bandwidth is infinite. Each agent's input is bounded by its context window. Passing the whole conversation to a specialist often exceeds it.
The topology doesn't change. An agent can be retrained, swapped, or have its tools changed without your knowledge if you don't own it.
There's one administrator. Cross-organizational multi-agent breaks this hard. Auth, scope, and policy diverge across the boundary.
Transport cost is zero. Token cost is the transport cost; it is not zero.

Failure modes you do not see until production

The telephone game. Agent C reads B's summary of A's output and acts on a distorted version of the original.
Memory bleeding across tenants. Agent A wrote it for user X; agent B retrieves it while serving user Y. The schema didn't encode the scope.
Cascading timeout meltdown. One specialist hangs; budgets don't propagate; the fleet falls over instead of degrading.
The "specialists" that are one agent in three coats. Heavily overlapping capabilities; you're paying distributed cost to solve a single-agent problem.
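
For the memory-bleed failure above, the fix implied by "the schema didn't encode the scope" is to make the tenant part of the key, so a read cannot be expressed without one. This is a minimal in-memory sketch; `ScopedMemory` and its method names are hypothetical, and a real store would enforce the same rule at the database or index level.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ScopedMemory:
    # Keyed by (tenant_id, key): scope is part of the schema, not a convention.
    _store: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def write(self, tenant_id: str, key: str, value: str) -> None:
        self._store[(tenant_id, key)] = value

    def read(self, tenant_id: str, key: str) -> Optional[str]:
        """Return the value only if it was written under the same tenant."""
        return self._store.get((tenant_id, key))


memory = ScopedMemory()

# Agent A writes while serving user X.
memory.write("user-x", "shipping_address", "1 Main St")

# Agent B, serving user Y, asks for the same key and gets nothing.
print(memory.read("user-y", "shipping_address"))  # None
print(memory.read("user-x", "shipping_address"))  # 1 Main St
```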

Want the full treatment?

A2A in Production (early access) is twelve chapters on the discipline of multi-agent systems as distributed systems first and prompt-engineering second. The book covers the orchestration patterns in depth, the auth question, federated memory, observability across boundaries, partial failure under load, and when not to use A2A at all.
