
18 March 2026 · 8 min read

Multi-agent systems are easier than you think. They are rarely the right answer.

The technical implementation of multi-agent LLM systems is well-understood. The decision of when to use them is not.

Multi-agent · LLM systems · Architecture · AI engineering

The confusion

There is a pattern in AI engineering circles where multi-agent systems are treated as the sophisticated solution and single-agent systems as the naive one. This is wrong, and it is costing people time, money, and working software.

Multi-agent systems are not inherently better than single-agent systems. They are a different tool for a different problem. Most problems do not require them.

This post is about when they do and when they do not.

What a multi-agent system actually is

A multi-agent system is a collection of LLM invocations that communicate with each other, share state, and collectively produce an output that no single invocation could produce alone.

The word "agent" implies autonomy and decision-making. In practice, most multi-agent systems are better described as pipelines: a sequence of LLM calls where the output of one becomes the input to the next, potentially with branching based on the output content.

The distinguishing characteristic of a true multi-agent system — as opposed to a pipeline — is that the agents make decisions about what to do next, not just what to say. The routing is LLM-driven, not hardcoded.
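To make the distinction concrete, here is a minimal sketch of both shapes. It is not taken from any particular framework: `fake_llm`, the `ROUTE:` prompt prefix, and the handler names are all hypothetical stand-ins so the example runs offline. The only point is where the "what happens next" decision lives.

```python
from typing import Callable, Tuple


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned text."""
    if prompt.startswith("ROUTE:"):
        # Pretend the model picks a next action based on the task text.
        return "summarize" if "long" in prompt else "answer"
    return f"[model output for: {prompt}]"


def pipeline(task: str, llm: Callable[[str], str]) -> str:
    """Pipeline: the sequence of calls is fixed in code."""
    draft = llm(f"Draft: {task}")
    return llm(f"Review: {draft}")


def agent_step(task: str, llm: Callable[[str], str]) -> Tuple[str, str]:
    """Agent: the model's own output selects which handler runs next."""
    handlers = {
        "summarize": lambda t: llm(f"Summarize: {t}"),
        "answer": lambda t: llm(f"Answer: {t}"),
    }
    choice = llm(f"ROUTE: {task}").strip()
    # Fall back to a default if the model returns an unknown route.
    handler = handlers.get(choice, handlers["answer"])
    return choice, handler(task)
```

In `pipeline`, reading the code tells you exactly which calls will run. In `agent_step`, you have to run the model to find out, which is the property the rest of this post is concerned with.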

When multi-agent systems are the right answer

There are three scenarios where multi-agent architecture is genuinely justified:

1. Domain specialization at scale

When a task requires deep expertise in multiple distinct domains that cannot be held in a single context window, specialists are better than generalists. The compliance work in TraceLayer is a clear example: SOC 2, ISO 27001, and HIPAA use different control frameworks, different evidence standards, and different audit methodologies. A specialist agent for each framework produces better outputs than a general agent trying to handle all three.

The key phrase is "cannot be held in a single context window." If the domain knowledge fits comfortably in context, a single agent with comprehensive prompting is simpler and more reliable.
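One way to test "fits comfortably" before reaching for specialists is a rough size estimate. The sketch below is an assumption-heavy heuristic, not a tokenizer: the four-characters-per-token ratio is only a common rule of thumb for English text, and `headroom` (the fraction of the window reserved for reference material) is a hypothetical parameter you would tune. Use your model's real tokenizer for anything that matters.

```python
from typing import Iterable


def fits_in_context(
    documents: Iterable[str],
    context_tokens: int,
    chars_per_token: float = 4.0,  # assumed rough ratio, not a real tokenizer
    headroom: float = 0.5,         # assumed share of the window for reference material
) -> bool:
    """Estimate whether reference material fits comfortably in one context window."""
    estimated_tokens = sum(len(doc) for doc in documents) / chars_per_token
    return estimated_tokens <= context_tokens * headroom


small = fits_in_context(["a" * 4_000], context_tokens=8_000)
large = fits_in_context(["a" * 40_000], context_tokens=8_000)
```

Here `small` is true (about 1,000 estimated tokens against a 4,000-token budget) and `large` is false (about 10,000 against the same budget).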

2. Parallelism with independent subtasks

If a task can be decomposed into truly independent subtasks, running them in parallel with separate agents reduces latency. Data collection from multiple APIs, analysis of multiple documents, generation of multiple independent artifacts — these are candidates for parallel agent execution.

The independence requirement is strict. Subtasks that look independent but share implicit dependencies will produce inconsistent outputs when parallelized.
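A minimal sketch of the parallel case, using `asyncio.gather` from the standard library. The subtask names are hypothetical and `asyncio.sleep` stands in for whatever LLM or API call each agent would make.

```python
import asyncio
from typing import Dict, List


async def run_subtask(name: str, delay: float) -> str:
    """Hypothetical subtask; the sleep stands in for an LLM or API call."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_parallel(subtasks: Dict[str, float]) -> List[str]:
    # gather runs the coroutines concurrently and returns results in
    # input order, regardless of which one finishes first.
    return await asyncio.gather(
        *(run_subtask(name, delay) for name, delay in subtasks.items())
    )


results = asyncio.run(run_parallel({"fetch_api_a": 0.02, "fetch_api_b": 0.01}))
```

Note what the code cannot express: no subtask can read another's result. If one of them needs to, the subtasks are not independent and this is the wrong shape.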

3. Long-horizon tasks requiring persistent state

Some tasks are too long for a single context window even with the longest available context lengths. Multi-agent architectures with external state stores can support tasks that span hours or days of continuous execution.

This use case is less common than it appears. Most "long-horizon" tasks can be decomposed into a series of shorter tasks with human checkpoints. The automatic multi-agent approach trades predictability for continuity; the human-in-the-loop approach trades continuity for predictability. Usually predictability is worth more.
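The decomposed version can be sketched as a list of named steps with state written to disk after each one. This is an illustration under assumptions, not a production design: the JSON file is a stand-in for whatever external state store you use, and the step names and functions are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_with_checkpoints(steps, state_path: Path) -> dict:
    """Run named steps in order, saving state after each so a restart resumes."""
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"done": [], "results": {}}
    for name, fn in steps:
        if name in state["done"]:
            continue  # already completed in an earlier run
        state["results"][name] = fn(state["results"])
        state["done"].append(name)
        state_path.write_text(json.dumps(state))  # checkpoint after every step
    return state


# Hypothetical steps; each receives the results of the steps before it.
steps = [
    ("collect", lambda results: ["doc1", "doc2"]),
    ("summarize", lambda results: f"{len(results['collect'])} documents summarized"),
]

state_file = Path(tempfile.mkdtemp()) / "state.json"
first = run_with_checkpoints(steps[:1], state_file)  # simulate a run that stops early
final = run_with_checkpoints(steps, state_file)      # resumes; "collect" is not re-run
```

The write after each step is also the natural place to pause for human review: the state on disk is exactly what the next step will see.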

When multi-agent systems are the wrong answer

When you have a single task that fits in context

The most common mistake: building a multi-agent system because it feels more sophisticated, when a single well-prompted agent would produce equivalent or better output with lower latency and less complexity.

If you can describe the task in a single prompt and the output fits in a single response, use a single agent. Add complexity only when you have exhausted the single-agent approach and found it genuinely insufficient.

When the coordination overhead exceeds the benefit

Multi-agent systems have overhead: the state management, the routing logic, the error handling for individual agent failures, the monitoring and observability for a distributed system of LLM calls. This overhead is a fixed cost that must be justified by the benefit.

For tasks that run occasionally and produce outputs of moderate value, the coordination overhead is often not justified.

When you need predictable behavior

LLM-driven routing introduces non-determinism. The same input does not always produce the same routing decision. For applications where behavior needs to be auditable and reproducible — financial systems, compliance systems, healthcare — hardcoded pipelines are often preferable to LLM-driven routing.

You can have multi-agent architecture with deterministic routing. This is often the best of both worlds: specialist agents, fixed routing logic.
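A sketch of that combination, using the three compliance frameworks mentioned earlier as routing keys. The registry, `make_specialist`, and the returned strings are hypothetical; a real specialist would call a model with a framework-specific prompt.

```python
from typing import Callable, Dict


def make_specialist(framework: str) -> Callable[[str], str]:
    """Hypothetical specialist; a real one would wrap a framework-specific model call."""
    def specialist(request: str) -> str:
        return f"[{framework} specialist] {request}"
    return specialist


SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "soc2": make_specialist("SOC 2"),
    "iso27001": make_specialist("ISO 27001"),
    "hipaa": make_specialist("HIPAA"),
}


def route(framework: str, request: str) -> str:
    """Fixed routing: the same key always reaches the same specialist."""
    specialist = SPECIALISTS.get(framework)
    if specialist is None:
        # Fail loudly rather than guessing; the failure is auditable too.
        raise ValueError(f"no specialist registered for {framework!r}")
    return specialist(request)


answer = route("hipaa", "list required evidence")
```

The routing table is plain data, so it can be reviewed, diffed, and tested without running a model. Only the specialists' outputs remain non-deterministic.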

"The routing decision is the most consequential part of a multi-agent system. Making it with an LLM adds non-determinism. Make sure the non-determinism is worth the flexibility."

The practical heuristic

Start with a single agent. When you identify a specific limitation (context length, domain breadth, latency from running subtasks serially), add the minimum architectural complexity needed to address that limitation.

Multi-agent systems built this way are designed around actual constraints. Multi-agent systems built from the start tend to be designed around imagined ones.

The systems I am most confident in are the ones where I can describe, precisely, why each agent exists and what would break if it were collapsed back into the calling agent. If I cannot give that answer, the agent should not exist.
