Multi-Agent & Production

A single agent can do a lot. Multi-agent systems divide work across several specialized agents — and shipping any agent, single or multi, surfaces a hard set of production problems. This page covers both.

The idea: instead of one agent juggling everything, use multiple agents, each with a focused role, a tailored prompt, and its own tools.

[Figure: two patterns. Supervisor: one coordinator routes subtasks to workers (Agent A, Agent B, Agent C). Pipeline: each stage (Agent A, Agent B, Agent C) hands off to the next.]
  • Supervisor / orchestrator — a coordinator agent routes subtasks to worker agents and assembles their results. Flexible; the supervisor is the single point of control.
  • Pipeline — agents run in a fixed sequence, each handing off to the next (research → write → edit). Predictable and easy to debug.
  • Network — agents hand off to each other dynamically. Most flexible, least predictable — hardest to control.
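The first two patterns can be sketched in a few lines. This is a minimal illustration, assuming an "agent" is just a function from text to text; `run_pipeline`, `run_supervisor`, the `plan` callback, and the toy agents are hypothetical names, not part of any framework.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: an agent is any callable that maps input text to output text.
Agent = Callable[[str], str]


def run_pipeline(stages: List[Agent], task: str) -> str:
    """Pipeline: fixed order, each stage consumes the previous stage's output."""
    result = task
    for stage in stages:
        result = stage(result)
    return result


def run_supervisor(
    plan: Callable[[str], List[Tuple[str, str]]],
    workers: Dict[str, Agent],
    task: str,
) -> str:
    """Supervisor: plan() splits the task into (worker_name, subtask) pairs,
    the coordinator dispatches each one and assembles the results."""
    results = [workers[name](subtask) for name, subtask in plan(task)]
    return "\n".join(results)


# Toy agents standing in for real LLM-backed agents.
def research(text: str) -> str:
    return f"notes({text})"


def write(text: str) -> str:
    return f"draft({text})"


def edit(text: str) -> str:
    return f"final({text})"


pipeline_output = run_pipeline([research, write, edit], "topic")
supervisor_output = run_supervisor(
    plan=lambda task: [("research", task), ("write", task)],
    workers={"research": research, "write": write},
    task="topic",
)
print(pipeline_output)
print(supervisor_output)
```

The pipeline has no decision point, which is what makes it easy to debug. In the supervisor, every routing decision lives in `plan`, so that is the one place to inspect when work goes to the wrong agent.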

Splitting work across agents helps when subtasks are genuinely distinct (each agent gets a simpler, more focused context), when they need different tools or permissions, or when parts can run in parallel.

Agents fail in ways single LLM calls don’t, and each failure mode has to be engineered for explicitly.

An agent can loop — repeating a failing action, oscillating between two states, or just taking too long. Each iteration costs money. Mandatory guardrails:

  • Hard step limit — a max iteration count, always.
  • Budget cap — a per-task token/dollar ceiling that aborts the run.
  • Loop detection — spot repeated identical actions and break out.
  • Wall-clock timeout — bound total runtime.
limits = AgentLimits(max_steps=15, max_cost_usd=0.50, max_seconds=120)
# Hitting any limit ends the run with a graceful partial result — never silently.
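One way the four guardrails could be enforced around an agent loop is sketched below. `AgentLimits` is not defined in the text, so the dataclass here is an assumed shape; `max_repeats`, `RunResult`, `run_with_limits`, and the `step` callback (returning an action, its cost, and a done flag) are likewise hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AgentLimits:
    # Assumed shape; field names mirror the snippet above.
    max_steps: int = 15
    max_cost_usd: float = 0.50
    max_seconds: float = 120.0
    max_repeats: int = 3  # assumption: identical consecutive actions before breaking


@dataclass
class RunResult:
    stopped_by: str      # "done", "max_steps", "max_cost", "timeout", or "loop"
    actions: List[str]   # partial trajectory, returned even when a limit is hit
    cost_usd: float


# Hypothetical step callback: returns (action, cost_usd, done).
Step = Callable[[], Tuple[str, float, bool]]


def run_with_limits(step: Step, limits: AgentLimits) -> RunResult:
    start = time.monotonic()
    actions: List[str] = []
    cost = 0.0
    for _ in range(limits.max_steps):            # hard step limit
        if time.monotonic() - start > limits.max_seconds:
            return RunResult("timeout", actions, cost)
        action, step_cost, done = step()
        actions.append(action)
        cost += step_cost
        if done:
            return RunResult("done", actions, cost)
        if cost > limits.max_cost_usd:           # budget cap
            return RunResult("max_cost", actions, cost)
        tail = actions[-limits.max_repeats:]     # loop detection
        if len(tail) == limits.max_repeats and len(set(tail)) == 1:
            return RunResult("loop", actions, cost)
    return RunResult("max_steps", actions, cost)


# An agent stuck repeating the same failing action is cut off after 3 steps.
result = run_with_limits(lambda: ("search('x')", 0.01, False), AgentLimits())
print(result.stopped_by, len(result.actions))  # loop 3
```

Every exit path returns a `RunResult` naming the limit that fired, so the caller always gets the partial trajectory and a reason. The timeout is only checked between steps here; bounding a single slow step would need a timeout on the underlying call as well.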

A small error in step 2 becomes wrong input to step 3, and the agent confidently builds on it. Mitigate by validating intermediate results, having the agent verify before depending on a result, and using reflection checkpoints on critical steps.
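Validating an intermediate result before the next step depends on it might look like the sketch below. The `validated` helper, its `retries` parameter, and the price example are illustrative assumptions; a real check would be specific to the step (a schema, a range, a cross-check against a source).

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ValidationError(Exception):
    """Raised when a step never produces a result that passes its check."""


def validated(produce: Callable[[], T], check: Callable[[T], bool], retries: int = 1) -> T:
    """Run a step, and only return its result once check() accepts it."""
    for _ in range(retries + 1):
        value = produce()
        if check(value):
            return value
    raise ValidationError(f"step failed validation after {retries + 1} attempts")


# Toy step 2: extract a price. The first attempt returns a bad value.
attempts = iter(["-12.50", "12.50"])


def extract_price() -> float:
    return float(next(attempts))


# Step 3 only runs on a price that passed the sanity check.
price = validated(extract_price, check=lambda p: p > 0, retries=1)
total = price * 3
print(price, total)  # 12.5 37.5
```

Failing loudly with `ValidationError` is the point: the run stops at the step that produced the bad value instead of carrying it forward.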

Treat agency as a risk surface. Keep humans in the loop for high-stakes actions. Make every run fully traceable: log every thought, tool call, argument, observation, and cost. When an agent misbehaves, the trace is the only way to find which step went wrong. This is non-negotiable for agents.
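A trace can be as simple as one structured record per step. The sketch below assumes a JSON Lines format and hypothetical `TraceEvent` and `Trace` types; the fields follow the list above (thought, tool call, arguments, observation, cost), and the tool names in the example are made up.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class TraceEvent:
    step: int
    thought: str
    tool: str
    arguments: Dict[str, Any]
    observation: str
    cost_usd: float
    timestamp: float = field(default_factory=time.time)


class Trace:
    """Collects every step of one run so it can be replayed afterwards."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.events: List[TraceEvent] = []

    def record(self, thought: str, tool: str, arguments: Dict[str, Any],
               observation: str, cost_usd: float) -> None:
        self.events.append(TraceEvent(
            step=len(self.events) + 1, thought=thought, tool=tool,
            arguments=arguments, observation=observation, cost_usd=cost_usd,
        ))

    def total_cost(self) -> float:
        return sum(event.cost_usd for event in self.events)

    def to_jsonl(self) -> str:
        # One JSON object per line, each tagged with the run it belongs to.
        return "\n".join(
            json.dumps({"run_id": self.run_id, **asdict(event)})
            for event in self.events
        )


trace = Trace(run_id="run-001")
trace.record("Need the order status", "lookup_order", {"order_id": "A17"},
             "status=shipped", 0.002)
trace.record("Customer asked for a refund", "send_refund", {"order_id": "A17"},
             "blocked: needs approval", 0.003)
print(trace.to_jsonl())
```

With records like these, finding the step that went wrong means scanning `step`, `tool`, and `observation` in order rather than rerunning the agent.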

The tool security rules apply with extra force: agents chain many actions, and prompt injection is the signature threat — a malicious instruction hidden in a retrieved document or tool result can hijack the loop. Least-privilege tools, sandboxing, argument validation, and human approval gates are the defenses.
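Three of these defenses (least privilege, argument validation, approval gates) can sit in a single choke point that every tool call passes through. The sketch below assumes a hypothetical `ToolGate` class and toy tools that return strings rather than touching the file system or sending mail; sandboxing is out of scope here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set


class ToolDenied(Exception):
    """Raised when a tool call is not allowed to proceed."""


@dataclass
class ToolGate:
    tools: Dict[str, Callable[..., Any]]                      # least privilege: only these exist
    validators: Dict[str, Callable[[Dict[str, Any]], bool]]   # per-tool argument checks
    high_stakes: Set[str]                                     # tools needing human sign-off
    approve: Callable[[str, Dict[str, Any]], bool]            # hypothetical approval hook

    def call(self, name: str, args: Dict[str, Any]) -> Any:
        if name not in self.tools:
            raise ToolDenied(f"tool not allowed: {name}")
        check = self.validators.get(name)
        if check is not None and not check(args):
            raise ToolDenied(f"invalid arguments for {name}: {args}")
        if name in self.high_stakes and not self.approve(name, args):
            raise ToolDenied(f"approval refused for {name}")
        return self.tools[name](**args)


def path_is_safe(args: Dict[str, Any]) -> bool:
    path = str(args.get("path", ""))
    return path.startswith("docs/") and ".." not in path


gate = ToolGate(
    tools={
        "read_file": lambda path: f"contents of {path}",   # toy tool, no real I/O
        "send_email": lambda to, body: f"sent to {to}",    # toy tool, no real mail
    },
    validators={"read_file": path_is_safe},
    high_stakes={"send_email"},
    approve=lambda name, args: False,  # stand-in for a human who declines
)

print(gate.call("read_file", {"path": "docs/guide.md"}))
for name, args in [
    ("read_file", {"path": "docs/../secrets.txt"}),      # fails argument validation
    ("send_email", {"to": "a@example.com", "body": "hi"}),  # blocked by approval gate
    ("delete_db", {}),                                   # tool was never granted
]:
    try:
        gate.call(name, args)
    except ToolDenied as err:
        print("denied:", err)
```

An injected instruction can still change what the model asks for, but it cannot widen what the gate will execute.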

Agents are harder to evaluate than single calls — success is a trajectory, not one output. Measure both:

  • Outcome: did it achieve the goal? (task success rate)
  • Process: how did it get there? Step count, tool-choice accuracy, cost per task, latency, wrong turns. Two agents can both succeed while one costs 5× as much.

Build a suite of representative tasks with known good outcomes and run it on every change — same evaluation discipline as the rest of the guide, applied to trajectories.
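A small harness along these lines reports outcome and process metrics side by side. Everything here is an illustrative assumption: the `Trajectory` and `EvalTask` types, the exact-match success check, and the two toy agents with made-up step counts and costs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Trajectory:
    output: str      # final answer
    steps: int       # how many iterations the run took
    cost_usd: float  # total spend for the run


@dataclass
class EvalTask:
    prompt: str
    expected: str    # known good outcome


def evaluate(agent: Callable[[str], Trajectory], tasks: List[EvalTask]) -> Dict[str, float]:
    runs = [(task, agent(task.prompt)) for task in tasks]
    return {
        # Outcome: exact match is an assumption; real suites often need a grader.
        "success_rate": sum(run.output == task.expected for task, run in runs) / len(runs),
        # Process: how expensive the trajectory was.
        "mean_steps": mean(run.steps for _, run in runs),
        "mean_cost_usd": mean(run.cost_usd for _, run in runs),
    }


suite = [EvalTask("2+2", "4"), EvalTask("capital of France", "Paris")]
answers = {task.prompt: task.expected for task in suite}


# Two toy agents that both answer correctly but take different routes.
def lean_agent(prompt: str) -> Trajectory:
    return Trajectory(answers[prompt], steps=3, cost_usd=0.02)


def wasteful_agent(prompt: str) -> Trajectory:
    return Trajectory(answers[prompt], steps=12, cost_usd=0.10)


lean_report = evaluate(lean_agent, suite)
wasteful_report = evaluate(wasteful_agent, suite)
print(lean_report)
print(wasteful_report)
```

Both agents score the same on success rate, and only the process columns show that one costs about five times as much per task.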

Multi-agent systems split work across focused agents via supervisor, pipeline, or network patterns — but they multiply cost, latency, and failure points, so prefer one good agent until its context genuinely overloads. In production, agents demand hard limits (steps, budget, time), loop detection, validation of intermediate results, full tracing, least-privilege tools, and human approval for high-stakes actions. Evaluate both the outcome and the process.