Multi-Agent Systems

What are multi-agent systems?

Multi-agent systems (MAS) consist of multiple AI agents working together, each with specialized roles and capabilities. Rather than a single agent handling everything, tasks are distributed among agents that collaborate to achieve outcomes no single agent could accomplish alone.

Think of it like a team versus an individual:

  • Single agent: One person doing everything
  • Multi-agent: A team with specialists (researcher, writer, reviewer, coordinator)

User Request: "Create a market analysis report"

Coordinator Agent
├── Research Agent → Gather data and statistics
├── Analysis Agent → Identify trends and insights
├── Writer Agent → Draft the report
└── Editor Agent → Review and refine

Each agent contributes its specialty, and the combined result is often better than what a single generalist agent would produce.

Why use multiple agents?

Specialization

Different tasks require different capabilities:

  • Research needs thoroughness and source evaluation
  • Writing needs clarity and structure
  • Coding needs technical precision
  • Review needs critical thinking

An agent whose instructions and tools are tuned to one of these functions tends to perform it better than a generalist.

Complexity management

Complex tasks can overwhelm a single agent. Breaking the work into specialized subtasks:

  • Reduces context requirements per agent
  • Enables focused, high-quality work
  • Makes debugging easier
  • Allows parallel execution

Diverse perspectives

Multiple agents can approach problems differently:

  • Optimistic vs skeptical analysis
  • Creative vs practical solutions
  • Fast vs thorough evaluation

Synthesizing diverse viewpoints produces better outcomes.

Scalability

Add agents to handle increased workload:

  • More researchers for larger topics
  • More writers for more content
  • More reviewers for higher quality
  • Parallel processing of independent tasks

Multi-agent architectures

Hierarchical (coordinator + specialists)

A coordinator agent orchestrates specialized workers:

Coordinator
├── Agent A (Research)
├── Agent B (Analysis)
├── Agent C (Writing)
└── Agent D (Review)

The coordinator:

  • Receives high-level requests
  • Breaks them into subtasks
  • Assigns subtasks to appropriate specialists
  • Synthesizes results
  • Manages workflow
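
The coordinator's responsibilities listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a framework API: the `Coordinator` class, its fixed plan, and the stub specialists are hypothetical stand-ins for model-backed agents.

```python
from typing import Callable, Dict, List, Tuple

# A specialist is modeled as a plain function from task text to result text.
# In a real system each would wrap a model call; these are hypothetical stubs.
Specialist = Callable[[str], str]


class Coordinator:
    def __init__(self, specialists: Dict[str, Specialist]) -> None:
        self.specialists = specialists

    def decompose(self, request: str) -> List[Tuple[str, str]]:
        # Break the request into (role, subtask) pairs. A fixed plan is
        # assumed here; a real coordinator would generate it.
        return [
            ("research", f"Gather data for: {request}"),
            ("analysis", f"Identify trends for: {request}"),
            ("writing", f"Draft a report for: {request}"),
            ("review", f"Review the report for: {request}"),
        ]

    def run(self, request: str) -> str:
        results = []
        for role, subtask in self.decompose(request):
            worker = self.specialists[role]  # assign to the matching specialist
            results.append(f"[{role}] {worker(subtask)}")
        return "\n".join(results)  # synthesize: here, simple concatenation


def make_stub(name: str) -> Specialist:
    return lambda task: f"{name} finished '{task}'"


coordinator = Coordinator(
    {role: make_stub(role) for role in ("research", "analysis", "writing", "review")}
)
report = coordinator.run("market analysis")
```

Keeping decomposition and dispatch in separate methods makes it easier to swap the fixed plan for a generated one later.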

Peer-to-peer (collaborative)

Agents communicate directly without central coordination:

Agent A ←→ Agent B
   ↕          ↕
Agent C ←→ Agent D

Each agent:

  • Has defined responsibilities
  • Communicates with relevant peers
  • Makes autonomous decisions
  • Contributes to shared goals
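
As a rough sketch of the peer-to-peer layout, each agent can hold direct links to its neighbors and its own inbox, with no central router. The `Peer` class and its methods are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class Peer:
    """An agent that talks directly to the peers it is linked to."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: Dict[str, "Peer"] = {}
        self.inbox: Deque[Tuple[str, str]] = deque()  # (sender, text)
        self.log: List[str] = []

    def connect(self, other: "Peer") -> None:
        # Links are bidirectional, matching the arrows in the diagram.
        self.peers[other.name] = other
        other.peers[self.name] = self

    def send(self, to: str, text: str) -> None:
        if to not in self.peers:
            raise KeyError(f"{self.name} has no direct link to {to}")
        self.peers[to].inbox.append((self.name, text))

    def process_inbox(self) -> None:
        # Each agent decides on its own how to handle messages;
        # this stub only records them.
        while self.inbox:
            sender, text = self.inbox.popleft()
            self.log.append(f"{sender} -> {self.name}: {text}")


a, b, c, d = (Peer(name) for name in "ABCD")
a.connect(b)
a.connect(c)
b.connect(d)
c.connect(d)

a.send("B", "draft ready")
d.send("C", "need sources")
for peer in (a, b, c, d):
    peer.process_inbox()
```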

Pipeline (sequential)

Agents process work in stages:

Input → Agent A → Agent B → Agent C → Output
        (Research)  (Draft)   (Edit)

Each agent:

  • Receives input from previous stage
  • Performs its specialized processing
  • Passes output to next stage
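
A pipeline reduces to function composition: each stage's output becomes the next stage's input. The sketch below assumes three stub stages (`research`, `draft`, `edit`) that stand in for real agents.

```python
from functools import reduce
from typing import Callable, List

Stage = Callable[[str], str]


# Hypothetical stub stages; real ones would call a model.
def research(topic: str) -> str:
    return f"notes on {topic}"


def draft(notes: str) -> str:
    return f"draft based on {notes}"


def edit(text: str) -> str:
    return text.capitalize()


def run_pipeline(stages: List[Stage], initial: str) -> str:
    # Feed each stage's output into the next stage, in order.
    return reduce(lambda data, stage: stage(data), stages, initial)


output = run_pipeline([research, draft, edit], "solar pricing")
```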

Debate/adversarial

Agents argue different positions:

Proposal → Advocate Agent
              ↕
         Critic Agent
              ↕
         Judge Agent → Decision

Used for:

  • Decision analysis
  • Risk assessment
  • Quality improvement
  • Bias reduction
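
The debate loop can be sketched as alternating advocate and critic turns followed by a judge's ruling. Everything here is a hypothetical stub, including the fixed objection list and the toy decision rule; a real system would back each role with a model.

```python
from typing import List, Tuple

# Hypothetical fixed objections; a real critic agent would generate these.
OBJECTIONS = ["cost is unclear", "timeline is risky"]


def advocate(proposal: str) -> str:
    return f"PRO: '{proposal}' is worth doing"


def critic(round_no: int) -> str:
    return f"CON: {OBJECTIONS[round_no % len(OBJECTIONS)]}"


def judge(transcript: List[str], max_objections: int = 1) -> str:
    # Toy decision rule: too many objections sends the proposal back.
    objections = [line for line in transcript if line.startswith("CON")]
    return "revise" if len(objections) > max_objections else "accept"


def run_debate(proposal: str, rounds: int) -> Tuple[str, List[str]]:
    transcript: List[str] = []
    for round_no in range(rounds):
        transcript.append(advocate(proposal))
        transcript.append(critic(round_no))
    return judge(transcript), transcript


decision, transcript = run_debate("Launch in Q4", rounds=2)
```

Keeping the full transcript alongside the decision lets a human review why the judge ruled as it did.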

Implementing multi-agent systems

Define agent roles

Each agent needs a clear identity:

# Research Agent SOUL.md

You are a research specialist. Your role is to:
- Gather comprehensive information on assigned topics
- Evaluate source credibility
- Identify key facts and statistics
- Flag gaps or uncertainties

You are thorough, not fast. Quality over speed.
When uncertain, note the uncertainty rather than guessing.

# Critic Agent SOUL.md

You are a critical reviewer. Your role is to:
- Identify weaknesses, gaps, and errors
- Challenge assumptions
- Suggest improvements
- Ensure quality standards

You are skeptical but constructive.
Find problems, but also suggest solutions.

Establish communication patterns

Agents typically exchange information in one of two ways, direct messages or a shared workspace:

# Message-based communication: the coordinator sends a task directly to one agent
coordinator.send(research_agent, {
    "task": "Research renewable energy trends",
    "deadline": "30 minutes",
    "output_format": "structured_notes"
})

# Shared workspace: one agent writes an artifact, another reads it later
workspace.write("research/findings.md", research_results)
findings = analysis_agent.read("research/findings.md")
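
To make the two patterns concrete, here is a minimal in-memory sketch. The `Agent` and `Workspace` classes are assumptions made for illustration. Unlike the snippet above, reads go through the workspace object itself.

```python
from typing import Any, Dict, List


class Agent:
    def __init__(self, name: str) -> None:
        self.name = name
        self.inbox: List[Dict[str, Any]] = []

    def send(self, recipient: "Agent", payload: Dict[str, Any]) -> None:
        # Message-based: deliver a shallow copy tagged with the sender's name.
        recipient.inbox.append({"from": self.name, **payload})


class Workspace:
    """In-memory stand-in for a shared file store keyed by path."""

    def __init__(self) -> None:
        self._files: Dict[str, str] = {}

    def write(self, path: str, content: str) -> None:
        self._files[path] = content

    def read(self, path: str) -> str:
        return self._files[path]


coordinator = Agent("coordinator")
research_agent = Agent("research_agent")
coordinator.send(research_agent, {
    "task": "Research renewable energy trends",
    "output_format": "structured_notes",
})

workspace = Workspace()
workspace.write("research/findings.md", "- solar capacity grew")
findings = workspace.read("research/findings.md")
```

Messages suit targeted task handoffs. A shared workspace suits larger artifacts that several agents read.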

Coordinate workflow

Orchestrate the multi-agent process:

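import asyncio

# research_agent, analysis_agent, writer_agent, and editor_agent are assumed
# to be agent objects that expose async methods.
async def create_report(topic):
    # Parallel research: the three lookups are independent, so run them concurrently
    research_tasks = [
        research_agent.research(f"{topic} - market size"),
        research_agent.research(f"{topic} - competitors"),
        research_agent.research(f"{topic} - trends"),
    ]
    research_results = await asyncio.gather(*research_tasks)

    # Sequential processing: each step depends on the previous step's output
    analysis = await analysis_agent.analyze(research_results)
    draft = await writer_agent.write(analysis)
    final = await editor_agent.review(draft)

    return final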
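
To try this workflow without real models, the agents can be replaced by stubs. The version below is a self-contained sketch. `StubAgent` and its single `handle` method are hypothetical simplifications of the per-role methods used above.

```python
import asyncio
from typing import List


class StubAgent:
    """Hypothetical stand-in for a model-backed agent."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def handle(self, payload: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"{self.name}({payload})"


async def create_report(topic: str) -> str:
    researcher = StubAgent("research")
    analyst = StubAgent("analyze")
    writer = StubAgent("write")
    editor = StubAgent("edit")

    # Independent research subtasks run concurrently; gather keeps input order.
    subtopics = ["market size", "competitors", "trends"]
    research_results: List[str] = await asyncio.gather(
        *(researcher.handle(f"{topic} - {sub}") for sub in subtopics)
    )

    # Dependent steps run one after another.
    analysis = await analyst.handle("; ".join(research_results))
    draft = await writer.handle(analysis)
    return await editor.handle(draft)


final = asyncio.run(create_report("EVs"))
```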

Handle agent communication

# Inter-agent message format
message:
  from: coordinator
  to: research_agent
  type: task_assignment
  content:
    task: "Research competitor pricing"
    context: "For Q4 market analysis"
    constraints:
      max_time: 10m
      sources: ["official sites", "press releases"]
    output_format: structured_data
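
One way to enforce a format like this in code is a small validated message type. The sketch below is an assumption, not part of any framework: the `Message` class, the set of allowed types, and the rule that a task assignment must carry a `task` field are all illustrative choices.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Assumed set of message types; only task_assignment appears in the example above.
ALLOWED_TYPES = {"task_assignment", "result", "error"}


@dataclass(frozen=True)
class Message:
    sender: str     # "from" in the format above; `from` is a Python keyword
    recipient: str  # "to" in the format above
    type: str
    content: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown message type: {self.type}")
        if self.type == "task_assignment" and "task" not in self.content:
            raise ValueError("task_assignment requires content['task']")

    def to_wire(self) -> Dict[str, Any]:
        # Convert back to the field names used in the message format.
        return {
            "from": self.sender,
            "to": self.recipient,
            "type": self.type,
            "content": dict(self.content),
        }


msg = Message(
    sender="coordinator",
    recipient="research_agent",
    type="task_assignment",
    content={"task": "Research competitor pricing", "output_format": "structured_data"},
)
wire = msg.to_wire()
```

Rejecting malformed messages at construction time keeps bad input from reaching the receiving agent.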

Common multi-agent patterns

The research team

Coordinator
├── Web Researcher (current info)
├── Academic Researcher (deep knowledge)
├── Data Analyst (statistics)
└── Synthesizer (combine findings)

The content creation team

Coordinator
├── Researcher (gather material)
├── Outliner (structure content)
├── Writer (create draft)
├── Editor (improve quality)
└── Fact-checker (verify claims)

The code development team

Coordinator
├── Architect (design decisions)
├── Implementer (write code)
├── Tester (create tests)
├── Reviewer (code review)
└── Documenter (write docs)

The decision support team

Coordinator
├── Analyst (gather options)
├── Advocate (argue for option A)
├── Advocate (argue for option B)
├── Risk Assessor (evaluate downsides)
└── Synthesizer (recommend decision)

Challenges in multi-agent systems

Coordination overhead

More agents = more communication = more complexity.

Mitigation:

  • Clear protocols and interfaces
  • Minimal necessary communication
  • Well-defined handoff points
  • Strong coordinator agent

Inconsistency

Different agents may contradict each other.

Mitigation:

  • Shared context and facts
  • Explicit conflict resolution
  • Final synthesis step
  • Version control for shared artifacts
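
Explicit conflict resolution can start with something as simple as comparing the facts each agent reports and applying a stated tie-break rule. The functions and the priority-list rule below are hypothetical; other rules (voting, recency, asking a human) fit the same shape.

```python
from typing import Dict, List


def find_conflicts(reports: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Return {fact: {agent: value}} for every fact the agents disagree on."""
    by_fact: Dict[str, Dict[str, str]] = {}
    for agent, facts in reports.items():
        for fact, value in facts.items():
            by_fact.setdefault(fact, {})[agent] = value
    return {
        fact: values
        for fact, values in by_fact.items()
        if len(set(values.values())) > 1
    }


def resolve(conflict: Dict[str, str], priority: List[str]) -> str:
    # Explicit rule: the highest-priority agent that reported the fact wins.
    for agent in priority:
        if agent in conflict:
            return conflict[agent]
    raise LookupError("no prioritized agent reported this fact")


reports = {
    "web_researcher": {"market_size": "$4B", "growth": "12%"},
    "data_analyst": {"market_size": "$5B", "growth": "12%"},
}
conflicts = find_conflicts(reports)
chosen = resolve(conflicts["market_size"], priority=["data_analyst", "web_researcher"])
```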

Error propagation

One agent's mistake affects downstream agents.

Mitigation:

  • Validation between stages
  • Quality checkpoints
  • Ability to backtrack
  • Error isolation
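
Validation between stages can be sketched as a wrapper that checks each stage's output and retries before anything is passed downstream. `run_stage`, `StageError`, and the stub stage are hypothetical names, and a retry is only one possible form of backtracking.

```python
from typing import Callable


class StageError(Exception):
    """Raised when a stage keeps producing output that fails validation."""


def run_stage(
    stage: Callable[[str], str],
    validate: Callable[[str], bool],
    data: str,
    max_attempts: int = 2,
) -> str:
    for _ in range(max_attempts):
        output = stage(data)
        if validate(output):  # quality checkpoint between stages
            return output
    # Isolate the failure here instead of passing bad output downstream.
    raise StageError(f"{stage.__name__} failed validation {max_attempts} times")


attempts = {"count": 0}


def flaky_research(topic: str) -> str:
    # Hypothetical stage that returns an empty result on its first call.
    attempts["count"] += 1
    return "" if attempts["count"] == 1 else f"3 sources on {topic}"


def has_content(text: str) -> bool:
    return bool(text.strip())


notes = run_stage(flaky_research, has_content, "battery recycling")
```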

Resource costs

Multiple agents = multiple API calls = higher costs.

Mitigation:

  • Right-size models for each role
  • Cache common operations
  • Batch related requests
  • Avoid unnecessary agents
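
Two of these mitigations, right-sizing and caching, can be combined in a few lines. The model names and the `cached_call` function are placeholders, not real model identifiers or a real client API. `functools.lru_cache` is from the Python standard library.

```python
from functools import lru_cache

# Placeholder model names: assign the larger model only where it is needed.
MODEL_FOR_ROLE = {
    "coordinator": "large-model",
    "researcher": "small-model",
    "editor": "small-model",
}

stats = {"model_calls": 0}


@lru_cache(maxsize=256)
def cached_call(role: str, prompt: str) -> str:
    # Stand-in for a paid API call; identical (role, prompt) pairs hit the cache.
    stats["model_calls"] += 1
    model = MODEL_FOR_ROLE.get(role, "small-model")
    return f"{model} answered: {prompt}"


first = cached_call("researcher", "EV market size")
second = cached_call("researcher", "EV market size")  # served from the cache
```

Caching only helps when prompts repeat exactly and a reused answer is acceptable, so it fits lookups better than creative drafting.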

Debugging difficulty

Issues are hard to trace when they pass through multiple agents.

Mitigation:

  • Comprehensive logging
  • Clear message trails
  • Step-by-step tracing
  • Ability to replay scenarios
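
A message trail with a per-request trace ID covers most of these points. The sketch below is illustrative: `MessageTrail`, `replay`, and the record fields are hypothetical, and a production system would write to persistent storage rather than a list.

```python
import json
from typing import Any, Callable, Dict, List


class MessageTrail:
    """Append-only record of every inter-agent message."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def record(self, trace_id: str, sender: str, recipient: str, payload: str) -> None:
        self.records.append({
            "step": len(self.records) + 1,
            "trace_id": trace_id,
            "from": sender,
            "to": recipient,
            "payload": payload,
        })

    def for_trace(self, trace_id: str) -> List[Dict[str, Any]]:
        # Step-by-step view of a single request.
        return [r for r in self.records if r["trace_id"] == trace_id]

    def dump(self) -> str:
        # One JSON object per line, convenient for log files.
        return "\n".join(json.dumps(r) for r in self.records)


def replay(
    records: List[Dict[str, Any]],
    handlers: Dict[str, Callable[[str], Any]],
) -> List[Any]:
    # Re-deliver recorded messages, in order, to each recipient's handler.
    return [handlers[r["to"]](r["payload"]) for r in records]


trail = MessageTrail()
trail.record("req-1", "coordinator", "researcher", "find pricing data")
trail.record("req-2", "coordinator", "writer", "draft intro")
trail.record("req-1", "researcher", "analyst", "pricing table")

# Hypothetical handlers used only to demonstrate replay.
handlers = {"researcher": str.upper, "analyst": len}
replayed = replay(trail.for_trace("req-1"), handlers)
```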

Multi-agent frameworks

CrewAI is a role-based agent collaboration framework:

  • Define agents with roles and goals
  • Specify tasks and dependencies
  • Automatic coordination

AutoGen is Microsoft's framework for multi-agent conversation:

  • Flexible agent communication
  • Human-in-the-loop support
  • Code execution capabilities

LangGraph provides graph-based agent orchestration:

  • Define agents as nodes
  • Specify communication as edges
  • Complex workflow support

Custom implementations are also common. Many production systems build their own orchestration layer:

  • Tailored to specific needs
  • Full control over communication
  • Optimized for particular workloads

Best practices

Start with a single agent, add complexity as needed

Don't assume you need multiple agents. Start simple, identify bottlenecks, then specialize.

Clear responsibility boundaries

Each agent should have unambiguous scope. Overlap creates conflict and wasted work.

Minimize communication

Every message has a cost (latency, tokens, complexity). Communicate only what the recipient needs, not everything.

Test agents independently and together

Unit test each agent, then integration test the system. Different failure modes emerge at each level.
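
The two levels can be sketched with the standard `unittest` module. The summarizer agent, the fake model, and the reviewer below are hypothetical. The point is that injecting the model as a parameter lets each agent be tested without network calls.

```python
import unittest
from typing import Callable


def make_summarizer(llm: Callable[[str], str]) -> Callable[[str], str]:
    # The agent takes its model as a parameter so tests can inject a fake.
    def summarize(text: str) -> str:
        if not text.strip():
            raise ValueError("nothing to summarize")
        return llm(f"Summarize: {text}").strip()

    return summarize


def fake_llm(prompt: str) -> str:
    # Deterministic stand-in for a model call.
    return "  short summary  "


def review(text: str) -> str:
    return f"APPROVED: {text}"


class SummarizerUnitTest(unittest.TestCase):
    def test_strips_model_output(self) -> None:
        self.assertEqual(make_summarizer(fake_llm)("long text"), "short summary")

    def test_rejects_empty_input(self) -> None:
        with self.assertRaises(ValueError):
            make_summarizer(fake_llm)("   ")


class PipelineIntegrationTest(unittest.TestCase):
    def test_summary_flows_into_reviewer(self) -> None:
        summarize = make_summarizer(fake_llm)
        self.assertEqual(review(summarize("long text")), "APPROVED: short summary")
```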

Human oversight at key points

Complex systems need human checkpoints. Don't let agents run fully autonomously on high-stakes tasks.
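
A human checkpoint can be as simple as a gate in front of high-stakes actions. In this sketch the `approve` callback stands in for whatever review channel is used (a prompt, a ticket, a chat message). All names are hypothetical.

```python
from typing import Callable


def execute_with_oversight(
    action: Callable[[], str],
    description: str,
    high_stakes: bool,
    approve: Callable[[str], bool],
) -> str:
    # Low-stakes actions run directly; high-stakes ones wait for a human decision.
    if high_stakes and not approve(description):
        return f"BLOCKED: {description}"
    return action()


def send_report() -> str:
    return "report sent to client"


blocked = execute_with_oversight(
    send_report, "Send report to client", high_stakes=True, approve=lambda d: False
)
approved = execute_with_oversight(
    send_report, "Send report to client", high_stakes=True, approve=lambda d: True
)
```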

Monitor and log everything

When something goes wrong (it will), you need visibility into what happened.