Agent-to-Agent Communication

What is agent-to-agent communication?

Agent-to-agent communication is how AI agents exchange information, coordinate tasks, and collaborate without human intermediation. Just as humans communicate to work together, agents need protocols and patterns for effective collaboration.

When one agent needs another's help, or when work must be handed off, delegated, or coordinated, the agents exchange messages such as:

Coordinator Agent: "Research Agent, investigate market trends for Q4"
Research Agent: "Found 5 key trends. Passing to Analysis Agent."
Analysis Agent: "Analysis complete. Key insight: 23% growth expected."
Coordinator Agent: "Writer Agent, draft report based on this analysis."

This enables complex workflows where multiple specialized agents contribute to outcomes no single agent could achieve alone.

Why agents need to communicate

Division of labor

Complex tasks benefit from specialization:

  • Research agent gathers information
  • Analysis agent interprets data
  • Writing agent creates content
  • Review agent ensures quality

Communication enables handoffs between specialists.

Parallel processing

Independent subtasks can run simultaneously:

Coordinator → Research Agent A (market size)
           → Research Agent B (competitors)
           → Research Agent C (trends)
           
All three work in parallel and report back when done.

Communication coordinates parallel work and aggregates results.
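
The fan-out and fan-in above can be sketched with Python's asyncio. This is a minimal illustration, not a reference implementation: `research`, `coordinate`, and `run_parallel_research` are hypothetical names, and the short sleep stands in for a real agent call.

```python
import asyncio

async def research(topic: str) -> dict:
    # Stand-in for a real research agent call; the sleep simulates work.
    await asyncio.sleep(0.01)
    return {"topic": topic, "findings": f"summary of {topic}"}

async def coordinate(topics: list) -> dict:
    # Fan out: all research calls run concurrently.
    results = await asyncio.gather(*(research(t) for t in topics))
    # Fan in: aggregate the results keyed by topic.
    return {r["topic"]: r["findings"] for r in results}

def run_parallel_research() -> dict:
    return asyncio.run(coordinate(["market size", "competitors", "trends"]))
```

`asyncio.gather` returns results in the order the calls were submitted, so the coordinator does not need to sort them before aggregating.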

Error handling and recovery

When an agent fails, others need to know:

Agent A: "Task failed due to API timeout"
Coordinator: "Retry with extended timeout"
          or "Reassign to Agent B"
          or "Proceed without this data"

Communication enables graceful degradation and recovery.
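
One way a coordinator might encode those three options is shown below. It is a sketch under assumptions: agents are modeled as plain callables that take a timeout and raise `TimeoutError`, and the order (retry, then reassign, then proceed without data) is one possible policy rather than a prescribed one.

```python
from typing import Callable, Optional

def run_with_recovery(
    primary: Callable[[float], str],
    backup: Optional[Callable[[float], str]] = None,
    timeout: float = 5.0,
) -> Optional[str]:
    """Try the primary agent, retry with a longer timeout, then reassign."""
    for attempt_timeout in (timeout, timeout * 2):  # retry with extended timeout
        try:
            return primary(attempt_timeout)
        except TimeoutError:
            continue
    if backup is not None:  # reassign to another agent
        try:
            return backup(timeout)
        except TimeoutError:
            pass
    return None  # proceed without this data
```

Returning `None` makes the missing data explicit, so the caller has to decide how to degrade instead of silently using a stale value.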

Context sharing

Agents may need information from each other:

Writer Agent: "What format does the client prefer?"
Coordinator: "Formal tone, bullet points, executive summary first"

Communication transfers context that enables better work.

Communication patterns

Request-response

One agent asks, another answers:

Agent A → Request → Agent B
Agent A ← Response ← Agent B

Synchronous and simple. Agent A waits for Agent B's response.

Publish-subscribe

Agents subscribe to topics and receive relevant messages:

Research Agent publishes: "new_finding" → Topic: research-updates
Analysis Agent subscribed to: research-updates → Receives finding
Writer Agent subscribed to: research-updates → Receives finding

Decoupled communication. Publishers don't need to know who their subscribers are.

Message queue

Some agents place work in queues; others take and process it:

Coordinator → [Task Queue] → Worker Agents
                            Agent 1 takes task
                            Agent 2 takes task
                            Agent 3 takes task

Enables load balancing and asynchronous processing.

Shared workspace

Agents read/write to common artifacts:

Research Agent writes: workspace/research-notes.md
Analysis Agent reads: workspace/research-notes.md
Analysis Agent writes: workspace/analysis.md
Writer Agent reads: workspace/*.md

Loose coupling through shared state.
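
A shared workspace can be as simple as a directory. The sketch below assumes a local filesystem and uses hypothetical helper names (`write_artifact`, `read_all_notes`); it writes through a temporary file so that a reader never sees a half-written artifact.

```python
import os
import tempfile
from pathlib import Path

def write_artifact(workspace: Path, name: str, text: str) -> Path:
    """Write atomically so readers never observe a partial file."""
    workspace.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=workspace, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    target = workspace / name
    os.replace(tmp_name, target)  # atomic rename within one filesystem
    return target

def read_all_notes(workspace: Path) -> dict:
    """What a Writer Agent might do: read every Markdown artifact."""
    return {p.name: p.read_text(encoding="utf-8")
            for p in sorted(workspace.glob("*.md"))}
```

This handles partial writes only. If two agents may write the same file, some form of locking or per-agent file naming is still needed.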

Direct messaging

Point-to-point communication between specific agents:

Coordinator → Research Agent: "Start task X"
Research Agent → Coordinator: "Task X complete"

Clear addressing, but the sender must know the recipient.

Message formats

Structured messages

{
  "from": "coordinator",
  "to": "research-agent",
  "type": "task_assignment",
  "timestamp": "2024-01-15T10:30:00Z",
  "content": {
    "task_id": "task-123",
    "description": "Research renewable energy market size",
    "deadline": "2024-01-15T11:00:00Z",
    "priority": "high"
  },
  "reply_to": "coordinator-inbox"
}

Machine-readable, explicit, versionable.
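
To show what "explicit" buys in practice, here is a minimal sketch that mirrors the JSON above as a Python dataclass. The class name `Message` and the attribute names `sender` and `recipient` are assumptions (chosen because `from` is a reserved word in Python); the wire field names match the example.

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    type: str
    timestamp: str
    content: dict
    reply_to: str

    def to_json(self) -> str:
        d = asdict(self)
        # Map Python-friendly names back to the wire field names.
        d["from"] = d.pop("sender")
        d["to"] = d.pop("recipient")
        return json.dumps(d)

    @classmethod
    def from_json(cls, raw: str) -> "Message":
        d = json.loads(raw)
        required = {"from", "to", "type", "timestamp", "content", "reply_to"}
        missing = required - d.keys()
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        return cls(d["from"], d["to"], d["type"],
                   d["timestamp"], d["content"], d["reply_to"])
```

Rejecting incomplete messages at the boundary means a malformed request fails at parse time, not halfway through a task.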

Natural language messages

From: Coordinator
To: Research Agent

Please research the current market size for renewable energy, 
focusing on solar and wind. I need this within 30 minutes 
for the client presentation. Include sources.

Human-readable, flexible, but may be ambiguous.

Hybrid approach

{
  "metadata": {
    "from": "coordinator",
    "to": "research-agent",
    "task_id": "task-123"
  },
  "content": "Please research the current market size for renewable energy, focusing on solar and wind. Include sources.",
  "constraints": {
    "deadline": "30m",
    "required_fields": ["market_size", "sources"]
  }
}

Structured metadata with natural language content.
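
The structured half of a hybrid message can be checked mechanically even though the content is free text. The sketch below assumes that "30m" means thirty minutes and that durations use the suffixes s, m, and h; both helper names are hypothetical.

```python
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}

def parse_deadline(text: str) -> timedelta:
    """Parse a short duration such as "30m" or "2h"."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unrecognized deadline: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})

def missing_fields(request: dict, reply: dict) -> list:
    """Return the required fields that the reply did not supply."""
    required = request.get("constraints", {}).get("required_fields", [])
    return [name for name in required if name not in reply]
```

A coordinator could call `missing_fields` on each reply and ask the agent to fill the gaps before passing the result downstream.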

Implementing agent communication

Message bus approach

Central message routing:

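from collections import defaultdict

class MessageBus:
    """Central router: topic-based publish plus direct send."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of agent ids
        self.inboxes = defaultdict(list)      # agent id -> delivered messages

    def deliver(self, agent_id, message):
        # Stand-in for real delivery: append to the agent's in-memory inbox
        self.inboxes[agent_id].append(message)

    def subscribe(self, agent_id, topic):
        # Avoid duplicate subscriptions, which would deliver a message twice
        if agent_id not in self.subscribers[topic]:
            self.subscribers[topic].append(agent_id)

    def publish(self, topic, message):
        for agent_id in self.subscribers.get(topic, []):
            self.deliver(agent_id, message)

    def send(self, from_agent, to_agent, message):
        # Set "from" last so the payload cannot overwrite the real sender
        self.deliver(to_agent, {**message, "from": from_agent})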

Queue-based approach

Asynchronous task processing:

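import queue

task_queue = queue.Queue()
callback_queue = queue.Queue()

# Coordinator enqueues work
task_queue.put({
    "id": "task-123",  # echoed back as task_id so results can be matched
    "task": "research",
    "topic": "market analysis",
    "callback": "coordinator-session"
})

def process_task(task):
    # Placeholder: a real worker would invoke the agent here
    return f"{task['task']} on {task['topic']}: done"

# Worker agent processes tasks; get() blocks until one is available
while True:
    task = task_queue.get()
    result = process_task(task)
    callback_queue.put({
        "task_id": task["id"],
        "result": result
    })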

Session-based approach

Agents communicate via session messages:

# Illustrative sketch of session-style calls, as in systems like
# Clawdbot/OpenClaw; exact method names vary by system
coordinator_session.spawn_subagent(
    task="Research renewable energy trends",
    callback=handle_research_result  # invoked with the subagent's result
)

# Or enqueue a message to a different session
message_bus.enqueue(
    session="research-agent-session",
    message="Start research on topic X"
)

Coordination protocols

Handoff protocol

Transferring responsibility between agents:

1. Agent A: "Handing off task-123 to Agent B"
2. Agent A: Sends task context and state
3. Agent B: "Accepting task-123"
4. Agent B: Acknowledges receipt
5. Agent A: "Handoff complete, releasing task-123"
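
The key property of this protocol is that a task always has exactly one owner, even mid-handoff. A minimal sketch, with hypothetical names (`Task`, `offer`, `accept`) and in-memory state standing in for real messaging:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Task:
    task_id: str
    owner: str
    context: dict = field(default_factory=dict)
    pending_owner: Optional[str] = None  # set while a handoff is in flight

def offer(task: Task, from_agent: str, to_agent: str, context: dict) -> None:
    """Steps 1-2: the current owner announces the handoff and sends context."""
    if task.owner != from_agent:
        raise PermissionError(f"{from_agent} does not own {task.task_id}")
    task.context.update(context)
    task.pending_owner = to_agent

def accept(task: Task, agent: str) -> None:
    """Steps 3-5: the recipient accepts; only then does ownership move."""
    if task.pending_owner != agent:
        raise PermissionError(f"no handoff of {task.task_id} pending for {agent}")
    task.owner, task.pending_owner = agent, None
```

Because ownership changes only in `accept`, a handoff that is never acknowledged leaves the task with its original owner instead of orphaning it.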

Status reporting protocol

Keeping coordinators informed:

Every N minutes or on state change:
  Agent → Coordinator: {
    "status": "working|blocked|complete|failed",
    "progress": 0.6,
    "current_step": "Analyzing data",
    "estimated_completion": "5 minutes",
    "issues": []
  }
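
The "every N minutes or on state change" rule can be sketched as follows. `StatusReport` mirrors the fields above; `StatusReporter` and its `should_send` method are hypothetical, and time is passed in explicitly (in seconds) so the logic is easy to test.

```python
from dataclasses import dataclass, field

VALID_STATUSES = {"working", "blocked", "complete", "failed"}

@dataclass
class StatusReport:
    status: str
    progress: float
    current_step: str
    estimated_completion: str
    issues: list = field(default_factory=list)

    def __post_init__(self):
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        if not 0.0 <= self.progress <= 1.0:
            raise ValueError("progress must be between 0 and 1")

class StatusReporter:
    """Decides whether a report is due: on state change or after an interval."""

    def __init__(self, interval_seconds: float):
        self.interval = interval_seconds
        self._last_sent_at = None
        self._last_status = None

    def should_send(self, report: StatusReport, now: float) -> bool:
        changed = report.status != self._last_status
        due = (self._last_sent_at is None
               or now - self._last_sent_at >= self.interval)
        if changed or due:
            self._last_sent_at, self._last_status = now, report.status
            return True
        return False
```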

Escalation protocol

When agents need help:

1. Agent attempts task
2. If stuck: Agent → Coordinator: "Need help with X"
3. Coordinator evaluates:
   - Provide guidance
   - Assign different agent
   - Escalate to human
4. Resolution delivered to agent
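
Step 3 is a decision function. The sketch below assumes one particular preference order (guidance first, then reassignment, then a human) and a simple request shape with `agent` and `problem` keys; real coordinators may weigh these options differently.

```python
from enum import Enum

class Resolution(Enum):
    GUIDANCE = "provide guidance"
    REASSIGN = "assign different agent"
    HUMAN = "escalate to human"

def evaluate(request: dict, known_fixes: dict, idle_agents: list) -> tuple:
    """Pick a resolution for a help request {"agent": ..., "problem": ...}."""
    problem = request["problem"]
    if problem in known_fixes:
        return Resolution.GUIDANCE, known_fixes[problem]
    # Never reassign the task back to the agent that asked for help.
    others = [a for a in idle_agents if a != request["agent"]]
    if others:
        return Resolution.REASSIGN, others[0]
    return Resolution.HUMAN, f"Unresolved: {problem}"
```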

Challenges and solutions

Message ordering

Messages may arrive out of order.

Solution: Timestamps, sequence numbers, idempotent handlers.
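
As an illustration of sequence numbers combined with idempotent handling, the hypothetical `OrderedInbox` below buffers early arrivals, releases messages strictly in order, and ignores duplicates. It assumes the sender numbers messages from 0 with no gaps.

```python
class OrderedInbox:
    """Releases messages in sequence order and drops duplicates."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}  # seq -> message held until its turn comes

    def receive(self, seq: int, message: str) -> list:
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate: already delivered or already buffered
        self.pending[seq] = message
        ready = []
        # Drain every consecutive message starting at the expected number.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

For example, if message 1 arrives before message 0, `receive(1, ...)` returns an empty list and the later `receive(0, ...)` returns both messages in order.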

Deadlocks

Agent A waits for Agent B which waits for Agent A.

Solution: Timeouts, deadlock detection, async patterns.
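
Deadlock detection can be sketched as a cycle search over a "waits for" map. The function below is a simplified illustration that assumes each agent waits on at most one other agent; `find_cycle` and the map shape are assumptions, not part of any particular framework.

```python
def find_cycle(waits_for: dict) -> list:
    """Return one cycle of agents waiting on each other, or [] if none.

    waits_for maps an agent to the single agent it is blocked on.
    """
    for start in waits_for:
        path, seen = [], set()
        node = start
        # Follow the chain of waits until it ends or revisits an agent.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        if node in seen:
            return path[path.index(node):]
    return []
```

A coordinator that finds a cycle can break it by cancelling or timing out one of the waits in the returned list.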

Message loss

Messages may fail to deliver.

Solution: Acknowledgments, retries, persistent queues.
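
Acknowledgments and retries can be combined as below. This is a sketch under assumptions: `send` is any callable that returns True once the receiver has acknowledged the message, and the exponential backoff schedule is one common choice among several.

```python
import time
from typing import Callable

def send_with_retry(
    send: Callable[[dict], bool],
    message: dict,
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send until acknowledged. Returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if send(message):  # True means the receiver acknowledged
            return attempt
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise ConnectionError(f"no acknowledgment after {max_attempts} attempts")
```

Retries can deliver the same message twice when only the acknowledgment was lost, so receivers should deduplicate by message ID.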

Context drift

Agents develop inconsistent understanding.

Solution: Shared state, periodic synchronization, explicit context transfer.

Overload

Too many messages overwhelm agents.

Solution: Rate limiting, prioritization, backpressure.
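
Prioritization and backpressure can both be had from a bounded priority queue. In this sketch, `BoundedInbox` is a hypothetical wrapper around the standard library's `queue.PriorityQueue`; lower numbers mean higher priority, and a full inbox rejects new messages instead of blocking the sender.

```python
import queue

class BoundedInbox:
    def __init__(self, capacity: int):
        self._q = queue.PriorityQueue(maxsize=capacity)
        self._counter = 0  # tie-breaker keeps FIFO order within a priority

    def offer(self, message: str, priority: int = 1) -> bool:
        """Return False instead of blocking when the inbox is full."""
        try:
            self._q.put_nowait((priority, self._counter, message))
        except queue.Full:
            return False  # backpressure signal: the sender should slow down
        self._counter += 1
        return True

    def take(self) -> str:
        # Raises queue.Empty if there is nothing to process.
        return self._q.get_nowait()[2]
```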

Best practices

Design clear contracts

Define what messages each agent sends and expects:

research_agent:
  accepts:
    - task_assignment
    - context_update
  produces:
    - task_complete
    - task_failed
    - status_update
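
A contract like this is most useful when it is enforced. The sketch below encodes the contract above as a Python dictionary and checks a message against it before delivery; `check_delivery` is a hypothetical helper, and agents without a declared contract (such as the coordinator here) are left unchecked.

```python
CONTRACTS = {
    "research_agent": {
        "accepts": {"task_assignment", "context_update"},
        "produces": {"task_complete", "task_failed", "status_update"},
    }
}

def check_delivery(sender: str, recipient: str, message_type: str,
                   contracts: dict = CONTRACTS) -> list:
    """Return contract violations; an empty list means the message is allowed."""
    problems = []
    if sender in contracts and message_type not in contracts[sender]["produces"]:
        problems.append(f"{sender} does not produce {message_type}")
    if recipient in contracts and message_type not in contracts[recipient]["accepts"]:
        problems.append(f"{recipient} does not accept {message_type}")
    return problems
```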

Use async when possible

Don't block agents waiting for responses. Let them work on other things while waiting.

Log all communication

Every message should be traceable for debugging:

[10:30:00] coordinator → research-agent: task_assignment (task-123)
[10:30:01] research-agent → coordinator: task_accepted (task-123)
[10:35:00] research-agent → coordinator: task_complete (task-123)

Handle failures explicitly

Every communication path needs error handling:

  • Timeout handling
  • Retry logic
  • Fallback behavior
  • Failure notification

Version your message formats

As systems evolve, message formats change. Include version info:

{
  "version": "1.2",
  "type": "task_assignment",
  ...
}
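
A receiver can use the version field to decide whether it understands a message. The sketch below assumes a "major.minor" scheme in which minor versions are backward compatible; that convention, along with the names `SUPPORTED_MAJOR` and `can_handle`, is an assumption made for illustration.

```python
SUPPORTED_MAJOR = 1

def parse_version(text: str) -> tuple:
    major, minor = text.split(".")
    return int(major), int(minor)

def can_handle(message: dict) -> bool:
    """Accept any minor version of the supported major version."""
    try:
        major, _minor = parse_version(message["version"])
    except (KeyError, ValueError, AttributeError):
        return False  # missing or malformed version
    return major == SUPPORTED_MAJOR
```

Messages that fail this check are better answered with an explicit error than silently dropped, so that the sender learns it needs to downgrade or upgrade.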