Every AI agent is built on an architecture pattern — even if the builder doesn't realize it. The pattern determines how the agent reasons, when it uses tools, how it handles errors, and ultimately whether it works reliably or falls apart under real traffic.
There's no single "best" architecture. A customer support agent needs a different design than a research agent or a coding assistant. The right choice depends on your task complexity, latency requirements, cost budget, and reliability needs.
This guide covers six architecture patterns used by production AI agents in 2026, with the trade-offs and a reference implementation for each.
Pattern 1: ReAct (Reasoning + Acting)
The most common agent pattern. The LLM alternates between thinking (reasoning about what to do) and acting (calling tools). Each observation from a tool informs the next thought.
Loop:
1. THOUGHT: "I need to find the user's order status"
2. ACTION: lookup_order(order_id="12345")
3. OBSERVATION: {"status": "shipped", "tracking": "FX789"}
4. THOUGHT: "I have the tracking info, I can respond now"
5. RESPONSE: "Your order shipped! Tracking: FX789"
Implementation
```python
class ReActAgent:
    def __init__(self, llm, tools, max_steps=10):
        self.llm = llm
        self.tools = {t.name: t for t in tools}
        self.max_steps = max_steps
        self.system_prompt = "You are a helpful assistant. Use the available tools to answer."

    def run(self, user_input: str) -> str:
        messages = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": user_input},
        ]
        for step in range(self.max_steps):
            response = self.llm.generate(messages, tools=list(self.tools.values()))
            if response.tool_calls:
                # Record the assistant turn first — most chat APIs require the
                # tool-calling message to precede the tool results
                messages.append(response.message)
                for call in response.tool_calls:
                    result = self.tools[call.name].execute(**call.args)
                    messages.append({"role": "tool", "content": str(result),
                                     "tool_call_id": call.id})
            else:
                return response.content  # Final answer
        # Step budget exhausted — fail gracefully
        return "I wasn't able to complete this request. Let me connect you with support."
```
When to Use ReAct
- Tasks with 1-5 tool calls
- When the next step depends on the previous result
- Customer support, Q&A, simple data lookups
Trade-offs
| Pros | Cons |
|---|---|
| Simple to implement | Sequential — can't parallelize tool calls |
| Good reasoning transparency | Cost grows linearly with steps (full context each time) |
| Works with any LLM that supports tools | Can loop on difficult tasks |
| Easy to debug (read the thought chain) | No upfront planning — greedy decisions |
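One cheap mitigation for the looping failure mode is to track recent tool-call signatures and bail out when the agent repeats itself. A minimal sketch — the helper name and window size are illustrative, not part of the implementation above:

```python
def is_repeated_call(history, name, args, window=3):
    """True if this exact tool call already appears in the last `window` calls."""
    signature = (name, tuple(sorted(args.items())))
    return signature in history[-window:]

# In the agent loop, record each signature before executing the tool:
history = []
signature = ("lookup_order", tuple(sorted({"order_id": "12345"}.items())))
history.append(signature)
```

Inside the run loop, you would check this before executing each tool and return the fallback message (or force a different approach in the prompt) when it fires, instead of burning the remaining step budget on a loop.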
Pattern 2: Plan-and-Execute
Instead of deciding one step at a time, the agent first creates a complete plan, then executes each step. If a step fails, it replans.
```
1. PLAN:
   Step 1: Look up customer's recent orders
   Step 2: Check refund eligibility for each
   Step 3: Process refund for the eligible one
   Step 4: Send confirmation email
2. EXECUTE Step 1: lookup_orders(email="[email protected]")
3. EXECUTE Step 2: check_eligibility(order_id="ORD-789")
4. EXECUTE Step 3: process_refund(order_id="ORD-789", amount=49.99)
5. EXECUTE Step 4: send_email(to="[email protected]", template="refund_confirmation")
```
Implementation
```python
class PlanAndExecuteAgent:
    def __init__(self, planner_llm, executor_llm, tools):
        self.planner = planner_llm    # Strong model (GPT-4o, Claude Sonnet)
        self.executor = executor_llm  # Can be a cheaper model
        self.tools = tools

    def run(self, user_input: str) -> str:
        # Phase 1: Plan
        plan = self.create_plan(user_input)

        # Phase 2: Execute — a while loop, so replanning can swap in new steps
        # (mutating the plan inside a for loop would keep iterating stale steps)
        results = []
        steps = list(plan.steps)
        while steps:
            step = steps.pop(0)
            try:
                result = self.execute_step(step, results)
                results.append({"step": step, "result": result, "status": "success"})
            except Exception as e:
                results.append({"step": step, "result": str(e), "status": "failed"})
                # Replan from the current state; discard the stale remainder
                plan = self.replan(user_input, results, remaining=steps)
                steps = list(plan.steps)

        # Phase 3: Synthesize response
        return self.synthesize(user_input, results)

    def create_plan(self, task: str) -> Plan:
        prompt = f"""Create a step-by-step plan to accomplish this task.
Each step should map to exactly one tool call.
Available tools: {[t.name for t in self.tools]}
Task: {task}
Output as JSON: {{"steps": ["step 1 description", "step 2 description", ...]}}"""
        return self.planner.generate(prompt)

    def replan(self, task, completed, remaining):
        prompt = f"""The original plan hit a problem. Create a new plan.
Task: {task}
Completed steps: {completed}
Remaining (may need changes): {remaining}"""
        return self.planner.generate(prompt)
```
When to Use Plan-and-Execute
- Complex tasks with 5+ steps
- Tasks where order matters (can't just wing it)
- Research tasks, data pipelines, multi-tool workflows
Trade-offs
| Pros | Cons |
|---|---|
| Better at complex multi-step tasks | Planning step adds latency |
| Can use cheaper model for execution | Plan can be wrong (garbage in, garbage out) |
| Replanning handles failures gracefully | More complex to implement |
| User can review/approve plan before execution | Over-plans for simple tasks |
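The "review/approve plan before execution" advantage can be made concrete with a small gate between the two phases. A sketch under stated assumptions — the `Plan` dataclass and `gate_plan` helper are illustrative, not from the implementation above:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Plan:
    steps: List[str]

def gate_plan(plan: Plan, approve: Callable[[List[str]], bool]) -> List[str]:
    """Return the steps to execute, or an empty list if the reviewer rejects.
    `approve` can be a human-in-the-loop prompt or an automated policy check."""
    return plan.steps if approve(plan.steps) else []

# Example policy: auto-approve only plans with no destructive steps
def no_refunds(steps):
    return not any("refund" in s.lower() for s in steps)

safe = gate_plan(Plan(["Look up orders", "Send summary email"]), no_refunds)
risky = gate_plan(Plan(["Process refund for ORD-789"]), no_refunds)
```

In `run`, you would insert the gate between Phase 1 and Phase 2 and return early (or ask the user) when it comes back empty.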
Pattern 3: Router Agent
A lightweight agent that classifies the request and routes it to a specialized handler. Each handler is optimized for one type of task.
```
User Input → Router
              │
              ├── "order_status" → Order Status Handler (fast, cheap model)
              ├── "refund"       → Refund Handler (careful model + approval flow)
              ├── "technical"    → Technical Support Handler (RAG + strong model)
              └── "general"      → General Handler (basic RAG)
```
Implementation
```python
import json

class RouterAgent:
    def __init__(self, router_llm):
        self.router_llm = router_llm  # Fast, cheap model client (e.g. gpt-4o-mini)
        self.handlers = {
            "order_status": OrderStatusHandler(model="gpt-4o-mini"),
            "refund": RefundHandler(model="gpt-4o", requires_approval=True),
            "technical": TechnicalHandler(model="claude-sonnet", rag=True),
            "billing": BillingHandler(model="gpt-4o"),
            "general": GeneralHandler(model="gpt-4o-mini", rag=True),
        }

    async def run(self, user_input: str) -> str:
        # Step 1: Classify (fast, ~100ms)
        intent = await self.classify(user_input)
        # Step 2: Route to handler (unknown categories fall back to general)
        handler = self.handlers.get(intent["category"], self.handlers["general"])
        # Step 3: Execute the specialized handler
        return await handler.handle(user_input, intent)

    async def classify(self, text: str) -> dict:
        result = await self.router_llm.generate(
            f"Classify into: {list(self.handlers.keys())}\n"
            f"Input: {text}\n"
            f'JSON: {{"category": "...", "confidence": 0.0-1.0}}'
        )
        return json.loads(result)
```
When to Use Router
- Multiple distinct task types with different requirements
- When you want different models/tools per task type
- Customer support, multi-purpose assistants
Trade-offs
| Pros | Cons |
|---|---|
| Optimized cost per task type | Classification errors route to wrong handler |
| Each handler is simple and focused | More code to maintain |
| Easy to add new task types | Doesn't handle multi-intent requests well |
| Latency optimized per path | Router adds one extra LLM call |
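The misrouting risk shrinks if you use the classifier's confidence score instead of trusting the label blindly. A minimal sketch — the threshold value and helper name are illustrative assumptions:

```python
def pick_handler(intent: dict, handlers: dict, threshold: float = 0.7):
    """Route by category, but fall back to the general handler when the
    classifier is unsure, rather than acting on a low-confidence label."""
    if intent.get("confidence", 0.0) < threshold:
        return handlers["general"]
    return handlers.get(intent["category"], handlers["general"])

handlers = {"refund": "RefundHandler", "general": "GeneralHandler"}
chosen = pick_handler({"category": "refund", "confidence": 0.4}, handlers)
```

The general handler can then ask a clarifying question, which is usually a better failure mode than a refund flow triggered by a misread request.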
Pattern 4: Hierarchical (Manager/Worker)
A manager agent breaks the task into subtasks and delegates each to specialized worker agents. Workers run independently and report back.
```
User: "Write a market analysis report for AI agents in healthcare"

Manager Agent:
├── Worker 1: Research market size and growth (web search agent)
├── Worker 2: Find key players and competitors (web search agent)
├── Worker 3: Analyze regulatory landscape (RAG agent)
└── Worker 4: Compile report from all findings (writing agent)
```
Each worker has its own tools, context, and model.
Implementation
```python
import asyncio

class HierarchicalAgent:
    def __init__(self):
        self.manager = ManagerLLM(model="gpt-4o")
        self.workers = {
            "researcher": ResearchWorker(tools=[web_search, scrape]),
            "analyst": AnalystWorker(tools=[database_query, calculator]),
            "writer": WriterWorker(tools=[format_document]),
        }

    async def run(self, task: str) -> str:
        # Manager creates the subtask plan
        subtasks = await self.manager.decompose(task)

        # Execute workers (parallel where possible)
        results = {}
        parallel_groups = self.manager.identify_parallel_groups(subtasks)
        for group in parallel_groups:
            group_results = await asyncio.gather(*[
                self.workers[st.worker_type].execute(st)
                for st in group
            ])
            for st, result in zip(group, group_results):
                results[st.id] = result

        # Manager synthesizes the final output
        return await self.manager.synthesize(task, results)
```
When to Use Hierarchical
- Complex tasks that decompose into independent subtasks
- When subtasks need different tools/models
- Research, report generation, complex analysis
Trade-offs
| Pros | Cons |
|---|---|
| Parallel execution = faster | Most complex to implement |
| Each worker is focused and testable | Manager decomposition can be wrong |
| Scales to very complex tasks | Higher cost (multiple agents running) |
| Workers can use different models | Inter-worker communication is tricky |
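The `identify_parallel_groups` step above can be implemented deterministically if subtasks declare their dependencies: layer the graph so each group depends only on earlier groups. A sketch, assuming a `Subtask` shape with a `depends_on` field (an assumption, not the document's API):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Subtask:
    id: str
    worker_type: str
    depends_on: List[str] = field(default_factory=list)

def parallel_groups(subtasks: List[Subtask]) -> List[List[Subtask]]:
    """Layer subtasks so each group only depends on completed groups.
    Each group can then be run with asyncio.gather, one group at a time."""
    done, groups, remaining = set(), [], list(subtasks)
    while remaining:
        ready = [t for t in remaining if all(d in done for d in t.depends_on)]
        if not ready:
            raise ValueError("dependency cycle in subtasks")
        groups.append(ready)
        done.update(t.id for t in ready)
        remaining = [t for t in remaining if t.id not in done]
    return groups
```

For the report example, the two research workers and the RAG worker land in group one and the writer in group two, since it depends on all three.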
Pattern 5: Reflection
The agent generates a response, then critiques its own output and iterates. Like having a built-in code reviewer.
1. GENERATE: Draft response to user query
2. REFLECT: "Is this response correct? Complete? Well-formatted?"
3. CRITIQUE: "The pricing info might be outdated. I should verify."
4. REVISE: Call pricing API, update response
5. REFLECT: "Now it's accurate and complete."
6. RESPOND: Return final version
Implementation
```python
import json

class ReflectionAgent:
    def __init__(self, llm, tools, max_reflections=3):
        self.llm = llm
        self.tools = tools
        self.max_reflections = max_reflections

    def run(self, user_input: str) -> str:
        # Generate the initial response
        draft = self.llm.generate(f"Respond to: {user_input}")

        for i in range(self.max_reflections):
            # Critique the current draft
            critique = self.llm.generate(f"""Review this response for issues:
User query: {user_input}
Draft response: {draft}

Check for:
1. Factual accuracy — are all claims verifiable?
2. Completeness — does it address the full query?
3. Missing information — should any tools be called?
4. Tone — is it appropriate?

If the response is good, output: {{"status": "approved"}}
If it needs improvement, output: {{"status": "revise", "issues": ["..."], "suggested_actions": ["..."]}}""")
            result = json.loads(critique)

            if result["status"] == "approved":
                return draft

            # Gather all suggested tool results first, then revise once —
            # revising per action would discard all but the last result
            tool_results = [self.execute_action(a)
                            for a in result.get("suggested_actions", [])]
            draft = self.llm.generate(
                f"Revise this response based on new information:\n"
                f"Original: {draft}\n"
                f"New data: {tool_results}\n"
                f"Issues to fix: {result['issues']}"
            )

        return draft  # Return the best version after max reflections
```
When to Use Reflection
- Tasks where accuracy is critical (medical, legal, financial)
- Content generation (writing, reports, emails)
- When the cost of a wrong answer is high
Trade-offs
| Pros | Cons |
|---|---|
| Higher accuracy through self-correction | 2-3x the cost (multiple LLM calls) |
| Catches hallucinations before delivery | Slower (each reflection adds latency) |
| Natural quality improvement loop | Can over-critique and make things worse |
| Works with any base architecture | Diminishing returns after 2-3 iterations |
Pattern 6: State Machine
The most structured pattern. The agent follows a predefined state machine with explicit transitions. Each state has its own behavior, tools, and exit conditions.
```
States: [GREETING] → [IDENTIFY] → [DIAGNOSE] → [RESOLVE] → [CONFIRM] → [CLOSE]

GREETING: Welcome user, detect intent
  → IDENTIFY (if needs account lookup)
  → DIAGNOSE (if general question)

IDENTIFY: Authenticate user, find account
  → DIAGNOSE (authenticated)
  → ESCALATE (auth failed 3x)

DIAGNOSE: Understand the specific issue
  → RESOLVE (issue identified)
  → ESCALATE (can't determine issue)

RESOLVE: Apply fix or provide answer
  → CONFIRM (fix applied)
  → ESCALATE (can't resolve)

CONFIRM: Verify customer is satisfied
  → CLOSE (satisfied)
  → DIAGNOSE (not satisfied, try again)
```
Implementation
```python
from enum import Enum

class State(Enum):
    GREETING = "greeting"
    IDENTIFY = "identify"
    DIAGNOSE = "diagnose"
    RESOLVE = "resolve"
    CONFIRM = "confirm"
    CLOSE = "close"
    ESCALATE = "escalate"

class StateMachineAgent:
    def __init__(self, llm):
        self.llm = llm
        self.state = State.GREETING
        self.context = {}
        self.handlers = {
            State.GREETING: self.handle_greeting,
            State.IDENTIFY: self.handle_identify,
            State.DIAGNOSE: self.handle_diagnose,
            State.RESOLVE: self.handle_resolve,
            State.CONFIRM: self.handle_confirm,
        }

    async def process_message(self, message: str) -> str:
        handler = self.handlers.get(self.state)
        if not handler:
            return "This conversation has ended. Please start a new one."
        response, next_state = await handler(message)
        self.state = next_state
        return response

    async def handle_greeting(self, message):
        intent = await classify_intent(message)
        self.context["intent"] = intent
        if intent["requires_auth"]:
            return ("I'd be happy to help with that! First, I need to verify your identity. "
                    "Could you provide your email address?"), State.IDENTIFY
        else:
            return await self.handle_diagnose(message)

    async def handle_identify(self, message):
        # Try to authenticate
        email = extract_email(message)
        if email:
            account = await lookup_account(email)
            if account:
                self.context["account"] = account
                return ("Found your account. Now, tell me more about the issue "
                        "you're experiencing."), State.DIAGNOSE
        self.context["auth_attempts"] = self.context.get("auth_attempts", 0) + 1
        if self.context["auth_attempts"] >= 3:
            return "Let me connect you with our team for security verification.", State.ESCALATE
        return "I couldn't find an account with that info. Could you try again?", State.IDENTIFY

    async def handle_diagnose(self, message):
        # Use RAG + LLM to understand the issue
        context_docs = await retrieve_relevant_docs(message)
        diagnosis = await self.llm.generate(
            f"Diagnose this issue: {message}\nContext: {context_docs}"
        )
        self.context["diagnosis"] = diagnosis
        return diagnosis["suggested_response"], State.RESOLVE
```
When to Use State Machine
- Well-defined workflows (support, onboarding, intake)
- Compliance-sensitive processes (need audit trail)
- When you need predictable behavior
Trade-offs
| Pros | Cons |
|---|---|
| Most predictable and controllable | Rigid — can't handle unexpected flows |
| Easy to audit and debug | Requires upfront workflow design |
| Clear metrics per state | New scenarios need new states |
| Lowest risk of runaway behavior | Less "intelligent" feeling to users |
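The auditability advantage gets stronger if the allowed transitions live in data rather than scattered across handler return values. A minimal sketch using an illustrative subset of the states above — the table contents here are an example, not the full workflow:

```python
from enum import Enum

class State(Enum):
    GREETING = "greeting"
    IDENTIFY = "identify"
    DIAGNOSE = "diagnose"
    ESCALATE = "escalate"

# Allowed transitions as plain data: easy to review, log, and test exhaustively.
TRANSITIONS = {
    State.GREETING: {State.IDENTIFY, State.DIAGNOSE},
    State.IDENTIFY: {State.IDENTIFY, State.DIAGNOSE, State.ESCALATE},
}

def transition(current: State, proposed: State) -> State:
    """Reject any move the table doesn't allow and escalate instead of
    drifting into an undefined state."""
    if proposed in TRANSITIONS.get(current, set()):
        return proposed
    return State.ESCALATE
```

Wrapping every `next_state` a handler returns in `transition(...)` gives you a single choke point for logging and guarantees no handler bug can skip the workflow.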
Choosing the Right Pattern
| Scenario | Best Pattern | Why |
|---|---|---|
| Simple Q&A with tools | ReAct | Low complexity, good enough |
| Complex multi-step research | Plan-and-Execute | Needs upfront planning |
| Multi-purpose assistant | Router | Different handlers per intent |
| Report generation | Hierarchical | Parallel research + synthesis |
| High-accuracy responses | Reflection | Self-correction catches errors |
| Regulated workflow | State Machine | Predictable, auditable |
| Customer support | Router + State Machine | Route by intent, structured flow per type |
| Coding assistant | ReAct + Reflection | Try code, test, self-correct |
Anti-Pattern: The "God Agent"
The most common architecture mistake: one giant agent with 30 tools, a 5,000-token system prompt, and instructions for every possible scenario. This agent:
- Confuses which tools to use (too many choices)
- Has slow, expensive LLM calls (massive context)
- Is impossible to test (too many code paths)
- Degrades as you add more features
If your agent has more than 8-10 tools, you need a Router or Hierarchical pattern. Split it up.
Architecture Decision Checklist
Before building, answer these questions:
- How many steps does the typical task require? (1-3: ReAct, 4-8: Plan-and-Execute, 8+: Hierarchical)
- Are there distinct task categories? (Yes: Router)
- Is accuracy critical? (Yes: add Reflection)
- Is the workflow well-defined? (Yes: State Machine)
- What's your latency budget? (Tight: ReAct or Router. Flexible: any)
- What's your cost budget per request? (Tight: ReAct + cheap model. Flexible: Hierarchical + Reflection)
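The checklist above can be roughly encoded as a function — a starting point for discussion, not a rule, and the thresholds are the ones stated in the checklist:

```python
def recommend_pattern(steps: int, distinct_categories: bool,
                      accuracy_critical: bool, well_defined_workflow: bool) -> list:
    """Map the architecture checklist answers to a pattern suggestion."""
    recs = []
    if well_defined_workflow:
        recs.append("State Machine")
    elif distinct_categories:
        recs.append("Router")
    elif steps <= 3:
        recs.append("ReAct")
    elif steps <= 8:
        recs.append("Plan-and-Execute")
    else:
        recs.append("Hierarchical")
    if accuracy_critical:
        recs.append("Reflection")  # Reflection layers on top of any base pattern
    return recs
```

A simple Q&A bot (`recommend_pattern(2, False, False, False)`) comes back as ReAct; a long research task with accuracy requirements adds Reflection on top of Plan-and-Execute.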
Designing AI agent architectures? AI Agents Weekly covers patterns, frameworks, and production case studies 3x/week. Join free.
Conclusion
Architecture is the decision that's hardest to change later. Start with the simplest pattern that meets your requirements (usually ReAct), then evolve. Most production agents end up as hybrids — and that's fine.
The key insight: match the architecture to the task, not the framework. Don't use a hierarchical multi-agent system because it sounds impressive. Use it because your task genuinely decomposes into parallel subtasks. The best architecture is the one that solves your problem with the least complexity.