A practitioner's guide to agent architecture. State machines, tool orchestration, memory systems, and human-in-the-loop escalation — from a real production agent.
Every autonomous agent follows the same fundamental loop. The differences are in what happens inside each phase and how the agent handles failure.
This is an OODA loop adapted for AI agents. The key insight is that the filesystem is memory — state persists between invocations by writing to disk. The agent does not rely on in-context memory alone.
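The loop and the filesystem-as-memory idea can be sketched in a few lines. This is an illustrative skeleton, not the production agent's code: the function names (`read_state`, `run_iteration`) and the JSON state file are assumptions chosen for the example.

```python
# Minimal observe -> decide -> act -> persist loop.
# State survives between invocations because it lives on disk, not in context.
import json
from pathlib import Path

STATE_FILE = Path("state/agent_state.json")  # hypothetical state file

def read_state() -> dict:
    """Observe: load persisted state (fresh state on first run)."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"iteration": 0, "log": []}

def write_state(state: dict) -> None:
    """Persist: the filesystem is the agent's memory between invocations."""
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps(state, indent=2))

def run_iteration() -> dict:
    state = read_state()                   # observe
    action = f"task-{state['iteration']}"  # decide (placeholder policy)
    state["log"].append(action)            # act (placeholder execution)
    state["iteration"] += 1
    write_state(state)                     # persist
    return state
```

Each invocation picks up exactly where the last one left off, which is what makes multi-session autonomy possible.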
Define discrete phases the agent moves through, with clear entry/exit criteria. This prevents the agent from thrashing between tasks.
## Phase Transitions
```
RECON → RESEARCH → STRATEGY → BUILD → DEPLOY → MEASURE → EVOLVE
  ↑                                                         │
  └─────────────────────────────────────────────────────────┘
```
Entry criteria for each phase:
- RECON: Agent starts here. Scan environment, discover capabilities.
- RESEARCH: RECON complete. Market analysis, opportunity identification.
- STRATEGY: Research complete. Rank opportunities, allocate resources.
- BUILD: Strategy chosen. Execute highest-priority task.
- DEPLOY: Build complete. Ship to production.
- MEASURE: Deployed. Check metrics, revenue, user signals.
- EVOLVE: Measurement complete. Update strategy, re-rank, self-modify.
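The phase machine can be expressed as an enum plus an explicit transition table. This is a sketch: the source does not show code for this, and the choice of looping EVOLVE back to RECON follows the return arrow in the diagram.

```python
# Sketch of the phase state machine with explicit legal transitions.
from enum import Enum, auto

class Phase(Enum):
    RECON = auto()
    RESEARCH = auto()
    STRATEGY = auto()
    BUILD = auto()
    DEPLOY = auto()
    MEASURE = auto()
    EVOLVE = auto()

# Each phase has exactly one successor, preventing thrashing between tasks.
TRANSITIONS = {
    Phase.RECON: Phase.RESEARCH,
    Phase.RESEARCH: Phase.STRATEGY,
    Phase.STRATEGY: Phase.BUILD,
    Phase.BUILD: Phase.DEPLOY,
    Phase.DEPLOY: Phase.MEASURE,
    Phase.MEASURE: Phase.EVOLVE,
    Phase.EVOLVE: Phase.RECON,  # loop back, per the diagram's return arrow
}

def advance(current: Phase) -> Phase:
    """Move to the next phase once the current phase's exit criteria are met."""
    return TRANSITIONS[current]
```

Making transitions a data structure (rather than scattered `if` logic) means the agent can log and audit every phase change.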
Use the filesystem as the agent's persistent memory. Each file serves a specific memory function. The agent reads its own state files at the start of each invocation.
```
state/
├── log.md             # Append-only event log (what happened)
├── strategy.md        # Current strategy and ranked priorities
├── metrics.md         # Quantitative measurements
├── roadmap.md         # Ordered task queue
├── blockers.md        # What's preventing progress
└── self_assessment.md # Agent's evaluation of its own performance
```
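Two helpers capture the core of this pattern: an append-only logger and a read-everything step run at the start of each invocation. The helper names and timestamp format are assumptions for illustration.

```python
# Sketch of the filesystem-as-memory pattern: append-only event log
# plus a load-all-state helper run at the start of each invocation.
from datetime import datetime, timezone
from pathlib import Path

STATE_DIR = Path("state")
STATE_FILES = ["log.md", "strategy.md", "metrics.md",
               "roadmap.md", "blockers.md", "self_assessment.md"]

def append_log(event: str) -> None:
    """log.md is append-only: never rewrite history, only add to it."""
    STATE_DIR.mkdir(exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with (STATE_DIR / "log.md").open("a") as f:
        f.write(f"- {stamp} {event}\n")

def load_memory() -> dict:
    """Read every state file that exists; a missing file is just empty memory."""
    return {name: (STATE_DIR / name).read_text()
            if (STATE_DIR / name).exists() else ""
            for name in STATE_FILES}
```

Appending rather than overwriting keeps the event log trustworthy even when an invocation crashes mid-task.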
Define a tool hierarchy where the agent tries the best tool first, falls back to alternatives, and logs tool failures for future optimization.
Tool Selection Algorithm:
1. Identify the task type (search, create, deploy, measure)
2. Select primary tool for task type
3. If primary fails → try fallback tool
4. If all tools fail → log failure, add to blockers, escalate if critical
5. Record tool success/failure rates in metrics
Example tool chain for "deploy website":

```
Primary:  Vercel CLI (vercel deploy)
Fallback: GitHub Pages via git push
Fallback: Netlify CLI (netlify deploy)
Escalate: Write to human_actions_needed.md
```
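The five-step selection algorithm reduces to a single function that walks the tool chain, records outcomes, and escalates on total failure. This is a sketch under assumptions: tools are stand-in callables returning success/failure, and metrics and blockers are in-memory structures the caller would persist to `metrics.md` and `blockers.md`.

```python
# Sketch of the primary/fallback tool chain with failure logging
# and escalation when every tool fails.
from typing import Callable, List, Optional, Tuple

def run_with_fallbacks(task: str,
                       tools: List[Tuple[str, Callable[[], bool]]],
                       metrics: dict,
                       blockers: List[str]) -> Optional[str]:
    """Try each tool in priority order; return the name of the first success."""
    for name, tool in tools:
        stats = metrics.setdefault(name, {"ok": 0, "fail": 0})
        try:
            if tool():
                stats["ok"] += 1
                return name          # first success wins
        except Exception:
            pass                     # an exception counts as a failure
        stats["fail"] += 1           # recorded for future tool ranking
    # All tools exhausted: log a blocker and escalate to the HITL queue.
    blockers.append(f"{task}: all tools failed -> escalate to human_actions_needed.md")
    return None
```

For example, `run_with_fallbacks("deploy website", [("vercel", deploy_vercel), ("gh-pages", deploy_pages)], metrics, blockers)` tries Vercel first and falls back to GitHub Pages, with the success/failure counts feeding step 5 of the algorithm.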
Define a queue for actions that require human intervention. Each item specifies what, why, time estimate, and priority. The agent continues with other tasks while waiting.
## HITL Queue Protocol
Each item in the queue must specify:
- WHAT: Exact steps (copy-pasteable commands)
- WHY: What it unblocks
- TIME: Estimated minutes
- PRIORITY: BLOCKING (cannot proceed) or NICE-TO-HAVE
Rules:
- HITL minutes per week MUST converge to zero over time
- Agent must be able to make progress on OTHER tasks while blocked
- Never queue something the agent can do itself
- Track cumulative HITL minutes at the top of the file
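A queue item maps directly onto a small record type, and the rules above (cumulative minutes at the top, BLOCKING items first) fall out of a render function. Field names mirror the protocol; the markdown layout is an assumption, not the production file format.

```python
# Sketch of a HITL queue entry and its rendering for human_actions_needed.md.
from dataclasses import dataclass
from typing import List

@dataclass
class HITLItem:
    what: str       # exact, copy-pasteable steps
    why: str        # what it unblocks
    minutes: int    # estimated human time
    blocking: bool  # BLOCKING vs NICE-TO-HAVE

def render_queue(items: List[HITLItem]) -> str:
    """Render the queue with cumulative minutes first and BLOCKING items on top."""
    total = sum(i.minutes for i in items)
    lines = [f"Cumulative HITL minutes queued: {total}", ""]
    for item in sorted(items, key=lambda i: not i.blocking):  # blocking first
        tag = "BLOCKING" if item.blocking else "NICE-TO-HAVE"
        lines.append(f"- [{tag}] ({item.minutes} min) WHAT: {item.what} | WHY: {item.why}")
    return "\n".join(lines)
```

Keeping the cumulative-minutes figure in the rendered output makes the "converge to zero" rule measurable at a glance.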
The agent modifies its own instructions based on what it learns. Every N iterations, it re-reads all state, evaluates what worked, and updates its strategy document.
Self-Evolution Protocol (every ~5 iterations):
1. Read all state files
2. Compute metrics delta since last evolution
3. Identify: what worked? what failed? what's stale?
4. Update strategy.md with new rankings
5. Update CLAUDE.md (the agent's own governance file) if needed
6. Prune completed items from roadmap
7. Log the evolution event
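The trigger for this protocol is a simple modulus check on the iteration counter. A minimal sketch, assuming a 5-iteration cadence and representing the seven steps as a returned checklist the agent would then execute:

```python
# Sketch of the every-N-iterations self-evolution trigger.
EVOLVE_EVERY = 5  # assumed cadence ("every ~5 iterations")

def maybe_evolve(iteration: int) -> list:
    """Return the ordered evolution steps to run, or [] if evolution isn't due."""
    if iteration == 0 or iteration % EVOLVE_EVERY != 0:
        return []
    return [
        "read_all_state",
        "compute_metrics_delta",
        "classify_worked_failed_stale",
        "update_strategy_rankings",
        "update_governance_file",   # CLAUDE.md, only if needed
        "prune_completed_roadmap",
        "log_evolution_event",
    ]
```

Returning the steps as data rather than executing them inline makes the evolution event itself easy to log, satisfying step 7.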
Don't put all compute budget into one strategy. Maintain 2-3 concurrent workstreams at different risk levels.
Portfolio Structure:
- ANCHOR: High-probability, low-reward. The floor.
- GROWTH: Medium-probability, medium-reward. Compounds over time.
- MOONSHOT: Low-probability, high-reward. Worth a small allocation.
Allocation Rule:
- 60% of compute → ANCHOR (survival)
- 30% of compute → GROWTH (compounding)
- 10% of compute → MOONSHOT (optionality)
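The 60/30/10 split is mechanical to apply to any compute budget. A sketch, with the budget unit left abstract (tokens, minutes, or invocations) and rounding remainders sent to ANCHOR on the survival-first principle:

```python
# Sketch of the portfolio allocation rule applied to a compute budget.
ALLOCATION = {"ANCHOR": 0.60, "GROWTH": 0.30, "MOONSHOT": 0.10}

def allocate(budget: int) -> dict:
    """Split a budget across workstreams; rounding remainder goes to ANCHOR."""
    shares = {name: int(budget * frac) for name, frac in ALLOCATION.items()}
    shares["ANCHOR"] += budget - sum(shares.values())  # survival gets the slack
    return shares
```

For a budget of 100 units this yields 60/30/10 exactly; for budgets that don't divide evenly, the leftover units reinforce the floor rather than the moonshot.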
| Approach | Complexity | Autonomy | Best For |
|---|---|---|---|
| Single-shot prompt | Low | None | One-off tasks, code generation |
| ReAct loop | Medium | Single session | Research, debugging, multi-step tasks |
| State machine agent | High | Multi-session | Ongoing projects, autonomous operations |
| Multi-agent swarm | Very high | Multi-session | Complex systems, parallel workstreams |
Describe your agent's goal and get a starter architecture document.
The Autonomous Agent Architect prompt generates complete agent systems with state machines, tool chains, memory schemas, error recovery, and self-evolution — based on a real production autonomous agent.