AI Agents in Practice

Building Real Products with Autonomous Agent Teams
MIT GenAI Global · March 23, 2026

The Agent Spectrum

Level | What It Does | Where Most Orgs Are
Chatbot | Answers questions from a script | ✓ Here
Copilot | Suggests, drafts, assists a human | ✓ Getting here
Autonomous Agent | Completes tasks independently | ⚡ Experimenting
Agent Teams | Multiple agents coordinate on complex work | ✗ Almost nobody

The Personal AI Trajectory

💬
Yesterday
Chat in a browser tab.
You go to it.
Stateless. Forgets everything.
📱
Now
Always-on in your pocket.
Persistent memory.
Knows your context across channels.
🧠
Next
Anticipates before you ask.
Manages routines autonomously.
Acts on your behalf.
Proactive, not reactive.
🌐
Endgame
Manages your entire digital life.
Star Trek: amplifies you.
Wall-E: replaces you.
Same technology. Different design choices.
Now that AI can think — do we let it do all our thinking, or use it like they do at Starfleet Academy to amplify our learning and impact?

Let's Build

☄️

NASA NEO Dashboard

Live asteroid data from NASA API. Interactive charts. Deterministic output.

👋

Onboarding System

Portal + live Mattermost channel. New hire Q&A agent.

🔥

Incident Postmortem

Structured report + incident channel. Ask about the outage.

14
Agents
3
Products
0
Lines Written by Humans
The AI is the engineer,
not the engine.
Build deterministic systems with nondeterministic tools.

Deterministic vs Nondeterministic

AI Builds the System

  • Code calls real APIs
  • Same input, same output. Every time.
  • You can audit it. You can test it.
  • Example: NASA asteroid dashboard

AI Runs the System

  • LLM answers questions at runtime
  • Different answer each time
  • Harder to audit. Harder to trust.
  • Example: Q&A channel from docs
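The distinction can be shown in a few lines. The deterministic side is a pure function over NEO feed records (the record shape is a simplified assumption, not the full NASA schema); the nondeterministic side is a hypothetical LLM call, left as a comment:

```python
# Deterministic: a pure function over NASA NEO feed records. Same input,
# same output, every time, so it can be audited and unit tested.
def hazardous_count(neo_records: list[dict]) -> int:
    """Count potentially hazardous asteroids in a feed page."""
    return sum(
        1 for neo in neo_records
        if neo.get("is_potentially_hazardous_asteroid")
    )

records = [
    {"name": "(2026 AB1)", "is_potentially_hazardous_asteroid": True},
    {"name": "(2026 CD2)", "is_potentially_hazardous_asteroid": False},
]
assert hazardous_count(records) == 1
assert hazardous_count(records) == hazardous_count(records)  # repeatable

# Nondeterministic: an LLM answering at runtime (hypothetical client).
# Two identical calls may return different text, so you test the harness
# around it, not the answer itself:
# answer = llm.ask("Which asteroids pass closest this week?")
```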

The Agent Team Patterns

ORCHESTRATOR spawns AGENT A, AGENT B, and AGENT C in parallel. Each build flows through ⚡ APPROVAL GATE → VERIFIER (fresh eyes) → 👤 HUMAN GATE → ✓ SHIP.

Feedback loops:
  • Verifier → Builder: fix bugs
  • Gate → Builder: wrong approach, redo
  • Human → Verifier: change request
  • Pass: move forward to ship
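The orchestrator pattern above can be sketched with stub agents standing in for real LLM calls. Everything here (`build_agent`, `verify`, the result shape) is illustrative, not any framework's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def build_agent(task: str) -> dict:
    """A builder agent: turns a task into an artifact (stubbed)."""
    return {"task": task, "artifact": f"code for {task}"}

def verify(result: dict) -> bool:
    """A separate verifier agent with fresh eyes (stubbed as a check)."""
    return "code for" in result["artifact"]

def orchestrate(tasks: list[str]) -> list[dict]:
    # Orchestrator spawns one builder per task, in parallel.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        results = list(pool.map(build_agent, tasks))
    shipped = []
    for result in results:
        # Verifier gate: bounce failures back to the builder
        # before they ever reach the human gate.
        while not verify(result):
            result = build_agent(result["task"])  # fix bugs, retry
        shipped.append(result)  # human gate / ship would follow here
    return shipped

products = orchestrate(["NEO dashboard", "onboarding portal", "postmortem report"])
```

The key design choice is that the verifier is a separate pass, not the builder grading its own work.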

How It Works

Layer 1 — Personal AI Layer: OpenClaw (messaging, memory, tools, channels)
Layer 2 — Coding Orchestration: Claude Code (parallel sub-agents, file I/O, verification)
Layer 3 — Complex Orchestration: n8n / Custom Engines (DAGs, governance, retry logic)
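The "retry logic" a Layer 3 engine applies around each DAG step can be sketched as a small wrapper. The parameters here are illustrative defaults, not anyone's production settings:

```python
import time

def with_retries(step, attempts: int = 3, base_delay: float = 0.01):
    """Wrap a DAG step so transient failures retry with backoff."""
    def run(*args, **kwargs):
        for attempt in range(1, attempts + 1):
            try:
                return step(*args, **kwargs)
            except Exception:
                if attempt == attempts:
                    raise  # exhausted: escalate to governance / a human
                time.sleep(base_delay * 2 ** (attempt - 1))  # backoff
    return run
```

A flaky step that fails twice and succeeds on the third call would return normally under `with_retries`, while a step that keeps failing surfaces its exception for escalation.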

Spec-Driven Development

The Promise

  • Write the perfect spec upfront
  • Agents execute flawlessly
  • Reproducible — same prompt, same result
  • Scalable — run it 1,000 times

The Reality

  • You can't spec what you haven't built
  • Building reveals what the spec missed
  • Human taste can't be encoded in text
  • The spec is already outdated by the time you finish writing it
This is waterfall. The spec takes longer than the build.
The product can be completed faster
than the meeting that talks about it.
Spec → Build → Review: weeks of meetings to write the spec.
Build → Review → Iterate (or Throw Away): minutes to build, then decide if it's worth keeping.
Build v1. Look at it. Throw it away or iterate. The spec writes itself after you've built something real.
If you wouldn't let an unsupervised intern do it,
don't let an unsupervised agent do it.
Scope their work. Review their output. Limit their blast radius.
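"Limit their blast radius" can be made concrete as a deny-by-default allowlist the harness checks before executing any tool call an agent proposes. The tool names and claim shape here are hypothetical examples, not a real framework's API:

```python
# Explicit scope for one agent: which tools it may call, and where.
ALLOWED_TOOLS = {"read_file", "write_file", "run_tests"}
ALLOWED_PATHS = ("workspace/",)

def approve(tool: str, path: str) -> bool:
    """Deny by default; only pre-scoped tools inside the sandbox pass."""
    return tool in ALLOWED_TOOLS and path.startswith(ALLOWED_PATHS)

assert approve("write_file", "workspace/app.py")
assert not approve("delete_repo", "workspace/app.py")  # unknown tool
assert not approve("write_file", "/etc/passwd")        # outside the sandbox
```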

The Honest Take

What Works

  • Building self-contained tools fast
  • Parallel agent teams for distinct tasks
  • Agents that remember you across sessions
  • Verification agents catch real bugs before you see them

What Doesn't (Yet)

  • Tool codebases change constantly; things break between versions
  • Agents say yes to everything, then don't follow through
  • Context disappears between channels
  • Can't handle real multi-agent orchestration

The "Yes Man" Problem

Separate verifier. Checkpoints. Humans in the loop for anything that matters.
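One cheap checkpoint against the "yes man": never trust the builder's self-report; have the verifier check the artifacts themselves. A minimal sketch, assuming a hypothetical claim format with a `files_written` list:

```python
import os
import tempfile

def checkpoint(claim: dict) -> bool:
    """Pass only if every file the agent claims to have written
    actually exists and is non-empty. Trust the filesystem, not
    the agent's 'done!' message."""
    return all(
        os.path.exists(path) and os.path.getsize(path) > 0
        for path in claim["files_written"]
    )

# Demo: a real file passes; a claimed-but-missing file fails.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"print('hello')")
    real_path = f.name
assert checkpoint({"files_written": [real_path]})
assert not checkpoint({"files_written": [real_path, "missing/output.py"]})
os.unlink(real_path)
```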

The Opinionated Framework Problem

Opinionated = fast start, hard to customize. Unopinionated = slow start, full flexibility. Pick one.

Security & Trust

More autonomy = more risk. Design for the worst case.

Real Numbers

~12
Minutes per product
14
Total agents
3
Verified products
A team meeting to discuss building these would take longer than actually building them.

The Results

While we were talking, 14 agents built 3 products.
Let's see what they made.
☄️

NASA NEO Dashboard

✓ Built · ✓ Verified · ✓ Live data

👋

Onboarding System

✓ Built · ✓ Verified · ✓ Channel live

🔥

Incident Postmortem

✓ Built · ✓ Verified · ✓ Channel live

Know Your Tools

Need | Tool | Why
Personal AI layer | OpenClaw | Messaging, memory, channels, daily workflows
Build software fast | Claude Code / Codex | Parallel sub-agents, file I/O, great at code
Workflow automation | n8n / Make / Zapier | Visual flows, webhooks, reliable triggers
Complex orchestration | LangGraph / Custom | DAGs, state machines, governance, retry
Swarm intelligence | Custom engines | Self-improving, parallel research, deep orchestration

Start Small, Scale Smart

Week 1
Crawl
Single agent, one repetitive task
Report generation, data formatting
Month 1
Walk
Agent + verifier, human review
Internal tools, dashboards
Quarter 1
Run
Agent teams, parallel builds
Multi-surface integration
Pick the most repetitive workflow first — not the hardest one.

What's Coming

What Your Org Should Do Monday

Discussion

What are you building? What's broken?
Let's talk.
GitHub: github.com/GixGosu · @brineshrimp
linkedin.com/in/joshua-burdick-25a993180