Agent.md as the Future of Software

· agents, product-design, business

The Idea

What if software as we know it becomes obsolete? The traditional stack—frontend frameworks, backend APIs, databases, deployment pipelines—exists because we needed to translate human intent into machine-executable instructions. But with general agents (like Claude Code) + configuration files (like CLAUDE.md), we already have:

  • Intent interpretation: Natural language → understanding
  • Execution: Skills, subagents, MCPs, or just generating code on the fly to call APIs
  • State: The conversation itself, or any file/database the agent can access
  • Interface: Text (or whatever modality the agent supports)

The provocative question: Why build React components when an agent can just render what you need? Why build APIs when an agent can call external services directly?

This isn't about replacing all software—it's about recognizing a category shift. Traditional software encodes procedures (if this, do that). Agent systems encode goals (achieve this, figure out how). The .md file becomes the product spec, and the agent IS the product.
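
A minimal sketch of the pattern in Python, assuming the Anthropic SDK: the .md file is loaded verbatim as the system prompt, and a bare chat loop is the whole application. PRODUCT.md and the model id are placeholders.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # The entire product spec: one markdown file.
    with open("PRODUCT.md") as f:
        spec = f.read()

    history = []  # state is just the conversation
    while True:
        history.append({"role": "user", "content": input("> ")})
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model id
            max_tokens=1024,
            system=spec,  # the .md file IS the product
            messages=history,
        )
        reply = response.content[0].text
        history.append({"role": "assistant", "content": reply})
        print(reply)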

Why This Matters

What this enables:

  • Zero-code products (the CLAUDE.md is the product configuration)
  • Instant customization (change the prompt, change the behavior)
  • Composability (agents call other agents, tools call other tools)
  • Democratized building (anyone who can write English can build)

What traditional software still does better:

  • Latency-critical paths (gaming, real-time systems)
  • Deterministic guarantees (financial transactions, safety-critical)
  • Cost at scale (inference is expensive; static code is free to execute)
  • Privacy (local-only execution, no API calls)

The interesting middle ground:

  • Hybrid systems: Agent handles complexity/ambiguity, compiled code handles hot paths
  • Agent-generated code: Agent writes the traditional software, then steps back
  • Progressive hardening: Start with agent, identify stable patterns, crystallize into code (sketched below)
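
A hedged sketch of progressive hardening in Python: requests first hit a table of crystallized, deterministic handlers, and anything that doesn't match falls through to the agent. Every name here is hypothetical, and ask_agent stands in for a real inference call.

    import re
    from typing import Callable

    def issue_refund(order_id: str) -> str:
        return f"refund issued for order {order_id}"

    def order_status(order_id: str) -> str:
        return f"order {order_id}: shipped"

    def ask_agent(request: str) -> str:
        return f"[agent reasons about: {request}]"  # stand-in for inference

    # Stable patterns, crystallized into plain functions (the hardened hot paths).
    HARDENED: dict[str, Callable[[str], str]] = {
        r"^refund order #(\d+)$": issue_refund,
        r"^order status #(\d+)$": order_status,
    }

    def handle(request: str) -> str:
        for pattern, fn in HARDENED.items():
            match = re.match(pattern, request, re.IGNORECASE)
            if match:
                return fn(match.group(1))  # deterministic, fast, free to execute
        return ask_agent(request)  # ambiguity goes to the agent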

Agent as Orchestrator (Not the Other Way Around)

Current hybrid systems have it backwards:

Compiled code (orchestrator)
  └── calls the agent when ambiguity needs handling

The more natural pattern:

Agent (orchestrator)
  └── calls compiled code when speed/determinism is needed

Agent as main thread, compiled code as optimized subroutines.
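
This inversion is already expressible with tool use. A sketch against the Anthropic Messages API, where the loop belongs to the agent and compute_shipping is an invented compiled subroutine; the model id is a placeholder.

    import anthropic

    client = anthropic.Anthropic()

    def compute_shipping(weight_kg: float) -> str:
        # Compiled, deterministic leaf call: the "assembly" of the system.
        return f"{4.99 + 1.20 * weight_kg:.2f} USD"

    TOOLS = [{
        "name": "compute_shipping",
        "description": "Exact shipping cost for a parcel weight in kg.",
        "input_schema": {
            "type": "object",
            "properties": {"weight_kg": {"type": "number"}},
            "required": ["weight_kg"],
        },
    }]

    messages = [{"role": "user", "content": "What does a 3kg parcel cost to ship?"}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model id
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            break  # the agent decided it is done orchestrating
        messages.append({"role": "assistant", "content": response.content})
        for block in response.content:
            if block.type == "tool_use":
                result = compute_shipping(**block.input)  # drop into compiled code
                messages.append({"role": "user", "content": [{
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                }]})
    print(response.content[0].text)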

Why this makes sense:

  • Agent handles routing, error recovery, edge cases naturally
  • Compiled code becomes like assembly—drop into it for hot paths, not control flow
  • Changes to orchestration = prompt changes, not redeployment
  • Self-healing: agent can route around failures, retry differently

Why we're not there yet:

  • Latency: every orchestration decision = inference call
  • Cost: tokens spent on control flow that compiled code executes for free
  • Debugging: "why did it do that?" harder to trace than reading code
  • Testing: non-deterministic orchestration breaks traditional test patterns

But these feel like temporary constraints, not fundamental ones. Inference gets faster/cheaper. Observability tooling catches up. Testing paradigms evolve.

Uncomfortable implication: Most "AI application frameworks" (LangChain, etc.) are building the old pattern—code orchestrates agents. They might be the jQuery of this era—useful scaffolding that gets replaced when the native platform catches up.

Products Closest to This Vision Today

Pure config → agent behavior:

  • Custom GPTs / GPT Builder - Closest to the vision but limited (chat-only, constrained tools)
  • Claude Projects - Same pattern, slightly more capable with artifacts

Agent + workspace:

  • This KB (MindCapsule) - Literally built this way. CLAUDE.md is the spec, Claude Code is the runtime
  • Notion AI / Coda AI - Workspace IS the data layer, AI manipulates it

Agent builds traditional software:

  • Replit Agent / Bolt / Lovable - Agent writes and deploys code, then steps back
  • v0 - Same pattern for UI

Agent orchestrates services:

  • Zapier Central - Natural language → agent calls integrations

The gap: most are either chat-constrained (GPTs) or code-generating (Replit). The full vision—agent as persistent runtime, config file as complete product spec—doesn't fully exist yet.

Infrastructure for Agent-as-Orchestrator

What does an agent-as-orchestrator actually need to run?

Core runtime:

  • Inference endpoint (the model)
  • Context/conversation management
  • Tool execution layer

State layer:

  • Persistent memory (beyond single conversation)
  • File system or database access
  • Context window management (summarization when full)

Integration layer:

  • API credentials to external services
  • Compiled code modules to call
  • MCP servers or equivalent protocol

Interface:

  • How users interact (chat? voice? agent-rendered UI?)
  • How agent presents output

Observability:

  • Decision logging ("why did it do that?")
  • Cost tracking
  • Debugging tools

Security:

  • Permission boundaries
  • Auth to external services
  • Sandboxing for code execution

Today's answer: "a computer running Claude Code" or a server with an agent loop. Not production-grade.
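
Stitched together, those layers look something like the skeleton below; every name is invented, and a production runtime would split each layer into its own service.

    from dataclasses import dataclass, field

    @dataclass
    class AgentRuntime:
        model: str                                        # core runtime: inference endpoint
        context: list = field(default_factory=list)       # conversation management
        memory_path: str = "memory.db"                    # state: persists beyond one chat
        tools: dict = field(default_factory=dict)         # integration: compiled modules, MCP
        credentials: dict = field(default_factory=dict)   # integration: external services
        decision_log: list = field(default_factory=list)  # observability: "why did it do that?"
        spend_usd: float = 0.0                            # observability: cost tracking
        allowed_tools: set = field(default_factory=set)   # security: permission boundary

        def call_tool(self, name: str, **kwargs):
            if name not in self.allowed_tools:
                raise PermissionError(f"tool {name!r} not permitted")
            self.decision_log.append((name, kwargs))  # record the decision
            return self.tools[name](**kwargs)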

What's missing for real agent-as-orchestrator infrastructure:

  • Multi-tenancy (one agent serving many users, or many instances)
  • Durability (survives restarts, handles failures)
  • Cost controls (budget limits, fallback to cheaper models; sketched below)
  • Auth model (agent identity, user delegation)
  • Deployment primitives (versioning, rollback, A/B testing prompts)
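
As one illustration of the missing cost-control primitive, a sketch of a budget-aware wrapper that falls back to a cheaper model near the limit. Model names and prices are illustrative, not real rates.

    # Hypothetical pricing in USD per million tokens; not real rates.
    PRICING = {"big-model": 15.00, "small-model": 1.00}

    class BudgetedInference:
        def __init__(self, budget_usd: float):
            self.budget_usd = budget_usd
            self.spent_usd = 0.0

        def pick_model(self) -> str:
            # Fall back to the cheap model once 80% of the budget is spent.
            if self.spent_usd >= 0.8 * self.budget_usd:
                return "small-model"
            return "big-model"

        def record(self, model: str, tokens: int) -> None:
            self.spent_usd += PRICING[model] * tokens / 1_000_000
            if self.spent_usd > self.budget_usd:
                raise RuntimeError("agent budget exhausted")

    meter = BudgetedInference(budget_usd=5.00)
    model = meter.pick_model()          # "big-model" while under 80% of budget
    meter.record(model, tokens=12_000)  # ~0.18 USD at the illustrative rate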

Opportunity: A "Vercel for agents"—deployment/runtime layer purpose-built for agent-as-orchestrator, not agent-as-feature. The infrastructure that makes .md files deployable as products.

Market Entry Strategy (Small Team)

If this is the future, how does a small, young team enter the market today?

The trap: Building "Vercel for agents" head-on. Capital-intensive, attracts big players (Anthropic/OpenAI will do this), requires critical mass.

Strategic options:

1. Vertical-first agent product → extract platform

  • Build one agent product really well in a specific domain
  • Learn what production agents actually need by running one
  • Revenue from day one, platform comes later
  • Shopify started as a snowboard store, then extracted the platform
  • Pick a domain you know where agent value is obvious

2. Developer tools wedge

  • Build observability/debugging for agents ("why did it do that?")
  • Lower capital requirements
  • Learn what production teams struggle with, expand from there
  • Datadog path: start narrow, expand to full stack
  • Braintrust is here, but early

3. Config-to-product layer

  • Skip infrastructure, build "deploy your .md as a product" UX
  • Sit on top of existing runtimes (Claude API, GPT)
  • Like Webflow on web tech—differentiate on UX, not infra
  • Faster to market, but platform dependency risk

4. Open source runtime

  • Build the open-source agent orchestrator
  • Monetize hosting/enterprise features
  • Community moat, slower to revenue
  • Acquisition target

Recommended for young team: Vertical-first.

Pick a domain, build the agent product, be your own first infrastructure customer, then extract it. You learn what's actually needed vs. what sounds good in theory.

The infrastructure insight comes from operating, not theorizing.

Connection to Domain-Specific Agents

This thought extends the "narrow tools + general fallback" pattern. The .md file is the narrow specification; the general agent provides the execution. As patterns stabilize, they can harden into skills/tools—the software equivalent of "extract method" refactoring.

Open Questions

  • At what cost/latency threshold does traditional software win?
  • What's the UX for debugging agent behavior vs debugging code?
  • How do you version control agent behavior when the model itself changes?
  • What's the liability model when "the agent decided to do X"?
