Agent.md as the Future of Software

· agents, product-design, business

The Idea

What if software as we know it becomes obsolete? The traditional stack—frontend frameworks, backend APIs, databases, deployment pipelines—exists because we needed to translate human intent into machine-executable instructions. But with general agents (like Claude Code) + configuration files (like CLAUDE.md), we already have:

  • Intent interpretation: Natural language → understanding
  • Execution: Skills, subagents, MCPs, or just generating code on the fly to call APIs
  • State: The conversation itself, or any file/database the agent can access
  • Interface: Text (or whatever modality the agent supports)

The provocative question: Why build React components when an agent can just render what you need? Why build APIs when an agent can call external services directly?

This isn't about replacing all software—it's about recognizing a category shift. Traditional software encodes procedures (if this, do that). Agent systems encode goals (achieve this, figure out how). The .md file becomes the product spec, and the agent IS the product.
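
A minimal sketch of the pattern in Python, assuming the Anthropic SDK: the .md file is loaded verbatim as the system prompt, and a bare chat loop is the whole application. PRODUCT.md and the model id are placeholders.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # The entire product spec: one markdown file.
    with open("PRODUCT.md") as f:
        spec = f.read()

    history = []  # state is just the conversation
    while True:
        history.append({"role": "user", "content": input("> ")})
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model id
            max_tokens=1024,
            system=spec,  # the .md file IS the product
            messages=history,
        )
        reply = response.content[0].text
        history.append({"role": "assistant", "content": reply})
        print(reply)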

Why This Matters

What this enables:

  • Zero-code products (the CLAUDE.md is the product configuration)
  • Instant customization (change the prompt, change the behavior)
  • Composability (agents call other agents, tools call other tools)
  • Democratized building (anyone who can write English can build)

What traditional software still does better:

  • Latency-critical paths (gaming, real-time systems)
  • Deterministic guarantees (financial transactions, safety-critical)
  • Cost at scale (inference is expensive; static code is free to execute)
  • Privacy (local-only execution, no API calls)

The interesting middle ground:

  • Hybrid systems: Agent handles complexity/ambiguity, compiled code handles hot paths
  • Agent-generated code: Agent writes the traditional software, then steps back
  • Progressive hardening: Start with agent, identify stable patterns, crystallize into code (sketched below)
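
A hedged sketch of progressive hardening in Python: requests first hit a table of crystallized, deterministic handlers, and anything that doesn't match falls through to the agent. Every name here is hypothetical, and ask_agent stands in for a real inference call.

    import re
    from typing import Callable

    def issue_refund(order_id: str) -> str:
        return f"refund issued for order {order_id}"

    def order_status(order_id: str) -> str:
        return f"order {order_id}: shipped"

    def ask_agent(request: str) -> str:
        return f"[agent reasons about: {request}]"  # stand-in for inference

    # Stable patterns, crystallized into plain functions (the hardened hot paths).
    HARDENED: dict[str, Callable[[str], str]] = {
        r"^refund order #(\d+)$": issue_refund,
        r"^order status #(\d+)$": order_status,
    }

    def handle(request: str) -> str:
        for pattern, fn in HARDENED.items():
            match = re.match(pattern, request, re.IGNORECASE)
            if match:
                return fn(match.group(1))  # deterministic, fast, free to execute
        return ask_agent(request)  # ambiguity goes to the agent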

Agent as Orchestrator (Not the Other Way Around)

Current hybrid systems have it backwards:

Compiled code (orchestrator)
  └── calls the agent when ambiguity needs handling

The more natural pattern:

Agent (orchestrator)
  └── calls compiled code when speed/determinism is needed

Agent as main thread, compiled code as optimized subroutines.
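
This inversion is already expressible with tool use. A sketch against the Anthropic Messages API, where the loop belongs to the agent and compute_shipping is an invented compiled subroutine; the model id is a placeholder.

    import anthropic

    client = anthropic.Anthropic()

    def compute_shipping(weight_kg: float) -> str:
        # Compiled, deterministic leaf call: the "assembly" of the system.
        return f"{4.99 + 1.20 * weight_kg:.2f} USD"

    TOOLS = [{
        "name": "compute_shipping",
        "description": "Exact shipping cost for a parcel weight in kg.",
        "input_schema": {
            "type": "object",
            "properties": {"weight_kg": {"type": "number"}},
            "required": ["weight_kg"],
        },
    }]

    messages = [{"role": "user", "content": "What does a 3kg parcel cost to ship?"}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model id
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            break  # the agent decided it is done orchestrating
        messages.append({"role": "assistant", "content": response.content})
        for block in response.content:
            if block.type == "tool_use":
                result = compute_shipping(**block.input)  # drop into compiled code
                messages.append({"role": "user", "content": [{
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                }]})
    print(response.content[0].text)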

Why this makes sense:

  • Agent handles routing, error recovery, edge cases naturally
  • Compiled code becomes like assembly—drop into it for hot paths, not control flow
  • Changes to orchestration = prompt changes, not redeployment
  • Self-healing: agent can route around failures, retry differently

Why we're not there yet:

  • Latency: every orchestration decision = inference call
  • Cost: tokens spent on control flow that compiled code executes for free
  • Debugging: "why did it do that?" harder to trace than reading code
  • Testing: non-deterministic orchestration breaks traditional test patterns

But these feel like temporary constraints, not fundamental ones. Inference gets faster/cheaper. Observability tooling catches up. Testing paradigms evolve.

Uncomfortable implication: Most "AI application frameworks" (LangChain, etc.) are building the old pattern—code orchestrates agents. They might be the jQuery of this era—useful scaffolding that gets replaced when the native platform catches up.

Products Closest to This Vision Today

Pure config → agent behavior:

  • Custom GPTs / GPT Builder - Closest to the vision but limited (chat-only, constrained tools)
  • Claude Projects - Same pattern, slightly more capable with artifacts

Agent + workspace:

  • This KB (MindCapsule) - Literally built this way. CLAUDE.md is the spec, Claude Code is the runtime
  • Notion AI / Coda AI - Workspace IS the data layer, AI manipulates it

Agent builds traditional software:

  • Replit Agent / Bolt / Lovable - Agent writes and deploys code, then steps back
  • v0 - Same pattern for UI

Agent orchestrates services:

  • Zapier Central - Natural language → agent calls integrations

The gap: most are either chat-constrained (GPTs) or code-generating (Replit). The full vision—agent as persistent runtime, config file as complete product spec—doesn't fully exist yet.

Infrastructure for Agent-as-Orchestrator

What does an agent-as-orchestrator actually need to run?

Core runtime:

  • Inference endpoint (the model)
  • Context/conversation management
  • Tool execution layer

State layer:

  • Persistent memory (beyond single conversation)
  • File system or database access
  • Context window management (summarization when full)

Integration layer:

  • API credentials to external services
  • Compiled code modules to call
  • MCP servers or equivalent protocol

Interface:

  • How users interact (chat? voice? agent-rendered UI?)
  • How agent presents output

Observability:

  • Decision logging ("why did it do that?")
  • Cost tracking
  • Debugging tools

Security:

  • Permission boundaries
  • Auth to external services
  • Sandboxing for code execution

Today's answer: "a computer running Claude Code" or a server with an agent loop. Not production-grade.
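
Stitched together, those layers look something like the skeleton below; every name is invented, and a production runtime would split each layer into its own service.

    from dataclasses import dataclass, field

    @dataclass
    class AgentRuntime:
        model: str                                        # core runtime: inference endpoint
        context: list = field(default_factory=list)       # conversation management
        memory_path: str = "memory.db"                    # state: persists beyond one chat
        tools: dict = field(default_factory=dict)         # integration: compiled modules, MCP
        credentials: dict = field(default_factory=dict)   # integration: external services
        decision_log: list = field(default_factory=list)  # observability: "why did it do that?"
        spend_usd: float = 0.0                            # observability: cost tracking
        allowed_tools: set = field(default_factory=set)   # security: permission boundary

        def call_tool(self, name: str, **kwargs):
            if name not in self.allowed_tools:
                raise PermissionError(f"tool {name!r} not permitted")
            self.decision_log.append((name, kwargs))  # record the decision
            return self.tools[name](**kwargs)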

What's missing for real agent-as-orchestrator infrastructure:

  • Multi-tenancy (one agent serving many users, or many instances)
  • Durability (survives restarts, handles failures)
  • Cost controls (budget limits, fallback to cheaper models; sketched below)
  • Auth model (agent identity, user delegation)
  • Deployment primitives (versioning, rollback, A/B testing prompts)
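
As one illustration of the missing cost-control primitive, a sketch of a budget-aware wrapper that falls back to a cheaper model near the limit. Model names and prices are illustrative, not real rates.

    # Hypothetical pricing in USD per million tokens; not real rates.
    PRICING = {"big-model": 15.00, "small-model": 1.00}

    class BudgetedInference:
        def __init__(self, budget_usd: float):
            self.budget_usd = budget_usd
            self.spent_usd = 0.0

        def pick_model(self) -> str:
            # Fall back to the cheap model once 80% of the budget is spent.
            if self.spent_usd >= 0.8 * self.budget_usd:
                return "small-model"
            return "big-model"

        def record(self, model: str, tokens: int) -> None:
            self.spent_usd += PRICING[model] * tokens / 1_000_000
            if self.spent_usd > self.budget_usd:
                raise RuntimeError("agent budget exhausted")

    meter = BudgetedInference(budget_usd=5.00)
    model = meter.pick_model()          # "big-model" while under 80% of budget
    meter.record(model, tokens=12_000)  # ~0.18 USD at the illustrative rate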

Opportunity: A "Vercel for agents"—deployment/runtime layer purpose-built for agent-as-orchestrator, not agent-as-feature. The infrastructure that makes .md files deployable as products.

Market Entry Strategy (Small Team)

If this is the future, how does a small, young team enter the market today?

The trap: Building "Vercel for agents" head-on. Capital-intensive, attracts big players (Anthropic/OpenAI will do this), requires critical mass.

Strategic options:

1. Vertical-first agent product → extract platform

  • Build one agent product really well in a specific domain
  • Learn what production agents actually need by running one
  • Revenue from day one, platform comes later
  • Shopify started as a snowboard store, then extracted the platform
  • Pick a domain you know where agent value is obvious

2. Developer tools wedge

  • Build observability/debugging for agents ("why did it do that?")
  • Lower capital requirements
  • Learn what production teams struggle with, expand from there
  • Datadog path: start narrow, expand to full stack
  • Braintrust is here, but early

3. Config-to-product layer

  • Skip infrastructure, build "deploy your .md as a product" UX
  • Sit on top of existing runtimes (Claude API, GPT)
  • Like Webflow on web tech—differentiate on UX, not infra
  • Faster to market, but platform dependency risk

4. Open source runtime

  • Build the open-source agent orchestrator
  • Monetize hosting/enterprise features
  • Community moat, slower to revenue
  • Acquisition target

Recommended for young team: Vertical-first.

Pick a domain, build the agent product, be your own first infrastructure customer, then extract it. You learn what's actually needed vs. what sounds good in theory.

The infrastructure insight comes from operating, not theorizing.

Connection to Domain-Specific Agents

This thought extends the "narrow tools + general fallback" pattern. The .md file is the narrow specification; the general agent provides the execution. As patterns stabilize, they can harden into skills/tools—the software equivalent of "extract method" refactoring.

Open Questions

  • At what cost/latency threshold does traditional software win?
  • What's the UX for debugging agent behavior vs debugging code?
  • How do you version control agent behavior when the model itself changes?
  • What's the liability model when "the agent decided to do X"?
