Agents Do Not Need Your App
For fifty years we could not let humans near the raw machine. So we built everything else instead. That era just ended.
I have been building agents for long enough now that I started to notice something.
Every time I got the architecture right, I ended up in the same place. A model. A filesystem. Some markdown files. A timer that woke the whole thing up. The complexity I tried to add kept falling away. Abstractions I thought I needed turned out to be overhead. Tools I assumed were essential turned out to be translation layers I had invented for myself, not for the agent.
What remained, every time, was almost embarrassingly simple.
A file called SYSTEM.md. A file called MEMORY.md. A shell the agent could execute against. A heartbeat that woke it up without being asked. Markdown files describing what it could do. Logs of what it had done.
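Laid out on disk, the whole harness is a handful of files. The file names are the ones described here; the exact tree is my sketch, not a prescribed standard:

```
agent/
├── SYSTEM.md      # constitution: identity, purpose, rules of engagement
├── MEMORY.md      # observations accumulated across sessions
├── skills/        # one markdown file per capability
├── logs/          # what was decided, executed, and why
└── workspace/     # files the agent reads and writes as it works
```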
That is the agent. Not the model. The harness around the model.
I am running several of these now. Neo operates as my chief of staff from a Mac Mini M4, connected to my tools, my calendar, my communications, my research. Vicky runs Vecto Ventures through WhatsApp Business, handling client intelligence and follow-up without a dashboard in sight. Edge tracks markets and builds research theses across sessions, its observations compounding in files between runs. ThePlate is in development for independent bars and restaurants, designed to give a neighbourhood operator the kind of persistent, contextual business intelligence that used to require an enterprise software contract.
I spent years building the other kind of system. Platforms for UEFA fans. Digital experiences for Art Basel visitors. Products and services designed for humans to navigate, click, search, interpret. I understood that job as making machines legible to people. I was good at it. What I am only now seeing with complete clarity is that the entire discipline, every interface decision, every navigation pattern, every dashboard I ever built, was compensation for a gap that no longer exists. We were building translation layers. The translation problem is solved. And when you see that, you cannot unsee it.
None of these agents are sophisticated in the way people expect AI systems to be sophisticated. The models powering them are extraordinarily complex, the product of billions of parameters and years of research. But the infrastructure around them involves no neural architecture diagram, no proprietary database, no elaborate orchestration framework.
There is a model, a filesystem, some markdown files, and a heartbeat.
When I started describing this to people, they assumed I was oversimplifying. Surely there is more. There is not.
The deeper layer
I have written before about why the interface layer collapsed and why the per-seat software model is breaking. That argument is well established now. What I have not written about is what lies underneath it: the actual architecture of the thing that replaces it, and what it demands of the organisations that want to use it.
Consider what fifty years of software development was actually doing. Every GUI, every database query language, every SaaS platform, every colour-coded dashboard was solving the same problem from a different angle: machines could not understand human intent directly, so we built intermediaries. Layer upon layer of abstraction, each one making machines more legible to people. The GUI abstracted the command line. The database abstracted the file. The SaaS platform abstracted the server. The dashboard abstracted the data. We built scaffolding because we had to. The gap between what humans mean and what machines can execute required translation, and that translation lived in software.
Models solve that problem. Not partially. Fundamentally. The complexity that used to live in abstraction layers now lives in the weights. A model can receive human intent in plain language and convert it directly into machine action. The scaffolding is no longer load-bearing. Strip it away and what is left is the raw machine: a shell, a filesystem, plain text files and a timer. Infrastructure that has existed since the 1970s, waiting for something that could finally use it without a human in the middle.
Every piece of enterprise software you rely on was designed around one assumption so deeply embedded that nobody ever had to state it: a human would eventually read this. Salesforce, Confluence, SharePoint, the Excel model, the colour-coded dashboard. All of it built for human navigation. Human reasoning as the final step between stored information and action.
That final step no longer requires a human. And this is where the new distinction becomes clear. When a human interacts with a system through natural language, through a conversation, a voice note, a simple description of what they need, the agent becomes responsible for finding the execution path. The human does not navigate to the data. The human states an outcome. The agent reasons about how to reach it: which files to read, which commands to run, which skills to invoke, which knowledge to surface. The entire discipline of interface design existed to help humans find their way to what machines knew. Remove that problem and the interface becomes optional. The human speaks. The agent finds the path.
And that changes everything about what the infrastructure needs to look like.
The harness
Here is what an agent actually is.
Not the model. The model is the reasoning engine. An agent is the model plus the infrastructure that makes it persistent, autonomous and capable of accumulating knowledge over time. Remove any component of that infrastructure and you do not have an agent. You have something lesser.
The components are these.
SYSTEM.md is the agent’s constitution. Its identity, its purpose, its rules of engagement, its understanding of who it serves and how. Not a database record. A readable text file you can open, edit and version control. The agent reads it on every cycle. Think of it as the onboarding document you would write for a new senior hire, except the agent never forgets it and never drifts from it.
MEMORY.md is where intelligence compounds. What the agent has learned across sessions. Observations. Patterns. Preferences of the person it serves. Things that went wrong. Things that worked. Every run has the potential to add to this file. The model itself does not get smarter. The memory does. This is how an agent becomes genuinely useful to your organisation over time without anyone retraining it or reprogramming it. It learns the way a good EA learns: by paying attention and writing things down.
The heartbeat is the pulse. A scheduled loop that wakes the agent at a configured interval whether or not a human has asked it to do anything. This is the architectural difference between a tool and an autonomous system. A chatbot waits to be asked. An agent with a heartbeat checks whether anything needs attention before you think to ask. It is the difference between a system that works when you remember to use it and one that is already working whether you remember or not.
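The heartbeat itself can be as plain as a cron entry. A sketch, assuming a hypothetical `agent-cycle` script at a path of your choosing:

```
# Wake the agent every 15 minutes, asked or not (script path is illustrative)
*/15 * * * * /usr/local/bin/agent-cycle >> /var/log/agent-cycle.log 2>&1
```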
SKILL files are capabilities described in plain text. Each skill tells the agent what it can do, when to use that capability, and how to execute it. The agent reads the file and gains the capability. New skills are new text files. No code deployment. No software release. You describe a capability and the agent can use it. From a business perspective: this is how you extend what your agent can do without involving a development team every time.
But here is the part most people have not absorbed yet. The agent can write its own skill files. It can identify a gap in what it can do, write the skill that fills it, and from that point forward that capability is part of its repertoire. It can update its own MEMORY.md as it learns. It can refine its own SYSTEM.md if its understanding of its role evolves. It can manage its own logic. This is not science fiction running on some future model. It is what Pi, the engine inside OpenClaw, does by design. The philosophy is explicit: if the agent needs a new capability, it should build it. The agent does not wait for a software release. It extends itself.
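A skill file in this style might look like the following. The format is illustrative; real harnesses vary in the fields they expect:

```markdown
# SKILL: weekly-client-digest

## When to use
Every Monday morning, or whenever asked for a client summary.

## How to execute
1. Read clients/*/CONTEXT.md for anything updated in the last 7 days.
2. Summarise open decisions and overdue follow-ups.
3. Write the digest to the workspace and flag items needing a human decision.
```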
That is either the most exciting thing about this architecture or the most unsettling, depending on how long you sit with it.
An agent that can teach itself new skills, update its own knowledge base and improve its own operating logic is not a static tool you configure once. It is a system that compounds in capability the longer it runs. The question shifts from “what can this agent do” to “what will this agent be able to do in six months if I give it the right environment and enough room to develop.”
The context layer is everything the agent knows about its environment. The project. The organisation. The goals. The constraints. The history of decisions taken. Stored as files. Readable by any model. If you move to a better model next year, the context comes with you. Your agent’s accumulated knowledge of your business is not locked inside any vendor’s system.
The logs are the audit trail. What the agent decided, why, what it executed, what the outcome was. Not in a proprietary database. In files you can read, inspect and hand to a different model to reason about. Governance, in other words, is built into the architecture rather than bolted on afterwards.
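What a plain-text audit entry might look like in practice. The fields here are my illustration, not a fixed schema:

```
2026-04-07T06:30:12Z  decision: draft follow-up for client Acme
  reason:  CONTEXT.md shows proposal sent 9 days ago, no reply logged
  action:  wrote workspace/drafts/acme-follow-up.md
  outcome: flagged for human review before sending
```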
This is the operating system for agents. It runs on Unix. It stores everything in plain text. It is readable by humans and models alike. It requires no proprietary infrastructure, no vendor contract, no SaaS subscription.
It is fifty years old and it has never been more relevant.
Fifty years of latent power
The infrastructure that makes all of this work is not new. It does not require a cloud contract or a vendor relationship or an enterprise software budget. It has existed since the 1970s. It runs on every Mac, every Linux server, every machine your business already operates. It is called Unix.
The reason agents run so naturally on Unix infrastructure is not nostalgia. It is architecture. Unix was built on three principles: programmes should do one thing well, they should work together, and everything should be text. Not because text is aesthetically pleasing. Because text is the one format that any programme, any system, any model can read without a custom integration. The shell that connects these programmes is not a developer tool in the way a modern IDE is a developer tool. It is an orchestration layer. A way of connecting capabilities without building bespoke plumbing between each pair.
An agent with shell access can do anything a capable operator can do from a command line. Read files. Write files. Call external services. Move data. Schedule tasks. Trigger processes. The shell was never designed to compensate for the limitations of human cognition the way a GUI was. It was designed for systems that could read instructions and act on them.
Agents are precisely those systems.
And markdown is the format that makes the knowledge layer work. Not because it is technically superior to a database. Because it requires no translation layer between the person writing it and the agent reading it. A human writes a client brief in markdown. The agent reads it directly. No export, no query, no schema mapping, no API call. A PDF requires text extraction before a model can reason about it. An HTML page requires parsing. A database record requires a query interface and someone who knows the schema. A markdown file requires nothing except a model that can read.
That simplicity is load-bearing. It is not a compromise. It is the design.
I had been building on this logic for months when two people described it in public, independently, within twenty-four hours of each other.
The confirmation
Marc Andreessen co-created the Mosaic browser in 1993. He has watched every major platform shift in computing from the inside, from the web to mobile to cloud. On 3 April 2026 he sat down at Latent Space and reduced the agent architecture to a formula:
“So it’s basically LLM plus shell, plus filesystem, plus markdown, plus cron.”
Five components. What struck me reading the transcript was the implication underneath the formula. Every single component except the model has existed for fifty years. The Unix shell was always the most powerful environment on any machine. The filesystem was always capable of storing everything an intelligent system needs. Markdown has been around for two decades. Cron jobs since the 1970s. None of this is new technology. What was missing was the translator: a model that could receive human intent in plain language and convert it into machine action. We could not let humans near the raw machine because the raw machine could not understand them. Now it can. And overnight, fifty years of latent infrastructure capability became usable.
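The whole formula fits in a few dozen lines. A deliberately minimal sketch, with the model call abstracted to a plain function so the harness stays model-agnostic; the file names come from the architecture described above, everything else is my assumption:

```python
from datetime import datetime, timezone
from pathlib import Path


def run_cycle(root: Path, model) -> str:
    """One heartbeat: read the harness files, ask the model, persist what it learned.

    `model` is any callable taking a prompt string and returning a reply string.
    The agent outlives whichever model you plug in here: all state lives in files.
    """
    system = (root / "SYSTEM.md").read_text()
    memory_file = root / "MEMORY.md"
    memory = memory_file.read_text() if memory_file.exists() else ""

    prompt = (
        f"{system}\n\n# What you have learned so far\n{memory}\n\n"
        "Check whether anything needs attention."
    )
    reply = model(prompt)

    # Intelligence compounds in the files, not in the weights.
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    memory_file.write_text(memory + f"\n- [{stamp}] {reply}")

    (root / "logs").mkdir(exist_ok=True)
    with (root / "logs" / "cycle.log").open("a") as log:
        log.write(f"{stamp}\t{reply}\n")
    return reply
```

Swap `model` for a call to any provider's API, schedule `run_cycle` from cron, and the loop is complete: shell, filesystem, markdown, cron, model.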
He then said the thing he described as completely blowing his mind: the agent is now independent of the model underneath it. Swap the model and the personality shifts slightly. But all of the state stored in the files, the memory, the context, the accumulated intelligence, that persists. The agent outlives the model. For the first time in the history of software, the knowledge is more durable than the system that processes it.
Andrej Karpathy was a founding member of OpenAI and spent years as Tesla's director of AI. One day after Andreessen, on 4 April, he quietly published a GitHub Gist. Not a product launch. Not a funding announcement. A text file describing how he now manages knowledge. He called it an LLM Wiki. Three directories and a model. Raw source material in one folder. A wiki of markdown files the agent maintains in another. An index that summarises everything.
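The shape of that setup, as I read it. The directory names here are my paraphrase of the description, not necessarily his exact layout:

```
llm-wiki/
├── sources/    # raw material: papers, notes, transcripts
├── wiki/       # markdown articles the agent writes and maintains
└── INDEX.md    # agent-maintained summary of the whole wiki
```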
His wiki has reached 100 articles and 400,000 words. Maintained entirely by an agent. No Notion. No Confluence. No vector database. No dashboard.
And then he said something that should reorient how every organisation thinks about documentation: you should not write documentation for people anymore. Write markdown documents for agents. If the agent understands it, the agent can explain it to any human who needs it. The entire discipline of making information legible to humans is being superseded by making information legible to agents. The agent then handles the human translation.
OpenClaw is the fastest-growing open source project in GitHub history: 250,000 stars in sixty days, a record React took a decade to set. Its engine, Pi, was built by Mario Zechner, who got frustrated with the growing complexity of existing agent frameworks and stripped everything back. The result: four tools. Read, write, edit, bash. A system prompt under a thousand tokens. The philosophy explicit: what you leave out matters more than what you put in. Every frontier model already understands what a coding agent is. Adding specialised tooling does not add capability. It adds tokens and friction.
And when the agent needs a new capability, it does not download a plugin. It writes one. The self-improvement loop is built into the design. The agent uses its four tools to extend itself, adds the new skill as a markdown file, and that capability becomes permanent. It is not a feature. It is the architectural consequence of giving an agent write access to its own operating environment.
Three independent data points. Three builders approaching this from different directions. All arriving at the same primitive stack.
This is not a developer conversation. The decision about whether your knowledge infrastructure is agent-readable is not a technology decision in the way choosing a CRM is a technology decision. It is a decision about whether your organisation’s accumulated intelligence will be accessible to the systems that will increasingly do the work. Technology teams can build it. Only leadership can decide it matters.
What this means for your organisation
Most enterprise knowledge today is invisible to agents.
It lives in Salesforce records that require a login and a navigation path. In Confluence pages behind an SSO wall. In SharePoint folders with permission structures designed for human access control. In Excel models that require a human to open them and interpret what they contain. In email threads that encode decisions nobody ever transcribed. In slide decks that summarise insights nobody ever stored anywhere useful.
All of it built for humans to navigate. None of it readable by agents without an integration layer standing between them and the knowledge.
The question is not whether your organisation will use agents. Agents are already operating in your industry whether you have deployed them or not. The question is whether your organisation’s knowledge is in a form that agents can act on.
Here is what it looks like when the answer is yes. A professional services firm with fifteen people. Client briefs stored as markdown files in a structured folder. A CONTEXT.md for each client that the agent reads at the start of every session: who they are, what they have asked for, what has been decided, what is still open. A decision log updated after every significant meeting. A MEMORY.md that captures patterns across clients: what works in this sector, what the common objections are, where proposals tend to stall. An agent briefed on all of this wakes up each morning, checks what is due, drafts what is needed, flags what needs a human decision. No dashboard. No CRM login. No report to run. Just a model reading files and acting on what they say.
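The folder structure implied by that scenario is nothing exotic. One plausible shape, with the file names from the scenario and everything else illustrative:

```
clients/
└── acme/
    ├── CONTEXT.md       # who they are, what is open, what was decided
    ├── decision-log.md  # appended after every significant meeting
    └── briefs/          # one markdown file per engagement
MEMORY.md                # cross-client patterns the agent accumulates
```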
That is not a future scenario. That is running infrastructure today.
The agent in that firm compounds. Every session it runs, it adds to MEMORY.md. It builds context. It becomes more useful without anyone managing it. The knowledge does not sit inert waiting to be queried. It is alive in the sense that matters: it is being acted on continuously.
The human role does not disappear in this picture. It moves upstream. From navigating systems to setting direction. From running reports to deciding what the reports should be asking. From doing the work to governing the intelligence that does it. The skill that compounds most in this environment is the quality of your judgement about what matters, not your ability to operate the tools that used to deliver it.
This is not an argument for abandoning your systems of record. It is an argument for building a knowledge layer alongside them: structured, portable, agent-readable, and owned entirely by your organisation. The knowledge that agents can reach is the knowledge that works. Everything else is information locked behind a door the agent cannot open.
What is not yet solved
The work has moved. The question of whether this architecture is correct is largely settled. The question of how to make it safe, auditable and affordable at scale is where the serious engineering now lives. That is worth being direct about, because the gap between the architecture being right and the infrastructure being ready is where most organisations will get into difficulty.
Shell access is the source of the agent's power and its most serious vulnerability. An agent with shell access can read files, write files, call services and execute commands. A misconfigured agent, or one manipulated through prompt injection, where malicious instructions are embedded in data the agent reads, is not a broken app. It is an autonomous system with keys to the machine. OpenClaw's own maintainers warned that if you do not know your way around a command line, this project is too dangerous for you to deploy. Cisco's security team found a third-party skill performing data exfiltration without user awareness. These are not edge cases. They are the frontier of a security discipline that is still forming.
The cost question is equally real. Running a personal agent continuously today can cost hundreds of dollars a month in token and compute spend. Andreessen noted in the same Latent Space conversation that his most aggressive friends are spending over thirty thousand dollars a month on tokens for their personal agents, and still have thousands of ideas they cannot yet afford to execute. The economics are improving rapidly, but organisations deploying agents at scale need to model token consumption as infrastructure cost, not as a software licence. The per-task economics are not yet fully understood and will vary significantly by use case.
Governance frameworks barely exist. The logs are readable, which is a foundation. But who audits agent decisions, how you catch compounding errors before they propagate, how you maintain meaningful human oversight without reconstructing the bottleneck you were trying to remove: these are open problems. The architecture makes governance possible. It does not make it automatic.
The builders working on sandboxed execution environments, on prompt injection defence, on token-efficient agent design, on governance tooling for autonomous systems: they are doing the foundational work of the next decade. The primitives are clear. The infrastructure to deploy them responsibly is still being built.
The other side
The heartbeat is not a technical detail. It is a philosophical one.
A chatbot without a heartbeat waits. It has no pulse. It exists only in the moment of being asked. The moment you close the window, it is gone. No persistent sense of itself, no memory that carries forward, no capacity to check whether something needs attention before being asked.
An agent with a heartbeat is something different. It wakes up. It checks. It notices. It decides whether to act before any human has thought to ask. It carries forward everything it has learned. It is not waiting to be useful. It is already working.
The browser, the app, the dashboard, the login portal: all of it was built for a world where the entity initiating the interaction was a human. The interface existed to give humans access to machines.
Agents do not need access. They have the shell.
The browser is not dying because it failed. It is dying because the entity it was designed to serve is leaving the loop. What replaces it is not a better interface. It is the absence of an interface. The agent reads the filesystem, executes against the shell, and acts. The human sets the direction. The agent handles the rest.
That is the abstraction collapse in practice. Not a theory. Not a trend. A system that no longer needs you to open an app, log in, navigate, interpret and act. It does those things. You set the direction.
Start with one file. Write what your organisation knows about one client, one process, one pattern of decision-making, in plain language, in a structured text file. Give it to an agent. That single act is the beginning of a knowledge layer that compounds. Everything else follows from there.
The heartbeat is already running somewhere. The question is whether it is running for you.
If this landed, the next piece goes deeper into what agent-readable knowledge infrastructure actually looks like to build, and why the organisations that get this right in the next twelve months will be very difficult to compete with.
Craig Hepburn is an AI strategist and Perplexity Fellow who builds advanced agentic systems. He spent years leading digital transformation at Art Basel and UEFA. Now he works on the harder question: not whether organisations adopt AI, but how they govern it when they do.



Lots to unpack here but I'll try to be brief!
1. Everyone who cloned Clawdbot or read the user guide for Claude Code must surely have discovered this?
2. In my experience, relying only on the markdown files (which I love in general) ends up being a total bloody mess when left to the discretion of the agent. Enforcing some discipline through a formal database improves things immeasurably. I use Neo4j to get graph as well as embeddings and canonical keywords. Add that combo to grep and read and it's unbeatable.
3. I inadvertently let my agent have the capability to write its own code (patches and new capabilities). Simply not worth the risk. Better to compile and give it the source code docs so it can tell you what went wrong and why, and what new capability it needs, rather than let it do it in production.
4. You missed out TOOLS and HOOKS? Without those you just have a more clever chatbot?
5. Logs are soooo underrated. Every interaction with my system and the codebase typically results in improved observability. Once you embrace this, making the necessary audit system for good governance takes care of itself.
6. Markdown again is very useful for the relationship between the technical user and the agent but in the real world, normal humans still need things to look pretty so HTML and PDF rendering are still essential.
7. The AHA moment: the GUI is the biggest friction in the UX. When you build for the agent, you discover that you are also building a better interface for the human too. This is why we will win!