The Stack Around the Agent

Why agents work in one company and produce nonsense in another.

May 18, 2026

Ken Griffin sat on a stage at Stanford last week and said two things that do not seem to fit together.

The first thing he said is the part everyone has quoted. He had spoken to Niall Ferguson the day before, who had told him that in the age of AI, humans were the horses. Griffin called that a depressing way to start the day. Then he described what he had been watching inside Citadel. Work that the firm normally assigned to teams of people with master’s degrees and PhDs in finance, work that took weeks or months, was being done by AI agents in hours or days. He told the room he had gone home one Friday earlier in the year fairly depressed by it.

The clip has been everywhere this week. Musk shared it. Fortune wrote it up. Every founder newsletter has framed it the same way. AI is now coming for the highest paid knowledge work in the world. Even Citadel’s PhDs are not safe.

Then Griffin said something else.

He said the competitive moats that large companies have depended upon are going to be filled in with AI tools. The ability for small companies to take on incumbents would be higher than ever. He called it a fantasy land for entrepreneurs. He mentioned a pet insurance business as an example of what was now possible, a small operator using social media and AI image recognition to reach customers in ways that would have required enterprise teams a decade ago.

Both passages are in the same conversation. One describes elite knowledge workers being replaced by software inside the most sophisticated hedge fund in the world. The other describes small operators beating incumbents with consumer tools. The commentary this week has picked one and ignored the other.

The interesting question is how both can be true.

The reframe

The standard reading of the PhD story is that AI has crossed a capability threshold. That reading is wrong, or at least incomplete. The agents at Citadel did not become useful because the models got smarter. They became useful because of where they were running.

Citadel has been building its commodities business for 23 years. Proprietary execution infrastructure. Proprietary capital. Regulatory licences across multiple jurisdictions. Decades of trade data tagged to market conditions only the firm holds. A bench of senior researchers who have spent their careers learning when a number on a screen is wrong. Drop a frontier model into that environment and you get the productivity step change Griffin described. Drop the same model into a hedge fund without any of it and you get plausible nonsense delivered faster.

The agents did not cross a threshold. The system around them did the work. Griffin was not watching software get smarter. He was watching decades of accumulated infrastructure suddenly become usable in a new way, because there was finally a layer that could read the data, write the analysis, and run the workflow.

This is the part the news cycle has missed. The PhD story is not a story about AI. It is a story about what AI does to a firm that already owns the substrate underneath. And it is the gateway to a more interesting structural point about where value lives in the intelligence era.

Two layers, two games

There are two layers in any business that uses AI, and they are playing different games. Most of the confusion in the current commentary comes from treating them as the same.

The first layer is marketing and distribution. This is where AI is collapsing the cost of customer acquisition. A small operator with a model, an advertising account, and a good understanding of their audience can now reach the same customers that used to require enterprise budgets. The pet insurance example Griffin mentioned is one version of this game. Most of the direct to consumer brands that built on Shopify and Meta ads in the 2010s played an earlier version of it. Most of the AI native sales and marketing tools being launched this year, which sit on top of Salesforce, HubSpot, and the major email platforms, are playing another. The pattern is consistent. Find an audience that incumbents are not reaching efficiently. Build a brand and an acquisition engine using consumer technology and AI. Scale until an incumbent that owns the underlying infrastructure decides to acquire you, or partner with you in a way that absorbs your channel.

This game is real. It produces real outcomes. Some of the most interesting exits of the next five years will come from it.

There are exceptions worth naming. A small number of marketing layer firms build brand depth so distinctive that the brand itself becomes a kind of stack. Liquid Death in beverages is the cleanest recent example. The product is water in a can, indistinguishable from any other, but the brand has accumulated audience and meaning that competitors cannot replicate by spending more on advertising. These firms are rare, and they are the exceptions that prove the rule. Most marketing layer companies do not build that kind of depth. They build a channel, scale it, and exit.

The second layer is production and verification. This is where the work actually happens. The data that feeds the model. The model that has been adapted to the data. The infrastructure that runs the agent. The human bench that catches errors before they become losses. This is the layer where AI is making firms that already own the substrate more powerful, not less. It is where Citadel lives. It is also where Bloomberg, the major investment banks, the large pharmaceutical companies, the credit card networks, and the search and ads businesses inside Google and Meta all live. The list is not long, and the firms on it are mostly not the ones you read about in the AI press.

The two layers look identical from the outside. Both have chat interfaces. Both produce plausible output. The difference shows up in two situations: when the task is hard enough that convincing nonsense fails, and when the underlying model provider changes its terms. In the second case, the marketing layer firm feels it immediately. The production layer firm barely does, because the model is one component inside a system it controls.

The four components

The marketing layer can be described in a single move. The production layer needs more precision. Four components, all owned by the same firm and shaped to each other over time.

The first is proprietary data the rest of the world does not have. Not data scraped from the public web. Data generated by the firm’s own operations, in formats only it can read, with context only it understands. Flexport is a clean example outside finance. The freight forwarder has been collecting supply chain movement data tagged to customs entries, container utilisation, port delays, and SKU level visibility since 2013. The firm has positioned this combination of domain expertise, proprietary data, and built in distribution as the core of its competitive position. Other AI logistics startups can train on public freight data. None of them can train on what Flexport has been collecting for over a decade. The dataset is the product.

The second is models that have been trained or adapted on that data. This does not mean training a frontier model from scratch. It means fine tuning, post training, retrieval systems, evaluation harnesses, and inference infrastructure all built around the proprietary data. Stripe has spent over a decade building Radar, its fraud detection system, on payments data from millions of businesses processing more than $1.9 trillion in annual payment volume. The system is not a model running on top of Stripe. It is a model woven into the payment flow itself, retrained continuously on dispute outcomes that come back automatically as labelled data. Stripe has reported significant detection improvements from its more recent foundation model work, particularly on novel attack patterns. The model is valuable because of the data it sits on. You cannot extract one from the other.

The third is deployment infrastructure the firm controls. The agent is not a chatbot in a browser. It is a process inside a system, with its own latency requirements, its own security boundary, its own audit trail, connected to systems the firm has built. Stripe Radar runs inside the payment flow itself, scoring transactions in under 100 milliseconds. A third party fraud tool sitting on top of Stripe could not match that latency. The deployment is the moat as much as the model.

The fourth is a human verification bench. Senior people who have spent fifteen or twenty years inside the firm and can look at an agent’s output and know, in seconds, whether it is correct or whether it is convincing nonsense. In Citadel’s case, the bench is likely the senior PhDs whose role is changing as the production work shifts. In a pharmaceutical company running AI assisted drug discovery, it is the senior chemists who have spent careers watching molecules fail at phase two. In a law firm, it is the partners who can spot when a contract draft has invented a precedent. Verifiers are not interchangeable with consultants. They are institutional memory translated into judgement, and they take years to develop.

The list of firms that have all four is short. Most of what is called an AI company has one or two. The integrated position is rarer than the commentary admits, and that scarcity is precisely why it is durable.

A reader paying close attention will ask where the AI labs themselves sit in this picture. OpenAI, Anthropic, Google. They do not own proprietary customer operations data in the way Citadel or Stripe does. They sit at a different position altogether, closer to the substrate layer that produces capacity for everyone else. The model providers are powerful, but their power comes from a different source than the integrated stack firms, and the dynamics are different. That distinction is worth holding lightly, because the question of how the substrate, integrated, and application layers interact is what determines who captures value as the field matures.

What an agent is, inside a stack

The piece has used the word agent throughout without saying what one is. That is worth pausing on, because the conventional definition is doing damage to how most firms think about this.

An agent in the wrapper sense, meaning a thin product built on top of someone else’s model, is a piece of software that takes a prompt, decides what to do, and produces an output. It is a thing you can buy, install, point at a problem, and run. The agent is the same whether it sits inside Citadel, inside a startup, or inside a procurement department in a company that does not yet know what to do with it. The agent is the product.

An agent inside an integrated stack is something different. It is a process with authority to act inside systems the firm controls. It reads data nobody else can read. It runs models adapted to that data. It executes inside infrastructure the firm owns. It operates under verification by people the firm has spent years training. The agent code may be similar to what runs in a wrapper. The agent is not the same thing.

This is the move most current discussion of agents misses. The agent is not constituted by its model weights. It is constituted by its surroundings. Lift the same agent code out of an integrated stack and run it inside a procurement department, and you do not get a smaller version of the original agent. You get a different kind of thing entirely. Less useful. Less safe. More likely to produce convincing nonsense because the corrections the original agent depended on are no longer there.
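The point can be made concrete with a toy sketch. Everything in it is hypothetical: run_agent, toy_generate, the context lookup, and the verifier are illustrative stand-ins, not a description of how Citadel or any other firm builds its systems. The agent code is identical in both calls. Only the surroundings change.

```python
from typing import Callable, Optional


def run_agent(
    task: str,
    generate: Callable[[str, str], str],
    fetch_context: Optional[Callable[[str], str]] = None,
    verify: Optional[Callable[[str], bool]] = None,
) -> dict:
    """Run the same agent logic with whatever surroundings are supplied.

    fetch_context stands in for proprietary data; verify stands in for
    the human verification bench. Both are optional on purpose.
    """
    context = fetch_context(task) if fetch_context else ""
    draft = generate(task, context)
    if verify is None:
        # Nothing around the agent can check the draft, so it ships as-is.
        return {"output": draft, "status": "unverified"}
    if verify(draft):
        return {"output": draft, "status": "verified"}
    # The bench caught a problem before the output left the system.
    return {"output": None, "status": "rejected"}


def toy_generate(task: str, context: str) -> str:
    """Stand-in for a model call: uses context if present, otherwise guesses."""
    return f"{task}: {context}" if context else f"{task}: plausible guess"


# Inside an integrated stack: data and verification are both present.
in_stack = run_agent(
    "Q3 exposure",
    toy_generate,
    fetch_context=lambda task: "figure from internal ledger",
    verify=lambda draft: "internal ledger" in draft,
)

# Lifted out of the stack: same code, no data, no bench.
standalone = run_agent("Q3 exposure", toy_generate)

print(in_stack["status"], "|", standalone["status"])
```

The second call does not fail. It returns a confident answer with nothing behind it, which is the sense in which the lifted agent is a different kind of thing.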

This reframes what the humans in the stack are. They are not competitors to the agent. They are part of its architecture. The senior researchers who tune the model are the agent’s training. The data engineers are the agent’s senses. The verifier bench is the agent’s ground truth. The deployment team is the agent’s body. The compliance and legal teams are the agent’s permissions system. Remove any one of these layers and the agent becomes something less. Not less productive. Less coherent.

This is why the discourse about agents replacing humans gets the picture upside down. The agent that works at Citadel is not replacing the PhDs. The agent is partly made of them, in a structural sense, because what the PhDs know is what allows the agent to be trusted at all. Strip the humans out and you do not have a more efficient firm. You have a less useful agent.

The implication for anyone building in this space is direct. The work is not the deployment of agents. The work is the architecture of the firm around them, so that the agents you deploy become the kind of agents that actually function. That is what an integrated stack is. It is the answer to why agents work in one company and produce nonsense in another.

How integrated stacks actually get built

Most firms trying to build into the production layer get the sequence wrong. They start with the model. They hire AI engineers, fine tune something on whatever data they can find, build an inference pipeline, and then look around for problems the system can solve. This is the order that the AI labs themselves follow, and it works for them because their product is the model. For everyone else, it leads to systems that look impressive in demos and fall apart in production.

The right sequence is the opposite. Data first. Models adapted to the data come second. Deployment infrastructure third. The verification bench is fourth, but it has to be cultivated from the beginning, because it cannot be hired in once the rest of the stack exists.

This is why new ventures and small to medium sized companies have a structural advantage over large incumbents on the production layer, despite all the commentary about incumbent moats. A founder building a venture from scratch can design the data collection from day one, can choose what to instrument, can hire the first senior judgement holders before there are any agents to verify, and can shape the deployment to the verification process rather than the other way around. A large incumbent has the opposite problem. It has accumulated data over decades in formats nobody designed to be machine readable, it has thousands of people whose roles assume the old way of working, and it has compliance and procurement processes that make it difficult to instrument anything quickly.

The firms that get this right tend to share a posture. They treat data collection as a strategic priority rather than an operations cost. They invest in verification before they need it. They are honest with themselves about which components they own and which they are renting. They accept that the production layer game is slower and harder to fund, and they refuse to confuse activity at the marketing layer with progress at the production layer. Most importantly, they have leadership that understands the company will need to be redesigned from the inside, not retrofitted from the outside.

This last point is where culture matters as much as architecture. An integrated stack is more than a technology stack. It is a way of organising a company around the assumption that judgement is the scarce resource, not effort.

What this means for you

There is a simple test you can run on whatever AI product you are building or buying. Ask whether it would still work, at roughly the same quality, if OpenAI doubled its API price tomorrow. If Anthropic restricted your usage. If Google shipped a feature that did the same thing for free inside an existing product. If your answer is that the product would survive any of those moves with minor adjustments, you are closer to the production layer. If your answer is that you would have to lay people off, raise prices, or shut a product line, you are at the marketing layer.
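The price half of the test is simple arithmetic, and it can be sketched in a few lines. The function names, the three yes or no questions as inputs, and every number below are assumptions made for illustration. They are not benchmarks, and a real version would need your own cost lines.

```python
def margin_after_price_shock(
    revenue: float,
    model_api_cost: float,
    other_costs: float,
    price_multiplier: float = 2.0,
) -> float:
    """Margin as a fraction of revenue if the model API bill is multiplied."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    shocked_costs = model_api_cost * price_multiplier + other_costs
    return (revenue - shocked_costs) / revenue


def closer_to(
    survives_price_doubling: bool,
    survives_usage_restriction: bool,
    survives_free_competitor: bool,
) -> str:
    """Apply the rule from the text: surviving every move points to production."""
    survives_all = all(
        [survives_price_doubling, survives_usage_restriction, survives_free_competitor]
    )
    return "production layer" if survives_all else "marketing layer"


# Hypothetical product A: the model bill is 30% of revenue.
margin_a = margin_after_price_shock(revenue=100.0, model_api_cost=30.0, other_costs=50.0)

# Hypothetical product B: the model bill is 5% of revenue.
margin_b = margin_after_price_shock(revenue=100.0, model_api_cost=5.0, other_costs=50.0)

print(f"A: {margin_a:.0%}  B: {margin_b:.0%}")
print(closer_to(margin_b > 0, True, True))
```

Product A goes from a 20 percent margin to a loss when the price doubles. Product B keeps most of its margin. The other two questions in the test have no formula, which is why they enter here as plain yes or no answers.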

Most firms are at the marketing layer. That is not a failing. It is a category, and the strategies that win each layer are different.

If you are at the marketing layer, your work is to build the channel, the brand, and the audience before someone larger copies you. You are not building a forever company unless you are one of the rare brand led exceptions. You are building a wedge into a larger stack, and the exit is the strategy. Move quickly. Build the audience. Find the integrated firm that will pay for your channel because it owns what sits underneath.

If you are at the production layer, your work is the four components. Walk through them honestly. The data nobody else has. The model adapted to that data. The deployment surface you control. The bench that can tell when the output is wrong. If you can find all four, the work ahead is to keep improving them. If you can find three, you have time to build the fourth. If you can find two or fewer, you have a choice to make about which layer you are actually playing on.
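The walk through can be written down as a checklist. What follows is a minimal sketch, assuming each component can be answered with an honest yes or no. The class name, field names, and verdict strings are invented for illustration, and the hard part, answering honestly, is not something code can do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackAudit:
    """Hypothetical checklist: one flag per component named in the text."""

    proprietary_data: bool       # the data nobody else has
    adapted_model: bool          # the model adapted to that data
    controlled_deployment: bool  # the deployment surface you control
    verification_bench: bool     # the bench that can tell when output is wrong

    def owned(self) -> int:
        """Count how many of the four components the firm actually owns."""
        return sum(vars(self).values())

    def missing(self) -> list:
        """Names of the components still to be built, in field order."""
        return [name for name, present in vars(self).items() if not present]

    def verdict(self) -> str:
        """Map the count to the three cases described in the text."""
        count = self.owned()
        if count == 4:
            return "keep improving all four"
        if count == 3:
            return "build the missing component"
        return "decide which layer you are playing on"


# Example: a firm with data, model, and deployment but no bench yet.
audit = StackAudit(
    proprietary_data=True,
    adapted_model=True,
    controlled_deployment=True,
    verification_bench=False,
)
print(audit.owned(), audit.missing(), audit.verdict())
```

A firm in this position has time, but the missing piece is the one that takes longest to grow.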

The trap is being unsure. A marketing layer firm that thinks it is a production layer firm overinvests in technology it cannot sustain and underinvests in the channel work that is its actual moat. A production layer firm that thinks it is a marketing layer firm tries to move at startup speed and loses the verification discipline that made its agents useful in the first place. Both failures are common right now. Both are correctable, but only by leaders honest enough to name which game they are in.

The horses

Griffin’s two passages are not in tension. They describe the same shift from different sides. AI is changing what each layer is worth. The marketing layer is becoming faster, cheaper, and more accessible to small operators than it has ever been. The production layer is becoming more concentrated, more valuable, and more dependent on a combination of components that take years to assemble.

The agent on the screen looks the same regardless of what sits behind it. An outside observer cannot see the four components, and neither can the agent’s user. The difference shows up later, when something hard arrives and the system either holds or does not. That is what the audit of the four components is for. Not to predict the outcome, but to know which system you are inside before the test arrives.

Walk through your firm. Name what you own. Name what you rent. Decide which layer you want to be at in three years. Then build the company that choice requires.

A last thought on Ferguson’s image of the horses, since it is where Griffin began. The framing is honest about something real. Things are changing faster than most institutions are prepared for, and the language of replacement is not wholly invented. But the framing also assumes humans and agents are in the same race, running the same track, toward the same finish line. The work in this piece has been to suggest that picture is too simple. Agents do not arrive into a firm as competitors. They arrive as processes that need a stack to be useful, and the stack is partly made of the people inside it. The senior researcher whose judgement trains the model, the engineer who builds the data pipeline, the partner who can spot the invented precedent, the chemist who has watched molecules fail at phase two. These are not the horses. They are part of what the agent has become.

The question ahead is not whether to be on the track or off it. The work ahead is the architecture you build around the agents that will run inside your firm. That is not something the agent can decide for you. It is the work that remains human, and it is the work that matters most. The firms that take it seriously will have agents that can be trusted. The firms that do not will have agents that produce convincing nonsense, faster.

The audit is where that work begins.

If this resonated, subscribe. The next piece looks at the application layer, where most of the value of AI will be created, and where the question of which firms can build their own stack and which will forever rent one becomes the question that shapes the next decade.

Craig Hepburn is an AI strategist and Perplexity Fellow. Twenty years building at the frontier of digital, from Microsoft and Nokia to Art Basel and UEFA. Now building at the frontier of agentic intelligence.

Ground Truth
