AI & Strategy · May 2026 · 7 min read · External essay

Why Coase Needs Hayek

Rohit Krishnan tested three ways to organize a group of AI models. The familiar org chart, a manager splitting work among workers, lost to a simple market where models just compete.

Listen to this essay

0:00--:--

It turns out that when you give a very smart, cutting edge frontier model the job of managing other models, it costs four times as much and does worse than just letting them compete. If you have access to many models, there are three ways to get work done. You can make the smartest model a hub that routes questions to other models as it sees fit. You can have the smartest model do everything itself. Or you can run a free for all, where every model bids for the chance to take a crack at the task. A market.

To understand which one works, you do what any good scientist does. You run an experiment. The Coasean argument says that as transaction costs fall, firms unbundle into smaller pieces that then have to coordinate with each other. How will they do that? You can have some planning, or you can have markets.

So in the experiment, the hub did the thing everyone says agents are good at: split the task, delegate, red-team, revise. It spent four times as much as the market and did worse. The market, meanwhile, did the thing everyone says current agents cannot do, bid on their own competence, and it still won on cost and tied the solo model on quality.

Why did the expensive frontier planner lose to a simple market whose bidders do not even know what they are actually good at? What are the right ways to organize a group of models to get good work done?

Three ways to work

Normally there are three. You do things yourself, you delegate to others, or you let everyone pick what they want. Each one puts a different burden on the system.

Solo. The hard part is coherence. There is no benefit of diversity, but every problem gets solved through one continuous state.
Hub and spoke. The burden is decomposition. How well can you split the task, know that another model can solve each piece, and recombine the answers afterward.
Market. The hard part is allocation and retry. Do models know how much to bid, and how well? Can they?

For the experiment, Krishnan used 15 hand-written tasks, five coding, five reasoning, five synthesis, to cover the main things we want frontier model systems to do. He ran a strong model alone as the base case. Then a hub that split work into subtasks, sent those to three workers, got answers back, red-teamed, and revised. Then a market that let three models bid for each task, picked a winner, judged the answer, and updated reputation across the run.

The market averaged 7.2 out of 10 at a total cost of $1.34. Solo averaged 7.2 and cost $1.69. Hub and spoke averaged 6.7 and cost $5.33. Markets beat hierarchy here.

It depends on the task

Look at the subsets and the picture sharpens. In coding, the solo model won most tasks. These problems rewarded one continuous line of thought. A model had to hold the whole class, the edge cases, the invariants, and the exact behavior in one place. Hub and spoke helped most on the one task that naturally broke apart, a refactor where one worker could clean validation, another the discount logic, another the result assembly. The market hurt itself on code through bad routing, sending most work to a single model and leaving some runs unfilled.

Reasoning cut the other way, and the market won clearly. A brittle reasoning problem does not reward elegant decomposition. It needs independent attempts, failure detection, and retries. The right answer to the hardest problem showed up only in the market runs, because a bad first answer does not end the run when another worker can take a shot. Synthesis sat in the ambiguous middle, where framing and noticing omissions mattered, and the market showed its benefits there too.

Markets beat managers when the value of independent retry exceeds the value of orchestrated coherence.

There is a deeper oddity here. In earlier work on how models behave inside markets, they turned out to be terrible at judging their own ability and at bidding on it. They lack self knowledge. Some are overconfident, some underconfident, and none are good at predicting what it will take to solve a problem. And yet, put together, they are still useful against other setups, as long as they bring enough diversity, the ability to have a fresh model try a task after another one has failed.

Why does that diversity exist at all? Models are trained in similar ways, sometimes by the same people across companies, but the cumulative effect of training makes them different enough that they perform differently on problems that look alike.

Agents are not employees

We are used to imagining the multi-agent future as a version of our own companies, just autonomous. Hub and spoke feels normal because it looks like an org chart. Managers, workers, review, revision. It is comforting and familiar. But it does not hold, because AI agents are not like human agents.

Models are not just models anymore either. They have memories, tools, scaffolds, and execution traces. Choosing which model plus scaffold plus memory plus tool stack to use is not a trivial choice, which means delegating to the right one is not trivial either. The hub is not simply a manager. Before any worker can solve anything, the hub has to know what the subtasks are and what good recomposition looks like. Get either step wrong and the workers can each be competent while the final answer gets worse. That is exactly what happened. Hub and spoke did best only when the tasks were cleanly decomposable.

With people, markets work because each of us holds local information that a price signal can pull out. We have private lives and knowledge that cannot be easily shared. Models start each run effectively the same. They change according to their prompts, and in the agentic world those prompts change them further as they act and fill their context differently. The harnesses they prefer, the memories they write, the lookups they run, the small variations in prompts all cause their behavior to diverge over time.

Which is why markets become a real necessity once we reach continual learning, and even before that as models specialize. For now the market is the underpowered version of itself, a bartering shantytown rather than a modern city, because the agents are still bad at knowing how to bid. The results showed up despite that handicap. As everyone from the big labs to the coding tools tries to work out the best way to set this up, they could stand to learn a little from how economists think about it. Coase needs Hayek here.