Agents vs workflows, deeper

The whole lesson in one sentence: agency is a dial, not a switch. As you turn it up, the system gets more capable and more expensive, and Anthropic's advice is to turn it up only as far as the problem actually demands.

The four-rung ladder

Lesson 1 gave you a funnel: default to a workflow, build an agent only when four gates pass. That tells you whether to cross the line. It does not tell you where you are on the road that leads to it. Barry Zhang of Anthropic sketches that road as a ladder with four rungs.

Single model call. Summarization, classification, extraction. Table stakes in 2026. One input, one output, no orchestration.
Workflow. Several model calls in a predefined control flow. You map the decision tree. You trade some cost and latency for better performance, and every node stays explicit.
Agent. The model in a loop with tools, choosing its own trajectory from environment feedback. You stop drawing the path.
Multi-agent. Several agents coordinating, whether as generalist agents or through agent-to-agent collaboration. Zhang is candid here: it is too early to say what this settles into.
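
To make the difference between the first three rungs concrete, here is a minimal sketch. It is not taken from Zhang's talk: `call_model`, `workflow`, `agent`, and `scripted_policy` are hypothetical names, and the "model" is a canned stub so the code runs offline. Rung four is left out on purpose, since its shape is still unsettled.

```python
from typing import Callable, Dict, List, Tuple

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model API; returns canned text offline."""
    return f"[model output for: {prompt[:30]}]"

# Rung 1: single model call. One input, one output, no orchestration.
def single_call(text: str) -> str:
    return call_model(f"Summarize: {text}")

# Rung 2: workflow. Several calls, but the control flow is fixed in code.
def workflow(ticket: str) -> str:
    category = call_model(f"Classify: {ticket}")
    draft = call_model(f"Draft a {category} reply to: {ticket}")
    return call_model(f"Check tone of: {draft}")

# A step is (tool name, argument), or ("done", final answer) to stop.
Step = Tuple[str, str]

# Rung 3: agent. The policy (standing in for the model) picks each next step
# from the feedback gathered so far; the code only enforces a step cap.
def agent(
    goal: str,
    tools: Dict[str, Callable[[str], str]],
    choose: Callable[[str, List[str]], Step],
    max_steps: int = 5,
) -> str:
    history: List[str] = []
    for _ in range(max_steps):
        name, arg = choose(goal, history)
        if name == "done":
            return arg
        history.append(tools[name](arg))  # environment feedback
    return "stopped: step budget exhausted"

# Scripted policy for illustration only: search once, then answer.
def scripted_policy(goal: str, history: List[str]) -> Step:
    if not history:
        return ("search", goal)
    return ("done", f"answer based on {len(history)} observation(s)")

TOOLS = {"search": lambda query: f"results for {query}"}
```

In `workflow` you can read the whole path off the page. In `agent` the path exists only at run time, which is why the step cap is the one piece of control the code keeps.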

The point of the ladder is not that rung four is the destination. It is that each rung is a deliberate step, and you should be able to say why you climbed.

The curve that rises with agency

Here is the sentence from Zhang's talk worth memorizing:

"The broad trend here is that as we give these systems a lot more agency, they become more useful and more capable. But as a result, the cost, the latency, the consequences of errors also go up."

Two lines move together as you climb the ladder. Capability goes up. So does the bill, in three currencies: tokens, wall-clock time, and the blast radius when it gets something wrong.

A workflow has bounded, predictable cost because you defined every branch. An agent has variable, often higher cost because it decides how hard to think. A workflow's errors are lower-consequence: each node is explicit, so a failure has a name and a place. An agent's autonomy is exactly what makes unintended actions possible. You are not choosing more capability for free. You are buying it, and the receipt has three columns.

This is also why the four gates from lesson 1 are gates and not suggestions. Gate two (value) exists because exploration costs tokens: Zhang's example of high-volume support at roughly $0.10 per task, about 30k to 50k tokens, is simply too thin a budget for an agent to explore in. Gate four (catchable errors) exists because the consequence column gets taller as you climb. The funnel is the curve, read as a checklist.
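
A rough way to see why that budget is thin: count how many loop steps fit inside it. The sketch below takes only the 30k to 50k token range from the source. The per-step figures (4,000 tokens for the first step, 2,000 more for each later step as the context grows) are illustrative assumptions, and `steps_within_budget` is a hypothetical helper.

```python
def steps_within_budget(budget_tokens: int, base_tokens: int, growth_per_step: int) -> int:
    """Count agent loop steps that fit in a token budget.

    Assumes step k (counting from 0) costs base_tokens + k * growth_per_step,
    a simple model of a loop that resends a growing history on every step.
    """
    if base_tokens <= 0 or growth_per_step < 0:
        raise ValueError("base_tokens must be positive and growth_per_step non-negative")
    spent = 0
    steps = 0
    while True:
        step_cost = base_tokens + steps * growth_per_step
        if spent + step_cost > budget_tokens:
            return steps
        spent += step_cost
        steps += 1

# Assumed per-step costs; only the 30k and 50k budgets come from the text.
low = steps_within_budget(30_000, base_tokens=4_000, growth_per_step=2_000)   # 4
high = steps_within_budget(50_000, base_tokens=4_000, growth_per_step=2_000)  # 5
```

Under these assumptions the agent gets four or five steps per task, which leaves little room for exploration or a wrong turn. A workflow with a fixed number of calls spends the same budget predictably.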

Anthropic actually prefers workflows

This is the part most people get backwards. The company shipping the agents is not telling you to build agents. From the same talk:

"If you have read the blog post, you'll know that we talked a lot about workflows and that's because we really like them and they're a great concrete way to deliver values today."

Read that twice. Workflows are not the beginner tier you graduate out of. They are, in Anthropic's framing, often the right answer: the way to "deliver values today" for a decision tree you can actually map. An agent is reserved for complex and valuable work in a genuinely ambiguous problem space. If you can draw the full decision tree, drawing it explicitly is cheaper and gives you more control. Building the tree is not a consolation prize. It is the better engineering.

The practical move that falls out of this: you do not have to pick a single rung for the whole system. A workflow can carry one local agent component for a specific ambiguous sub-task, while the rest of the control flow stays predefined and bounded. You climb the ladder for the part of the problem that needs it, and leave the rest on solid ground.
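
As a sketch of that shape, assume a support pipeline where only tickets the classifier cannot place are handed to a small, step-capped agent. Every name here (`classify`, `investigate`, `handle_ticket`, the `next_action` policy) is hypothetical, and plain rules stand in for model calls so the example runs offline.

```python
from typing import Callable, List

def classify(ticket: str) -> str:
    # Predefined node: a cheap rule standing in for a single model call.
    return "refund" if "refund" in ticket.lower() else "unclear"

def answer_from_template(category: str) -> str:
    # Predefined node: bounded, predictable cost.
    return f"Template reply for {category}"

def investigate(
    ticket: str,
    next_action: Callable[[str, List[str]], str],
    max_steps: int = 3,
) -> str:
    # Local agent: chooses its own next action, but inside a hard step cap.
    notes: List[str] = []
    for _ in range(max_steps):
        action = next_action(ticket, notes)
        if action == "stop":
            break
        notes.append(f"did {action}")
    return f"Investigated reply after {len(notes)} step(s)"

def handle_ticket(ticket: str, next_action: Callable[[str, List[str]], str]) -> str:
    category = classify(ticket)
    if category == "unclear":
        # Only the ambiguous branch gets agency.
        return investigate(ticket, next_action)
    return answer_from_template(category)
```

The outer control flow stays a two-branch decision tree you can read and test. The agent's variable cost is confined to one branch and capped by `max_steps`.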

Why coding is the cleanest case

The reason coding agents work so well is not that coding is special. It is that coding passes all four gates at once, and one of them unusually easily. Going from a design doc to a pull request is ambiguous and complex, so gate one passes. Developers agree good code has value, so gate two passes. Claude is demonstrably capable at the sub-skills, so gate three passes. And gate four, the hard one, is handed to you for free: unit tests and CI make the output easily verifiable. The error is cheap to discover. Turn the agency dial up, and a bad trajectory gets caught by a red test instead of shipping to production.
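
The gate-four mechanism can be sketched as a loop in which every candidate has to pass the tests before it is accepted, and failures are fed back into the next attempt. This is an illustrative sketch, not a description of any particular coding agent: `run_tests`, `verified_loop`, and the scripted `propose` function are hypothetical, and the "task" is a toy `add` function.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Callable[[int, int], int]

def run_tests(candidate: Candidate) -> List[str]:
    """Return failure messages; an empty list means the suite is green."""
    cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
    failures: List[str] = []
    for args, expected in cases:
        got = candidate(*args)
        if got != expected:
            failures.append(f"add{args} returned {got}, expected {expected}")
    return failures

def verified_loop(
    propose: Callable[[List[str]], Candidate],
    max_attempts: int = 3,
) -> Tuple[Optional[Candidate], int]:
    """Accept a candidate only when tests pass; otherwise feed failures back."""
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        feedback = run_tests(candidate)
        if not feedback:
            return candidate, attempt
    return None, max_attempts  # nothing unverified is returned

# Scripted proposer standing in for the model: buggy first, fixed after feedback.
def propose(feedback: List[str]) -> Candidate:
    if not feedback:
        return lambda a, b: a - b  # deliberate bug
    return lambda a, b: a + b

accepted, attempts = verified_loop(propose)  # accepted on the second attempt
```

The buggy first attempt costs one extra iteration and never leaves the loop. That is what a cheap, catchable error looks like.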

That last point connects to where the discipline is heading. Matt Pocock's line: "The quality of your feedback loops influences how good your AI can code. That is the ceiling." Coding agents are good because the feedback loop already exists. For any other domain, the question before you build is whether you can manufacture a feedback loop that catches errors as cheaply as a failing test does. Andrej Karpathy frames the same intuition from the model side: figure out which RL circuits your task lives in. Inside them, the agent flies. Outside, it is like pulling teeth, and more agency just buys you more expensive failure.

Where it is heading

Rung four, multi-agent, is real but unsettled. Several agents coordinating in parallel, with scope and merge discipline, is already practiced. Karpathy's hire-test imagines pointing ten attacking agents at a junior's project to try to break it. But the honest position, and Anthropic's, is that we do not yet know whether the future is generalist agents, structured multi-agent collaboration, or something else. What does not change as you climb higher: agency stays a cost you pay, not a feature you turn on by default. The dial only earns its place when the cheaper rung genuinely cannot do the job.

Takeaways

Agency is a ladder: single call, workflow, agent, multi-agent. Know which rung you are on and why you climbed.
As agency rises, capability rises and so do cost, latency, and the consequence of errors. You buy capability, you do not get it free.
Anthropic prefers workflows on purpose. They are often the right answer, not the beginner tier.
A workflow can hold one local agent for an ambiguous sub-task. Climb the ladder only for the part that needs it.
Coding is the clean case because tests and CI make errors cheap to catch. For other domains, build that feedback loop first.

Where this goes next

The next lesson, "The bigger picture," steps back from the single decision and looks at the whole landscape: where single calls, workflows, agents, and multi-agent systems sit together, and how agentic engineering as a discipline holds the quality bar while the agency dial keeps climbing.
