Tools, MCP and CLI-over-MCP

The whole lesson in one sentence: tools are the interface between your model and the world, and the description of a tool decides whether the agent picks it correctly at every single step.

Tools are the agent-computer interface

In lesson 1 you learned the agent is a model calling tools in a loop. This lesson is about the tools themselves, because the loop is only as good as the interface it calls.

Anthropic coined the name for that interface. In Building effective agents (Schluntz and Zhang, December 2024) they wrote: "Think about how much effort goes into human-computer interfaces (HCI), and plan to invest just as much effort in creating good agent-computer interfaces (ACI)." Where HCI is buttons, menus, and affordances for people, the ACI is tool names, parameter descriptions, return formats, and error messages: everything the model sees when it decides which tool to use, with which arguments, and what to do with the result.

The mental model Anthropic gives you is short: you are "writing a great docstring for a junior developer." The model has the tool definition plus whatever earlier turns put in context. If the name is cryptic, the description vague, or the parameter names ambiguous, the model guesses. Guessing is the main source of agent failures in production.

Why ACI work beats prompt work

Here is the claim that should change how you spend your time. A system prompt is one static context component, written once. Tool descriptions get read again at every step and shape every tool choice in the loop. A 10% better tool description across a 30-step trajectory compounds further than a 10% better system prompt.

Anthropic's own SWE-bench case is the paradigm example. They tried prompt tuning on Claude 3.5 Sonnet for software engineering tasks. The breakthrough did not come from the prompt. It came from reworking the agent-computer interface: requiring absolute file paths instead of relative ones, fixing error formats, and using semantic file references. Their note is blunt: the switch to mandatory absolute paths returned more than prompt optimization did.
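To make the compounding claim concrete, here is a rough sketch. It assumes each tool choice in a trajectory succeeds independently with the same probability, which real agents do not strictly obey, and the accuracy figures are illustrative rather than measured.

```python
def trajectory_success(per_step_accuracy: float, steps: int) -> float:
    """Chance that every tool choice in a trajectory is correct.

    Simplifying assumption: steps are independent and equally reliable.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    return per_step_accuracy ** steps


# Illustrative numbers only: 0.99 is a 10% relative improvement over 0.90.
baseline = trajectory_success(0.90, 30)
improved = trajectory_success(0.99, 30)
print(f"30-step success: {baseline:.3f} -> {improved:.3f}")
```

Under these assumptions a 10% gain in per-step accuracy moves whole-trajectory success from roughly 4% to roughly 74%. A one-time improvement that applies once has no such multiplier.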

Best practices for tool design

Anthropic's September 2025 guidance is concrete. Steal it directly.

Fewer, high-signal tools. schedule_event beats list_users plus list_events plus create_event. Narrow CRUD wrappers burn context budget and raise decision uncertainty.
Namespacing. asana_search, jira_search, github_list_prs. Not only for disambiguation: Anthropic reports measurable, "non-trivial" gains in tool-use evaluations.
Semantic identifiers, not UUIDs. Hallucinations drop when the model sees accordia-consultancy instead of 6a7b32f1-4d4e-.... UUIDs are designed for machines. Slugs read for both people and models.
Poka-yoke the arguments. Make wrong calls impossible with schema constraints: enums over free strings, required fields, range validators.
Token efficiency. Pagination, truncation, a response_format: "concise" | "detailed" parameter that saves roughly a third of the tokens. Claude Code caps tool responses at 25,000 tokens.
Actionable errors. Replace stacktraces with instructions the model can follow: start_date must be within 90 days of end_date. You passed 180. Try splitting into two calls.
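Several of these rules can be sketched in one tool definition. The tool name calendar_search_events, its fields, and the 90-day limit are hypothetical, and the input_schema layout is an assumption about the shape a harness expects, so adapt it to whatever your client actually reads.

```python
from datetime import date
from typing import Optional

MAX_RANGE_DAYS = 90  # hypothetical limit, taken from the error example above

# Hypothetical tool definition: namespaced name, enum instead of a free string,
# required fields, and no undeclared arguments.
CALENDAR_SEARCH_EVENTS = {
    "name": "calendar_search_events",
    "description": (
        "Search calendar events between start_date and end_date (ISO dates, "
        "at most 90 days apart). Use response_format='concise' unless the "
        "user asks for attendees or notes."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "start_date": {"type": "string", "format": "date"},
            "end_date": {"type": "string", "format": "date"},
            "response_format": {"type": "string", "enum": ["concise", "detailed"]},
        },
        "required": ["start_date", "end_date"],
        "additionalProperties": False,
    },
}


def validate_date_range(start_date: str, end_date: str) -> Optional[str]:
    """Return None if the range is valid, else an instruction the model can follow."""
    span = (date.fromisoformat(end_date) - date.fromisoformat(start_date)).days
    if span < 0:
        return "end_date must be on or after start_date. Swap the two dates and retry."
    if span > MAX_RANGE_DAYS:
        return (
            f"start_date must be within {MAX_RANGE_DAYS} days of end_date. "
            f"You passed {span}. Try splitting into two calls."
        )
    return None
```

The schema rejects whole classes of bad calls before they run, and the validator covers the one rule a schema cannot express, returning a sentence the model can act on instead of a stacktrace.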

One more, because it points at the future. A tool-testing agent that automatically rewrote weak tool descriptions cut task completion time by 40% in Anthropic's tests. The ACI layer is itself agent-writable.

MCP: the standard ACI layer

The Model Context Protocol is Anthropic's standardization of this layer. Released November 2024, donated to the Linux Foundation in December 2025, adopted by OpenAI, Google, and Microsoft through 2025. MCP is the ACI layer Anthropic gave to the industry.

But MCP has two sub-layers, and they do not have the same portability.

Layer	What	Portable?
Wire format	JSON-RPC 2.0 between agent and server	Fully. AAIF / Linux Foundation
Config file	Which servers, where, how to start	Silo with soft convergence. No standard

You can write an MCP server and trust that any MCP client can talk to it over JSON-RPC. You cannot write one config file that every client understands. Claude Code reads .mcp.json with key mcpServers. Cursor uses the same. Codex CLI uses TOML at .codex/config.toml. VS Code uses servers, not mcpServers. Zed uses context_servers. The 2025-11-25 spec defines wire, transports, auth, tools, resources, and prompts. It deliberately does not define client config. The MCP 2026 roadmap lists "Configuration portability" under future work with no working group formed.
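The config silo can be illustrated with a small sketch that renders one server map under the top-level key each client expects. The keys come from the paragraph above. The entry shape (command, args), the server name, and the assumption that only the top-level key differs are simplifications, and TOML-based clients such as Codex CLI are left out entirely.

```python
import json

# Top-level key per client, as described in the text above.
CLIENT_KEYS = {
    "claude-code": "mcpServers",
    "cursor": "mcpServers",
    "vscode": "servers",
    "zed": "context_servers",
}


def render_config(servers: dict, client: str) -> str:
    """Render the same server map under the key a given client reads.

    Assumption: per-server entries are identical across clients, which real
    clients do not guarantee.
    """
    if client not in CLIENT_KEYS:
        known = ", ".join(sorted(CLIENT_KEYS))
        raise ValueError(f"Unknown client '{client}'. Known clients: {known}.")
    return json.dumps({CLIENT_KEYS[client]: servers}, indent=2)


# Hypothetical server entry for illustration.
servers = {"notes": {"command": "notes-mcp-server", "args": ["--stdio"]}}
print(render_config(servers, "vscode"))
```

The server itself never changes here. Only the wrapper around it does, which is the portability gap the spec leaves open.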

There is a second trap with MCP, and it is about context budget. Install too many MCP servers and the tool descriptions alone can eat 76k tokens of a 200k-token window, dropping effective Sonnet context to around 100k. "Fewer, high-signal tools" is not only a quality argument. It is a survival argument for your context window.
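A quick way to see this cost before it bites is to estimate how much of the window your tool definitions occupy. The sketch below uses a crude four-characters-per-token heuristic, which is an assumption and not a real tokenizer, so treat the result as an order-of-magnitude check.

```python
import json


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate. Assumes about four characters per token."""
    return int(len(text) / chars_per_token)


def tool_overhead(tools: list, window: int = 200_000) -> tuple:
    """Return (estimated tokens, share of window) for a list of tool definitions."""
    total = sum(estimate_tokens(json.dumps(tool)) for tool in tools)
    return total, total / window


# Hypothetical tool definitions for illustration.
tools = [
    {"name": "jira_search", "description": "Search Jira issues by text. " * 20},
    {"name": "github_list_prs", "description": "List open pull requests. " * 20},
]
tokens, share = tool_overhead(tools)
print(f"~{tokens} tokens, {share:.2%} of the window")
```

Running a check like this each time you add a server makes the budget visible before the agent starts losing context to descriptions it never uses.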

CLI-over-MCP: if an AI can type it, it is a tool

Now the opinionated stance. Peter Steinberger, on the Lex Fridman podcast (#491), put it flatly: "Every MCP would be better as a CLI." OpenClaw is built on this. Its skills are CLI scripts, not MCP servers. The agent just runs skill-name --argument value in bash, and a SKILL.md file explains what the command does.

He does not reject MCP the protocol. JSON-RPC 2.0 is fine. He rejects it as the default way to give an agent tools, for three reasons.

Training congruence. LLMs were trained on enormous amounts of Unix command-line output: man pages, --help text, stderr messages, bash scripts in GitHub repos. A CLI call and its output are familiar information shapes to a model. An MCP blob is not. So a model that calls a CLI understands the output better and reacts to errors more effectively.
Context efficiency. MCP servers tend toward blob responses: a whole resource dump for a simple question. CLIs produce sharp, bounded output: exactly what you asked for. Since the context window is the scarce resource, smaller targeted outputs always win.
Self-healing behavior. When a CLI call fails, the model can read the --help output and fix its own call, no extra tooling required. That recovery comes naturally from the training-congruent output shape. MCP errors are JSON exceptions the model has to interpret through a protocol lens. CLI errors are human-readable and model-readable at once.

There is a fourth, quieter argument that the config section already made for you: a CLI has no config silo. No per-harness file, no cross-harness compatibility problem, just an executable.
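A minimal sketch of what such a CLI tool might look like follows. The find-note command, its flags, and the in-memory note list are hypothetical and are not OpenClaw's actual skills. The point is the shape: generated --help text, bounded output, and errors that say what to try next.

```python
import argparse
import sys
from typing import List

# Hypothetical data. A real skill would read from disk or an API.
NOTES = ["MCP config notes", "MCP wire format", "Hooks and CI", "Tool design rules"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="find-note",
        description="Search note titles and print matches, one per line.",
    )
    parser.add_argument("--query", required=True, help="case-insensitive text to match")
    parser.add_argument("--limit", type=int, default=5, help="maximum lines to print (default 5)")
    return parser


def run(argv: List[str]) -> int:
    """Run the command and return its exit code."""
    args = build_parser().parse_args(argv)
    if args.limit < 1:
        print("error: --limit must be at least 1. Try --limit 5.", file=sys.stderr)
        return 2
    matches = [title for title in NOTES if args.query.lower() in title.lower()]
    if not matches:
        print(f"no notes matched '{args.query}'. Try a shorter query.", file=sys.stderr)
        return 1
    for title in matches[: args.limit]:  # bounded output protects the context window
        print(title)
    return 0
```

Installed as a script, the entry point would call sys.exit(run(sys.argv[1:])), and the agent would invoke it as find-note --query mcp --limit 2. Because argparse generates --help from the same definitions, the recovery path described above comes for free.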

Be honest about the limits. This is one prominent voice in one interview, confidence medium. The counter-arguments stand: MCP servers can be stateful across calls, can run asynchronously, and can offer typed tool signatures. CLI scripts are usually stateless, synchronous, and untyped text streams. The principle is clean and logically consistent, but it is not yet an industry consensus. Use it as a default bias, not a law.

Hooks: deterministic guardrails around tool calls

Tools are non-deterministic: the model decides when to call them. Sometimes you need a deterministic gate. That is what hooks are.

Hooks are event-driven shell commands that fire at specific moments in the agent loop: PreToolUse, PostToolUse, SessionStart, UserPromptSubmit, Stop, and many more. A hook can block, mutate, inject, or log a tool call. As of April 2026 they are richest in Claude Code (25+ events, four handler types: command, http, prompt, agent), copied by six other harnesses in incompatible variants, and absent in five major ones. Like permissions and MCP config, hooks are a runtime primitive that does not standardize across vendors. The MCP "Interceptors" proposal (SEP-1763) for exactly this sits as a draft, unimplemented and explicitly outside the 2026 roadmap.

How hooks compose with permissions: permissions are a passive allowlist, hooks are active interception, and precedence is deny > defer > ask > allow.
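The precedence rule can be sketched as a small resolver. How a harness actually collects decisions from permissions and hooks is an assumption here. Only the ordering comes from the text above.

```python
from typing import List

# Strongest first, as stated above: deny > defer > ask > allow.
PRECEDENCE = ["deny", "defer", "ask", "allow"]


def resolve(decisions: List[str]) -> str:
    """Return the decision that wins when several sources disagree."""
    if not decisions:
        raise ValueError("at least one decision is required")
    unknown = [d for d in decisions if d not in PRECEDENCE]
    if unknown:
        raise ValueError(f"unknown decision(s): {unknown}. Expected one of {PRECEDENCE}.")
    # The winner is the decision that appears earliest in PRECEDENCE.
    return min(decisions, key=PRECEDENCE.index)


print(resolve(["allow", "ask"]))
print(resolve(["allow", "deny", "ask"]))
```

The practical consequence is that a single deny from any source wins, so an allowlist entry can never override a hook that blocks.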

Good uses in a vault-plus-projects setup: a Pre-Edit linter that blocks on exit 2 and sends stderr back as model feedback; a SessionStart hook that injects project context; a PostToolUse secret scan on outputs. Bad uses, and this matters: a secrets scan that must pass before merge does not belong in a hook. Hooks are bypassable with --dangerously-skip-permissions. That gate belongs in CI. Claude Code hooks run shell with full user permissions and no default sandbox, and .claude/settings.json is a repo-embedded supply-chain vector. Check Point documented eight CVEs through this surface between September 2025 and February 2026. Treat hooks as guardrails inside a trusted loop, never as your trust boundary.
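A blocking hook of the kind described above might look like the following sketch. It assumes the harness passes the pending tool call as JSON on stdin with tool_name and tool_input.file_path fields, and that exit code 2 blocks the call and returns stderr to the model. Check your harness's hook documentation before relying on either. The protected file list is hypothetical.

```python
import json
import sys
from pathlib import PurePosixPath

GUARDED_TOOLS = {"Edit", "Write"}  # assumed tool names
PROTECTED_NAMES = {".env"}         # hypothetical protected files


def evaluate(payload: dict) -> tuple:
    """Return (exit_code, message) for one pending tool call."""
    tool = payload.get("tool_name", "")
    path = str(payload.get("tool_input", {}).get("file_path", ""))
    if tool in GUARDED_TOOLS and PurePosixPath(path).name in PROTECTED_NAMES:
        return 2, (
            f"Blocked: {path} is a protected file. "
            "Propose the change in .env.example and ask the user to apply it."
        )
    return 0, ""


def main() -> int:
    """Entry point: read the hook payload from stdin, report on stderr."""
    code, message = evaluate(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code
```

As a hook command the script would end with sys.exit(main()). Keeping the decision in evaluate makes the rule testable without a running agent, and the same caveat applies: this is a guardrail inside a trusted loop, so the equivalent check still belongs in CI.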

Takeaways

Tools are the agent-computer interface. A tool description is read every step, so it outweighs the prompt.
Steal Anthropic's rules: fewer high-signal tools, namespacing, semantic IDs, poka-yoke arguments, actionable errors.
MCP's wire format is portable, its config file is a per-harness silo. Too many servers eat your context window.
CLI-over-MCP: if an AI can type it, it is a tool. Default bias, not law.
Hooks are deterministic guardrails inside the loop, not your security boundary. That lives in CI.

Where this goes next

You now have tools, the interface to them, and the gates around them. The next question is how to package capability so the agent reaches for the right tools at the right time, and how to hand work off to a focused worker with its own context window. That is the next lesson, "Skills and subagents."
