MCP Tool Design for AI Agents, Not API Endpoints

AI agents do not need your whole API

This is an AI agent infrastructure problem, not a productivity-tools rant.

If you are building MCP servers, LLM agent tools, function calling layers, or internal AI agent infrastructure, the tool list is part of your product.

Most Model Context Protocol servers are designed by looking at the API.

The API has users.create, so the MCP server gets a create_user tool. The API has users.update, so it gets update_user. Then delete_user, list_users, create_invoice, update_invoice, query_database, append_block, upload_file.

It feels honest.

It is expensive for LLM agents.

An MCP tool is not just a capability. It is prompt surface: name, description, arguments, schema, examples, edge cases, and all the warnings we add because LLM agents otherwise call things wrong.

Connect one server with twenty tools and maybe you survive.

Connect ten AI agent integrations like that and you have two hundred tools. Before the model helps you, it is already carrying a catalog of things it might never call.

Then people say the model got worse.

Maybe you filled its working memory with your integration design.

Tool lists are LLM context windows in disguise

The context window does not only hold chat history. It also holds the available interface.

Every enabled tool teaches the model something: what exists, how to call it, what shape the arguments take, what the response might mean. Useful when the tool is relevant. Waste when it is not.
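A rough way to see this is to serialize a tool catalog and measure it. The sketch below is only an illustration: make_tool and catalog_chars are hypothetical helpers, the tool definitions are invented, and character count is a crude stand-in for tokens. The real cost depends on the tokenizer and on how your client renders tools into the prompt.

```python
import json


def make_tool(name: str) -> dict:
    """Build a hypothetical endpoint-shaped tool definition."""
    return {
        "name": name,
        "description": f"Hypothetical tool that performs {name} against the API.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "id": {"type": "string", "description": "Record identifier."},
                "fields": {"type": "object", "description": "Fields to write."},
            },
            "required": ["id"],
        },
    }


def catalog_chars(tools: list) -> int:
    """Serialized size of a tool catalog: a crude proxy for prompt cost."""
    return len(json.dumps(tools))


# Twenty tools on one server, the same shape repeated across ten servers,
# and a two-tool surface for comparison.
one_server = [make_tool(f"tool_{i}") for i in range(20)]
ten_servers = [make_tool(f"server{s}_tool_{i}") for s in range(10) for i in range(20)]
two_tools = [make_tool("execute"), make_tool("describe")]

for label, tools in [
    ("one server", one_server),
    ("ten servers", ten_servers),
    ("two tools", two_tools),
]:
    print(f"{label}: {len(tools)} tools, {catalog_chars(tools)} characters")
```

The absolute numbers mean little. The point is that the catalog grows linearly with every tool you enable, whether or not the current task touches it.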

The naive MCP pattern treats every endpoint as if it deserves first-class space in the model’s head.

It usually does not.

CRUD is not a tool boundary. It is an implementation detail.

Entities are not tool boundaries either.

Your model does not need create_customer, update_customer, delete_customer, list_customers, create_invoice, update_invoice, delete_invoice, list_invoices, and the same pattern repeated until the tool list becomes an API reference manual.

The model needs a way to say what it wants.

The server needs to own the rest.

This is the same pressure you see in function calling, tool use benchmarks, and agentic workflows. Tool selection is not free. Parameter schema reasoning is not free. Every operation you expose directly becomes another thing the model has to route around.

This is not just theory. The MCPToolBench++ paper describes the same bottleneck in AI agent tool use: tool descriptions and parameter schemas consume a lot of tokens, which limits how many MCP tools an LLM can handle in a single run.

Use one execution tool for agent actions

The interface I keep coming back to is boring:

{
  "operation": "customer.create",
  "payload": {
    "name": "Acme",
    "status": "lead"
  }
}

One tool executes operations by name. Call it execute, run, dispatch, whatever. The name is not the point.

The point is that the public MCP server surface stays small while the operation set grows behind it.

Today you support ten operations. Tomorrow you support a hundred. The model still sees the same execution tool. New capabilities become entries in a server-side operation registry, not new top-level tools fighting for context.

The registry holds the real complexity:

  • operation name;
  • input schema and examples;
  • safety flags and validation rules;
  • batching, response shaping, rollback.

This is where the details belong.

Not in every model turn.
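A minimal sketch of that registry and the single execution tool might look like this. Everything here is an assumption for illustration: the Operation dataclass, the REGISTRY dict, the register helper, the destructive flag, and the create_customer handler are hypothetical names, not code from any particular server.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Operation:
    """One registry entry. The model never sees this directly."""
    name: str
    handler: Callable[[Dict[str, Any]], Dict[str, Any]]
    input_schema: Dict[str, Any]
    examples: List[Dict[str, Any]] = field(default_factory=list)
    destructive: bool = False  # example of a safety flag


REGISTRY: Dict[str, Operation] = {}


def register(op: Operation) -> None:
    REGISTRY[op.name] = op


def execute(request: Dict[str, Any]) -> Dict[str, Any]:
    """The one public execution tool: look up the operation, run its handler."""
    name = request.get("operation", "")
    op = REGISTRY.get(name)
    if op is None:
        return {"code": "unknown_operation", "message": f"No operation named {name!r}"}
    return op.handler(request.get("payload", {}))


def create_customer(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for a real API call; the ID is hard-coded for the example.
    return {"id": "cus_1", "name": payload["name"], "status": payload.get("status", "lead")}


register(Operation(
    name="customer.create",
    handler=create_customer,
    input_schema={"type": "object", "required": ["name"]},
    examples=[{"name": "Acme", "status": "lead"}],
))

result = execute({"operation": "customer.create", "payload": {"name": "Acme", "status": "lead"}})
print(result)
```

Adding an operation is one more register call. The execute signature, which is all the model sees, does not change.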

Use one schema tool for agent planning

The obvious objection is fair:

If the model only sees one execution tool, how does it know the payload shape?

Give it a second tool:

{
  "operation": "customer.create"
}

Call it describe. It returns the schema, examples, and constraints for one operation.

Not all operations.

One.

That distinction matters for LLM context management. The model should pull schema when it needs schema, not preload the entire product surface before knowing the task.

For simple calls, it can try execute directly and learn from a good validation error. For complex calls, it can ask describe first.
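Here is a small sketch of what a describe tool could do, assuming operation metadata lives in a plain dictionary. The OPERATIONS table, its field names, and the invoice.create entry are hypothetical and exist only to show that one lookup returns one entry.

```python
from typing import Any, Dict

# Hypothetical operation metadata, keyed by operation name.
OPERATIONS: Dict[str, Dict[str, Any]] = {
    "customer.create": {
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "status": {"enum": ["lead", "active", "archived"]},
            },
            "required": ["name"],
        },
        "examples": [{"name": "Acme", "status": "lead"}],
        "constraints": ["name must be non-empty"],
    },
    "invoice.create": {
        "schema": {"type": "object", "required": ["customer_id", "amount"]},
        "examples": [{"customer_id": "cus_1", "amount": 1200}],
        "constraints": ["amount is in minor currency units"],
    },
}


def describe(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return schema, examples, and constraints for exactly one operation."""
    name = request.get("operation", "")
    entry = OPERATIONS.get(name)
    if entry is None:
        return {"code": "unknown_operation", "message": f"No operation named {name!r}"}
    return {"operation": name, **entry}


info = describe({"operation": "customer.create"})
print(sorted(info))
```

The response for customer.create carries nothing about invoice.create. That is the whole design: schema enters the context only when the model asks for it.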

This is not hiding the API from the model.

It is pacing the API.

Errors should repair the next call

Once complexity moves behind an operation registry, errors become part of the interface.

A bad error says:

{ "error": "Invalid payload" }

A worse error dumps the entire schema for the whole operation family.

A useful error points at the failing path and gives the smallest repair:

{
  "code": "validation_error",
  "path": "payload.status",
  "message": "Expected one of: lead, active, archived",
  "fix": "Use status: \"lead\" for a new prospect."
}

The model does not need a lecture.

It needs enough information to make the next call correct.

That is the job of an MCP server: reduce the useless turns between intent and action.
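As a sketch, a validator that produces this kind of error can be very plain. The function name validate_customer_create and the rule for name are assumptions made for the example. The status rule and its fix text mirror the error shown above.

```python
from typing import Any, Dict, Optional

ALLOWED_STATUS = ("lead", "active", "archived")


def validate_customer_create(payload: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return the first failing path with a minimal repair hint, or None if valid."""
    name = payload.get("name")
    if not isinstance(name, str) or not name.strip():
        return {
            "code": "validation_error",
            "path": "payload.name",
            "message": "Expected a non-empty string",
            "fix": 'Pass name as text, for example name: "Acme".',
        }
    # Assumption: a missing status defaults to "lead".
    status = payload.get("status", "lead")
    if status not in ALLOWED_STATUS:
        return {
            "code": "validation_error",
            "path": "payload.status",
            "message": "Expected one of: " + ", ".join(ALLOWED_STATUS),
            "fix": 'Use status: "lead" for a new prospect.',
        }
    return None


error = validate_customer_create({"name": "Acme", "status": "prospect"})
print(error)
```

It stops at the first failure on purpose. One path and one fix is usually enough for the next call; a full list of every possible problem starts to look like the schema dump again.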

Responses are part of the contract

Tool schemas are not the only context leak. Raw responses are another one.

If your server sends back every default flag, timestamp, empty field, nested SDK object, and transport detail, that output becomes the next input. You have just changed what the model has to reason over.

So shape responses for agents:

  • omit default false/null noise;
  • flatten common read shapes;
  • return IDs and summaries when that is enough;
  • make verbose mode explicit;
  • keep raw SDK output as an escape hatch, not the default.

The model should spend context on the user’s task, not on your vendor’s JSON habits.
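A minimal sketch of that shaping step, assuming raw responses are plain dicts and lists. The compact and shape_response names and the sample record are hypothetical. Treating every False as default noise is a simplification; a real server would decide per field.

```python
from typing import Any


def compact(value: Any) -> Any:
    """Recursively drop None, False, and empty strings, lists, and dicts from dicts."""
    if isinstance(value, dict):
        shaped = {}
        for key, item in value.items():
            item = compact(item)
            if item is None or item is False or item == "" or item == [] or item == {}:
                continue
            shaped[key] = item
        return shaped
    if isinstance(value, list):
        return [compact(item) for item in value]
    return value


def shape_response(raw: Any, verbose: bool = False) -> Any:
    """Compact by default; raw output only when the caller asks for it."""
    return raw if verbose else compact(raw)


raw = {
    "id": "cus_1",
    "name": "Acme",
    "archived": False,
    "notes": None,
    "tags": [],
    "meta": {"etag": None},
    "balance": 0,
}

print(shape_response(raw))
```

Note that a zero balance survives: 0 is data, not noise. The verbose flag keeps the raw object one explicit request away instead of making it the default.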

This scales better than endpoint-shaped MCP tools

The two-tool pattern is not magic.

You still need a real operation registry. You still need validation, permissions, batching, rate limits, idempotency, and careful response design.

But it scales in the dimension that matters for AI agents.

Adding operations does not expand the MCP tool list.

Adding entities does not expand the MCP tool list.

Adding optional fields does not force every conversation to carry another schema blob.

The model keeps two stable affordances:

  • execute an operation;
  • describe an operation.

Everything else is server-side interface design.
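To make the two affordances concrete, here is a sketch of what the public tool list could look like. The name, description, and inputSchema fields follow the shape MCP uses for tool definitions; the descriptions, the list_tools function, and the REGISTRY dict are assumptions for illustration.

```python
from typing import Any, Dict, List

# Hypothetical server-side registry; it grows, the tool list does not.
REGISTRY: Dict[str, Dict[str, Any]] = {}


def list_tools() -> List[Dict[str, Any]]:
    """The public surface. It does not depend on how many operations exist."""
    operation_arg = {"type": "string", "description": "Operation name, e.g. customer.create"}
    return [
        {
            "name": "execute",
            "description": "Run one operation by name with a payload.",
            "inputSchema": {
                "type": "object",
                "properties": {"operation": operation_arg, "payload": {"type": "object"}},
                "required": ["operation"],
            },
        },
        {
            "name": "describe",
            "description": "Return schema, examples, and constraints for one operation.",
            "inputSchema": {
                "type": "object",
                "properties": {"operation": operation_arg},
                "required": ["operation"],
            },
        },
    ]


before = list_tools()
for name in ("customer.create", "invoice.create", "invoice.void"):
    REGISTRY[name] = {"schema": {"type": "object"}}
after = list_tools()

print([tool["name"] for tool in after], len(REGISTRY))
```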

The principle

MCP is not just a plugin protocol for AI tools.

It is LLM infrastructure. It is a context protocol.

If you expose twenty tools because your REST API has twenty endpoints, you are making the model pay for your backend architecture. If you connect ten MCP servers built that way, the tool surface becomes its own context problem.

The better MCP server is not the one with the longest tool list.

It is the one that lets the model carry the smallest useful interface while still reaching the full system behind it.

One tool to do the work.

One tool to ask for the shape.

Everything else belongs behind the server.

That is the approach I used in my Notion MCP Server GitHub repository: dozens of operations, two public tools. The package is published as notion-mcp-server on npm.

Not because Notion is special.

Because the pattern applies to any MCP server that would otherwise turn into a schema dump with a transport layer.