MCP's Missing Layer: How Should Tool Servers Talk to Each Other?

2026-03-20 mcp a2a agent-architecture protocols bridge-rpc
Author stance (high confidence)
MCP deliberately omits server-to-server communication — this is a design choice, not a bug. The ecosystem has responded with five distinct patterns, none yet dominant. The right answer depends on your deployment constraint: single-machine bridge for co-located systems, A2A for distributed ones.

The Model Context Protocol has been out for over a year. Thousands of MCP servers exist. And yet there's a gap so fundamental that an entire sub-ecosystem has emerged to work around it: MCP servers cannot talk to each other.

This isn't an oversight. It's a deliberate architectural choice — and understanding why reveals something deeper about where the agent ecosystem is heading.

The Architecture, As Specified

MCP defines a strict client-server relationship. A Host (the LLM application) creates one Client per Server. Communication flows vertically:


Host (LLM)
  ├── Client 1 ↔ Server A
  ├── Client 2 ↔ Server B
  └── Client 3 ↔ Server C

Server A cannot call Server B. There is no horizontal channel. If Server A produces data that Server B needs, it must flow: A → Client 1 → Host (LLM) → Client 2 → B. The LLM is the sole mediator.

This means every cross-server data transfer consumes context-window tokens, pays at least one extra LLM round-trip of latency, and is visible to the model and to every other server sharing that context.

For simple tool composition — search something, then store the result — this works fine. The LLM reads the search output and naturally feeds it into the storage call.
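
To make the mediation loop concrete, here is a minimal sketch of Pattern 1's data flow. `call_tool` and `llm_decide` are hypothetical stubs standing in for a real MCP client invocation and an LLM completion; nothing here is SDK API.

```python
from typing import Any

# Hypothetical stubs: in a real host these would be an MCP client's
# tool invocation and an LLM completion call.
def call_tool(server: str, name: str, args: dict[str, Any]) -> str:
    return f"<result of {server}.{name}>"

def llm_decide(prompt: str) -> str:
    return f"<LLM output for: {prompt[:40]}>"

def search_then_store(query: str) -> None:
    # Turn 1: Host -> Client 1 -> search server.
    results = call_tool("search", "web_search", {"q": query})
    # The raw results now sit in the LLM's context window; this hop is
    # where every cross-server transfer pays its token and latency cost.
    summary = llm_decide(f"Summarize what is worth storing: {results}")
    # Turn 2: Host -> Client 2 -> memory server, carrying the LLM's output.
    call_tool("memory", "store", {"text": summary})

search_then_store("MCP server-to-server patterns")
```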

For anything more complex, it breaks down.

Five Patterns the Ecosystem Invented

The responses to this gap cluster into five patterns:

1. Let the LLM Mediate (Accept the Cost)

Projects: mcp-agent (LastMile AI), most OpenClaw skills

The simplest approach: don't fight the architecture. The LLM reads output from Server A, decides to call Server B, passes the data along. mcp-agent adds workflow patterns on top (orchestrator, map-reduce, evaluator-optimizer) and optional Temporal integration for durable execution.

When this works: Tool composition where the LLM's judgment is actually needed between steps. "Search for X, then analyze the results and decide what to store" — you want the LLM in the loop.

When this breaks: Credential passing (secrets in LLM context = security disaster), long-running pipelines (hours of execution across hundreds of turns), high-frequency state sharing (sub-second coordination). CA-MCP (Jayanti & Han, 2026) measured the overhead: 5 LLM calls per workflow in standard MCP vs. 2 in their optimized version — a 60% reduction, with execution time dropping from 42s to 13.5s.

2. Put Everything in One Server (Sidestep the Problem)

Projects: Agent-MCP, Network-AI

If servers can't talk to each other, don't have multiple servers. Agent-MCP is a single MCP server that internally manages multiple LLM "agents" sharing a knowledge graph. Network-AI implements a mutex-protected shared blackboard with propose-validate-commit semantics, all within one server process.
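
A minimal sketch of the single-server shape, assuming the MCP Python SDK's FastMCP interface; the two tools and the shared dict are illustrative, not Agent-MCP's actual design.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("monolith")

# Shared in-process state: no horizontal channel is needed because
# every tool lives in the same process.
knowledge: dict[str, str] = {}

@mcp.tool()
def remember(key: str, value: str) -> str:
    """Store a fact in the shared knowledge store."""
    knowledge[key] = value
    return f"stored {key}"

@mcp.tool()
def recall(key: str) -> str:
    """Read a fact back: a plain dict lookup, no LLM hop."""
    return knowledge.get(key, "unknown")

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```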

When this works: Self-contained domains where all tools naturally belong together.

When this breaks: When you want modularity. A monolithic MCP server that does vault + memory + crypto + audit is an engineering nightmare. You lose independent deployment, independent failure, independent development. You've reinvented the monolith.

3. Add an External Coordination Layer

Projects: Solace Agent Mesh, SwarmClaw, Microsoft Agent Framework

Introduce something outside MCP to handle coordination: a message broker (Solace), a task board (SwarmClaw), or a workflow engine with typed message passing (Microsoft Agent Framework's graph-based workflows with BSP supersteps).

Microsoft's approach is the most sophisticated: agents can both consume and expose MCP servers, communicate through workflow edges with shared scoped state, and fall back to A2A for cross-runtime communication. State updates are queued and applied at superstep boundaries for consistency.
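
The superstep mechanics deserve a sketch. The following is a schematic of the BSP idea only, not Microsoft Agent Framework's API: updates proposed during a superstep are queued and become visible to every agent only at the barrier.

```python
from collections import deque

state: dict[str, str] = {}                 # shared scoped state
pending: deque[tuple[str, str]] = deque()  # updates queued this superstep

def propose(key: str, value: str) -> None:
    # Called by agents mid-superstep; the write is not yet visible.
    pending.append((key, value))

def superstep_barrier() -> None:
    # Updates apply at the boundary, so every agent reads one
    # consistent snapshot per superstep.
    while pending:
        key, value = pending.popleft()
        state[key] = value

propose("search.result", "ranked documents")
assert "search.result" not in state  # same superstep: old snapshot
superstep_barrier()
assert state["search.result"] == "ranked documents"
```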

When this works: Enterprise deployments with existing infrastructure (Kubernetes, message brokers, distributed state stores).

When this breaks: When you want zero external dependencies. Solace needs a broker. Microsoft Agent Framework targets Azure. Temporal (used by mcp-agent) is a separate distributed system to operate.

4. Define a New Protocol Alongside MCP

Projects: Google A2A (v1.0, March 2026), ANP (W3C Community Group)

The industry's formal answer: MCP handles vertical (agent→tool), a separate protocol handles horizontal (agent↔agent).

A2A introduces three primitives that MCP lacks: Agent Cards (capability discovery at /.well-known/agent.json), Tasks (stateful work units with a lifecycle: submitted→working→input-required→completed), and Messages (multi-part communication within tasks). The input-required state is critical — it enables multi-turn negotiation between agents, something impossible in MCP's request-response model.
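
For orientation, here is the rough shape of those primitives. Field names follow published A2A examples, but this is an illustrative sketch, not the normative schema, and the endpoint is made up.

```python
# Illustrative Agent Card, as served at /.well-known/agent.json.
agent_card = {
    "name": "memory-agent",
    "description": "Long-term memory for agent workflows",
    "url": "https://memory.example.com/a2a",  # hypothetical endpoint
    "version": "1.0.0",
    "capabilities": {"streaming": True},
    "skills": [
        {
            "id": "store",
            "name": "Store memory",
            "description": "Persist a fact for later recall",
        },
    ],
}

# The Task lifecycle; input-required is the state that MCP's
# request-response model cannot express.
TASK_STATES = ("submitted", "working", "input-required", "completed")
```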

ACP (IBM) merged into A2A in September 2025. ANP sits below both, handling decentralized identity (W3C DIDs) and protocol negotiation for the open internet — A2A for enterprises, ANP for the agent web at large.

When this works: Cross-organization, cross-runtime agent collaboration.

When this breaks: When your servers are co-located in one machine and you need sub-millisecond latency. A2A is HTTP/JSON-RPC — perfect for distributed systems, overkill for processes sharing a filesystem.

5. Single-Process Bridge (Direct IPC)

Projects: claude-ext (production), claude-ipc-mcp

A main process runs a Unix socket server. MCP server child processes connect to it. Any server can call any handler registered by any extension — vault, memory, session management, other extensions' capabilities — through a single socket with line-delimited JSON. Sub-millisecond latency, zero external dependencies, deterministic routing.


MCP Server A ──bridge.sock──► Main Process ◄──bridge.sock── MCP Server B
                                 │
                          Handler Chain
                    (vault, memory, crypto, team, ...)

The handler chain pattern means extensions register their methods independently. Server A calls vault_get without knowing or importing the vault extension — it just sends a method name string. The main process dispatches to the first handler that responds.

Additionally, dispatch() lets the main process itself call the same handler chain without going through the socket — enabling in-process extension-to-extension coordination with the same decoupled interface.
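
Here is a minimal sketch of the whole arrangement, assuming a socket at `bridge.sock` and one JSON object per line. The handler name `vault_get` comes from the article; the rest is illustrative, not claude-ext's actual code.

```python
import json
import socketserver
from typing import Any, Callable

Handler = Callable[[str, dict[str, Any]], Any]
HANDLERS: list[Handler] = []  # the handler chain, one entry per extension

def register(handler: Handler) -> None:
    HANDLERS.append(handler)

def dispatch(method: str, params: dict[str, Any]) -> Any:
    # In-process path: the main process can call the chain directly,
    # without going through the socket.
    for handler in HANDLERS:
        result = handler(method, params)
        if result is not None:  # first handler that responds wins
            return result
    return {"error": f"no handler for {method}"}

class BridgeHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        for line in self.rfile:  # line-delimited JSON frames
            request = json.loads(line)
            reply = dispatch(request["method"], request.get("params", {}))
            self.wfile.write((json.dumps(reply) + "\n").encode())

# Example extension: the vault registers its handler independently.
def vault_handler(method: str, params: dict[str, Any]) -> Any:
    if method == "vault_get":
        return {"value": f"<secret:{params['key']}>"}
    return None  # not ours; let the next handler in the chain try

register(vault_handler)

if __name__ == "__main__":
    with socketserver.UnixStreamServer("bridge.sock", BridgeHandler) as srv:
        srv.serve_forever()
```

A calling server needs no import of the vault: it connects, writes {"method": "vault_get", "params": {"key": "signing"}} plus a newline, and reads one line back.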

When this works: Single-machine deployments where all MCP servers are co-located, especially when you need infrastructure-level resource sharing (cryptographic keys, database connections, session management) that must not transit through the LLM's context.

When this breaks: Distributed deployments. Single point of failure. Non-standard (no one else implements this protocol).

The Security Dimension No One Talks About

The cross-server communication gap has a security shadow that most discussions ignore.

Academic research (Zhao et al., 2025; Sun et al., 2025; Gaire et al., 2025) has documented 12 attack categories against MCP, and the findings are deeply concerning.

Here's the uncomfortable implication: any shared context mechanism without access control is immediately exploitable. CA-MCP's Shared Context Store proposal has no security model. The LLM-mediated pattern inherently exposes all cross-server data to every server in the context window.

The bridge pattern addresses this differently: the main process is the trust boundary. It can enforce per-extension access policies on handler calls. A crypto extension's bridge call to vault_get can be permitted while blocking a blog extension from the same call. This isn't defense-in-depth theater — it's a real architectural advantage of centralized mediation.
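
A sketch of what that enforcement can look like; the policy table and extension names are illustrative, not claude-ext's actual configuration.

```python
# Per-extension allowlist, enforced by the main process at dispatch time.
POLICY: dict[str, set[str]] = {
    "crypto": {"vault_get", "vault_sign"},  # crypto may use the vault
    "blog": {"memory_search"},              # blog gets no vault_* access
}

def authorize(caller: str, method: str) -> bool:
    # The main process knows which extension owns each socket connection,
    # so the caller identity cannot be forged inside the JSON payload.
    return method in POLICY.get(caller, set())

assert authorize("crypto", "vault_get")
assert not authorize("blog", "vault_get")
```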

Why MCP Won't Add Server-to-Server

The MCP 2026 roadmap (published by the Agentic AI Foundation, which includes Anthropic, Google, Microsoft, and OpenAI) has four priorities: Transport Evolution, Agent Communication, Governance, Enterprise Readiness.

None include cross-server state sharing.

"Agent Communication" focuses on the Tasks primitive (SEP-1686) — a call-now/fetch-later pattern for long-running operations. Transport Evolution targets stateless horizontal scaling and server discovery via .well-known metadata. The specification is explicitly moving toward stateless servers.

Statelessness is the opposite direction from cross-server shared state. And it's deliberate: stateless servers are easier to scale, easier to secure, easier to reason about. The MCP spec team has chosen not to solve this problem — they're leaving it to the layers above (A2A) and the implementations below (framework-specific IPC).

One Scenario, Five Patterns

To make the comparison concrete: an agent needs to search the web, store the result in long-term memory, and sign a transaction using a private key — three MCP servers (search, memory, vault).

Pattern 1 (LLM-mediated): The LLM calls the search server, reads the output, decides what to remember, calls the memory server, reads the result, decides to sign, calls the vault server. Three LLM round-trips. The LLM sees the search results and can summarize before storing — useful. But the private key's signing parameters also flow through the LLM's context window — dangerous.

Pattern 2 (Single server): All three capabilities are in one MCP server. Internal function calls, no LLM mediation. Fast, but the search library, the database driver, and the cryptographic signing code all share one process. A vulnerability in the search dependency can reach the key material.

Pattern 3 (External coordination): A workflow engine orchestrates the sequence. The search result goes to a message queue, the memory server reads from the queue, the vault server gets a task. Robust and auditable, but you now operate a message broker alongside your agent.

Pattern 4 (A2A): Each server is an A2A agent with an Agent Card. The search agent creates a Task, the memory agent picks it up, and the vault agent gets a separate Task for signing. Works across machines and organizations. Overkill for three processes on one laptop.

Pattern 5 (Bridge IPC): All three servers call the main process via Unix socket. The search server writes its result to the bridge; the main process routes it to the memory handler; the vault handler signs without the key ever leaving the main process. Sub-millisecond, but only works on one machine.

No pattern is universally better. The choice depends on what you're optimizing for.

The Right Answer Depends on Your Constraint

There is no universal solution. Each pattern exists because of a different deployment constraint:

| Constraint | Best pattern |
| --- | --- |
| Simple tool composition, LLM judgment needed between steps | LLM-mediated |
| All tools in one domain, no modularity needed | Single server |
| Enterprise deployment, existing infra (K8s, brokers) | External coordination layer |
| Cross-organization, cross-runtime agents | A2A protocol |
| Co-located processes, sub-ms latency, secret isolation | Bridge IPC |

The social media discourse of "CLI+Skill replaces MCP" maps to the first pattern: let the LLM figure it out. For the majority of use cases — calling existing CLI tools, chaining simple API calls — this is correct and sufficient.

But when you need cryptographic keys that must never enter the LLM's context window, or hour-long audit pipelines with state machines spanning hundreds of agent turns, or deterministic cross-extension resource sharing at millisecond latency — you need one of the other four patterns. The skill approach can't even represent these requirements, let alone solve them.

What I Think Happens Next

The protocol layer is consolidating fast. ACP merged into A2A. The Agentic AI Foundation houses both MCP and A2A under one governance body. The likely equilibrium: MCP for the vertical layer (agent→tool), A2A for the horizontal layer (agent↔agent), and framework-specific mechanisms for anything the specs decline to standardize.

The academic community's CA-MCP proposal (Shared Context Store) points toward a possible fourth layer — a data-plane coordination mechanism that sits between MCP servers without LLM mediation. But with the spec team moving toward stateless servers, this would need to emerge from the framework layer, not the protocol layer.

The bridge pattern — a single process mediating cross-extension communication via handler chains — will likely remain a niche solution for single-machine deployments. It's the right trade-off when your constraint is "zero dependencies, co-located processes, secret isolation." It's the wrong trade-off for anything distributed.

And that's fine. Not every solution needs to be universal. Sometimes a Unix socket and line-delimited JSON are exactly enough.


This analysis draws on open-source project documentation, academic papers, and protocol specifications; key references are cited inline. The infrastructure described in Pattern 5 (Bridge IPC) is open source: claude-ext.
