Thematic comparison · The AI Agent abstraction: an in-depth review of 10 projects

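This article is part of the "thematic comparison of 10 agent architectures" series and focuses on the AI Agent abstraction itself. It draws on chapters 1, 2, and 3 plus the chapter 8 design trade-offs of the 10 single-project architecture documents (in agents/docs/), and on sections 0–4 of agent-architectures-comparison.md.

Readers interested in other dimensions, such as context management, memory, tool systems, evaluation, or Skill loading, should see the other thematic documents in this directory. This article does not repeat the "project disambiguation"; for the identities of the 10 projects, see the overview document in the same directory.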

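Citation convention: a reference such as `[openclaw §1.4]` points to section 1.4 of that project's document; the same rule applies to the other projects.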

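#0. Why "AI Agent" itself deserves a separate comparison

What "an agent" is gets 10 very different answers from the 10 projects, and that is the layer of this review most worth pulling out and examining on its own.

All 10 products claim to build agents, but ask "so what is an agent, physically, in your system?" and the answers split immediately: a single SQL row in APC, a markdown file for SOUL.md, a piece of Python for StateGraph, three stacked layers of Profile + Subagent + AGENTS.md, or even "there is no agent entity, only the Cartesian product of mode × rules × model".

The choice of identity carrier determines a whole chain of downstream design: instantiation, lifecycle, collaboration protocol, autonomy tiers, and multi-client reuse. That is why the "AI Agent abstraction" deserves to be treated as a standalone topic before everything else.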

#1. Quick-reference table of the 10 projects

One row per project, five aligned columns; every later section uses this table as its index.

Project	Identity carrier	Main-loop paradigm	Multi-agent collaboration mode	Instance form	Autonomy tiers
OpenClaw	Workspace directory + the `SOUL/AGENTS/TOOLS/IDENTITY.md` quartet + registration in `openclaw.json` `[openclaw §1.1–1.4]`	Plain ReAct (no plan-execute shell)	Run isomorphism: a sub-agent is a Run in a new session	Single resident Gateway process; agents run serially under a per-session lock	Single-tier ReAct + hook interception
Hermes	Model weights + personality.md + SKILL.md + state.db `[hermes §1.1–1.4]`	ReAct + model-level hard reasoning switch (Hermes 4 `<think>`)	Multi-agent Kanban since v0.13: multiple cross-process workers collaborate on one durable task board `[hermes §9]`	Single Python process; multiple profiles isolated by directory; Kanban workers run across processes	Single-tier ReAct
Claude Code	`.md` files (subagents, skills, and commands are all markdown + frontmatter) `[cc §1.1–1.4]`	ReAct (gather → act → verify) + plan mode	subagent (independent ctx) + agent team (mailbox, experimental)	CLI process; one harness reused across 5 clients	5 gates: hook → rule → mode (5 tiers) → protected paths → classifier
Codex	Profile + Subagent + AGENTS.md (stacked layers) `[codex §1.1–1.4]`	ReAct + sandbox × approval dual axis	subagent, explicit opt-in, `max_threads=6 / max_depth=1`	Two processes, TUI + `codex app-server`; multiple clients share the server	Dual-axis matrix: 3×4 = 12 states
Cursor	Mode × Rules × Agent Skills × Memories × model (no persistent agent entity) `[cursor §1.1–1.5]`	ReAct + Agent / Ask / Custom modes; Plan / Debug as Custom-mode variants + the in-house-trained Composer model	Subagents / BG agent (git worktree / remote VM), race pattern	One harness with 5 entry points (Tab/Cmd+K/Composer/CLI/Cloud)	Tab → Cmd+K → Composer → Agent → Custom/Plan → BG (≈6 tiers)
Cline	Model + system prompt + 20+ built-in tools (about 13 in early versions) + `.clinerules/` directory `[cline §1.1–1.4]`	ReAct + Plan / Act dual mode (each may use a different model)	None natively (`/newtask` switches context)	VSCode extension / npm CLI; one Task instance per task	Plan → Act → YOLO (3 tiers) + per-tool Auto-approve matrix
OpenHands	Agent subclasses (CodeActAgent etc.) + Microagents `[oh §1.1–1.4]`	CodeAct (the model writes Python directly) + AgentController step loop	Run isomorphism: Delegation (BrowsingAgent / LocAgent)	Server reuses one EventStream across clients; 6 runtime backends	`confirmation_mode` + EventStream replay
AutoGen	`BaseChatAgent` subclass + system_message + tools + termination `[autogen §1.1–1.4]`	Single-agent ReAct + 5 GroupChat selectors	Actor isomorphism: RoundRobin / Selector / Swarm / Graph / Magentic	Asynchronous message-driven actors; can span processes and languages (gRPC)	Controlled by termination_condition rather than modes
LangGraph	StateGraph + Node functions + state schema `[lg §1.1–1.5]`	Graph orchestration (Pregel BSP) + prebuilt create_react_agent	Node isomorphism: Supervisor / Swarm / GraphFlow / Deep Agents	Embedded in the user process / long-task workers on Platform	`interrupt()` / `interrupt_before` / human-in-loop nodes

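Observation: the identity-carrier column and the multi-agent collaboration column are far more tightly coupled than intuition suggests. The two projects whose identity is "code" (AutoGen, LangGraph) inevitably end up actor- or node-isomorphic; the 7 whose identity is markdown almost always end up Run-isomorphic or tool-isomorphic; and where identity is a database row (APC), a row with `tool_config.fastagent_config IS NOT NULL` simply is a FastAgent (FA).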

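#2. Three philosophies of agent identity carriers

Once the identity carrier is fixed, the 6 downstream design points are almost irreversible.

#2.1 Three carriers and their knock-on consequences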

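Carrier type       Representatives                         Knock-on consequences
────────────────────────────────────────────────────────────────────────────────────────────────
"Database row"     (APC = scenario_pack table + S3)        ➜ "tool = sub-agent" can live in a table
                                                           ➜ a dispatcher is required for routing
                                                           ➜ implies a large user base and a SaaS business model

"Markdown file"    OpenClaw (SOUL/SKILL.md)                ➜ lives in git; PR review comes for free
                   Hermes (SKILL.md / MEMORY.md)           ➜ human-written and agent-tuned prompts share one carrier
                   Claude Code (.md + frontmatter)         ➜ publishable / shareable / Hub ecosystem
                   Codex (AGENTS.md)                       ➜ "the agent teaches itself new abilities" is naturally suppressed
                   Cursor (.cursor/rules/*.mdc)              (Codex outright forbids the LLM from editing AGENTS.md)
                   Cline (.clinerules/)                    ➜ multi-client config reuse is easiest
                   OpenHands (.openhands/microagents/)

"Code / Lib"       AutoGen (Python class)                  ➜ type safety + IDE navigation
                   LangGraph (StateGraph)                  ➜ no GUI config surface / no central backend
                                                           ➜ multi-agent is a first-class citizen
                                                           ➜ users must be able to code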

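#2.2 Seven projects betting on markdown is the biggest trend in this review

7 of the 10 projects put "identity" in .md files. That number matters more than it looks:

Markdown is written for humans and read by models alike, so writing prompts and maintaining the agent become the same activity.
It lives in git: changing one rule is one PR, so changes in agent behavior are reviewable by construction.
It is publishable and exchangeable: OpenClaw built ClawHub to sell SOULs, Anthropic launched the agentskills.io open standard, Hermes is compatible with it, and the OpenHands v1 SDK moved its path to .agents/skills/.
It resists learning bad habits automatically: Codex explicitly forbids the LLM from writing AGENTS.md [codex §8.4], so "you casually mentioned tabs yesterday, and today Codex converted the whole project to tabs" cannot happen.

Quote: "open-spec markdown is winning over private DB-backed memory."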

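#2.3 Counterexample: Cursor treats "no identity" as a feature

Cursor is the most counterintuitive of the 10: it has no persistent agent object.

Agent behavior in this session
  = the selected Mode (Agent/Ask/Custom; Plan/Debug can be seen as Custom variants)
  + the Rules that matched
  + the available Agent Skills (`SKILL.md`)
  + automatically recalled Memories
  + the Codebase index
  + the currently selected Model
  + ad hoc context injected through @ references

This Cartesian product, not any stored object, is "the agent I am using right now" [cursor §1.1]. The direct consequence: you cannot reuse a Cursor agent in CI the way you would call an API, which is exactly the gap Cursor CLI and Cloud Agents are meant to fill. Having no identity is also a design, but the price is that Cursor has to build its own CLI and Cloud before the "agent" can leave the IDE.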

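#2.4 Who can "teach itself new abilities"

Identity-carrier philosophy	Builds its own tools	Patches its own abilities	Edits its own long-term memory
Markdown (except Hermes)	✗ (humans write the .md)	✗	Partially (Cursor Memories / Claude Code auto memory / OpenHands Condenser)
Markdown (Hermes)	✓ auto-creates a SKILL after 5+ tool_calls	✓ patches a skill when it is misused	✓ curates MEMORY.md
Code (AutoGen / LangGraph)	✗ (the library stays out of this layer)	✗	✗

Hermes is the only one of the 10 that makes "self-evolution" first-class. This is the natural advantage of a model company building agents: user traces are the next generation's training data [hermes §1.3]. The other projects deliberately avoid "agents automatically learning bad habits", whereas a model company has an incentive to encourage agents to produce trainable samples automatically.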

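#3. Seven trade-offs in main-loop paradigms

"The main loop is always ReAct" is an illusion. Take the 10 projects apart and at least 7 paradigms are evolving in parallel.

#3.1 Quick reference: the paradigms

Paradigm	Representatives	Core mechanism	Trade-offs
Plain ReAct	OpenClaw, Claude Code (default)	A single LLM call decides reason + tool call → observe → reason again	Simple, leans on model capability; in the words of the OpenClaw docs, "reasoning, action selection, and stopping - inside one model invocation per turn" `[openclaw §3.1]`
ReAct + multi-tier autonomy	Cline (Plan/Act/YOLO), Cursor (Tab/Cmd+K/Composer/Agent/Custom/BG), Codex (sandbox×approval)	The user switches tiers to tune agent autonomy	More UX complexity in exchange for a stronger sense of user control
CodeAct	OpenHands	The LLM writes Python directly as the action; an IPython kernel runs it	One turn replaces N rounds of tool_call; the parser must be robust and the sandbox strong `[oh §3.4 §8.1]`
Graph orchestration (StateGraph)	LangGraph	Explicit Nodes + Edges + state schema + conditional edges	"Control flow moves into the graph layer instead of living inside the LLM" `[lg §3.2]`; replayable, interruptible, observable
Actor + GroupChat	AutoGen	Every agent is an actor, message-driven, 5 selectors	Scales naturally across processes and languages; burns tokens fast `[autogen §3]`
Model-level hard switch	Hermes 4	`<think>` is trained into the weights; the runtime does no coaxing	Exclusive to model companies; the runtime becomes thin at once; vLLM's tool-call-parser hermes still has concurrency bugs `[hermes §3.2 §3.3]`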

#3.2 A surprising contrast: is it ReAct or not?

"Strict ReAct" (reason → act → observe → repeat)
  Claude Code (default)         ← three-phase gather → act → verify [cc §3.1]
  OpenClaw                      ← explicitly declares a ReAct-style single turn
  Cline                         ← recursivelyMakeClineRequests
  Codex (local)                 ← Responses API tool sequence
  Hermes (runtime)              ← synchronous run_conversation loop

"Not ReAct"
  OpenHands CodeAct             ← prompt-only XML tags; the model writes Python directly
  LangGraph custom graph        ← the while True moves into graph nodes
  AutoGen actor                 ← message bus + 5 selectors
  Hermes 4 model-level <think>  ← reasoning trained into the weights; the framework stops coaxing

"ReAct plus multi-tier autonomy"
  Cline (Plan / Act / YOLO)
  Cursor (Tab / Cmd+K / Composer / Agent / Plan / BG)
  Codex (sandbox × approval)

Quote: "How far control flow has moved into the graph / model / sandbox determines how thin the harness is."

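#3.3 Why is "multi-tier autonomy" the common choice of IDE / coding agents?

Cline, Cursor, and Codex all turn "degree of autonomy" into explicit tiers. This is no coincidence; it follows from a constraint inherent to coding agents:

                      autonomous loop
  inline completion ─────────────────────► PR produced automatically
  ↑ (strong user control)                  ↑ (weak user control)
  Tab / ghost text                         BG agent / Cloud / Resolver

A tier is essentially a combination of prompt template + tool allowlist + termination condition + approval policy:

Cline Plan Mode: all write tools disabled + plan_mode_respond + ask_followup_question [cline §3.1]
Cursor Plan: read-only + writes a plan file + waits for user confirmation [cursor §3.1]
Codex --full-auto: workspace-write + on-request (not a single enum) [codex §3.4]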

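The single-tier projects (OpenClaw, Hermes) are chat-style rather than the high-stakes IDE co-editing scenario in which user and agent work on the same code together, so a single tier is enough for them.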
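The "tier = prompt template + tool allowlist + termination condition + approval policy" reading can be sketched in a few lines. This is a minimal illustration, not any product's real configuration: the `AutonomyTier` class, the step budget standing in for the termination condition, and the tool names other than `read_file` / `search_files` / `plan_mode_respond` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AutonomyTier:
    """Hypothetical bundle of the four ingredients that make up one tier."""
    name: str
    prompt_template: str
    allowed_tools: frozenset[str]
    needs_approval: Callable[[str], bool]  # tool name -> ask the user first?
    max_steps: int  # termination condition, simplified to a step budget

    def can_call(self, tool: str) -> bool:
        return tool in self.allowed_tools


READ_TOOLS = frozenset({"read_file", "search_files"})
WRITE_TOOLS = frozenset({"write_file", "run_command"})  # assumed names

# Plan tier: read-only tools plus a "respond with a plan" tool, nothing to approve.
PLAN = AutonomyTier(
    name="plan",
    prompt_template="Discuss and plan; do not modify anything.",
    allowed_tools=READ_TOOLS | {"plan_mode_respond"},
    needs_approval=lambda tool: False,
    max_steps=20,
)

# Act tier: same read tools plus write tools, with approval on every write.
ACT = AutonomyTier(
    name="act",
    prompt_template="Carry out the agreed plan.",
    allowed_tools=READ_TOOLS | WRITE_TOOLS,
    needs_approval=lambda tool: tool in WRITE_TOOLS,
    max_steps=50,
)
```

Switching tiers then means swapping one `AutonomyTier` value for another while the conversation history stays where it is.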

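#3.4 CodeAct vs tool_call: a win for composability

OpenHands is the only one of the 10 that leads with a prompt-only protocol. In the paper's tests, CodeAct passes about 10pp more of SWE-bench than JSON tool_call [oh §8.1]. The reason is not that models prefer code but composability:

Task: "read the JSON and retry every item whose status is not 200"

tool_call version (about 4–6 round-trips):
  → http_get → parse_json → filter (LLM in head) → http_post × N → ...

CodeAct version (1 turn):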
  <execute_ipython>
  data = http_get(url).json()
  for item in data:
      if item["status"] != 200:
          http_post(retry_url, item)
  </execute_ipython>

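The cost is that the parser must tolerate XML nesting and escaping pitfalls, and the sandbox must be strong (the IPython kernel runs in three native implementations, Docker / Local / RemoteRuntime; since mid-2025, E2B / Daytona / Modal / Runloop have all been moved behind the RemoteRuntime protocol) [oh §4.2 §8.3]. OpenHands' answer is to abstract the runtime as an HTTP protocol (action_execution_server) so that concrete implementations live in separate packages [oh §8.3].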
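To make the parser side concrete, here is a minimal sketch of pulling `<execute_ipython>` blocks out of a model reply and running them against a shared namespace. It is an illustration only, not OpenHands' parser: `extract_actions` and `run_action` are hypothetical helpers, the regex does not handle nested or escaped tags, and plain `exec` is used where a real system would send the code to a sandboxed kernel.

```python
import re

# Naive pattern: non-greedy match between the opening and closing tag.
EXECUTE_BLOCK = re.compile(r"<execute_ipython>\n?(.*?)</execute_ipython>", re.DOTALL)


def extract_actions(model_output: str) -> list[str]:
    """Return the code of every <execute_ipython> block, in order."""
    return [code.strip("\n") for code in EXECUTE_BLOCK.findall(model_output)]


def run_action(code: str, namespace: dict) -> dict:
    """Run one action in a persistent namespace (NOT sandboxed; demo only)."""
    exec(code, namespace)
    return namespace


reply = (
    "I will collect the failed items.\n"
    "<execute_ipython>\n"
    "failed = [item for item in data if item['status'] != 200]\n"
    "retried = [item['id'] for item in failed]\n"
    "</execute_ipython>"
)

# The namespace plays the role of the kernel state that survives across turns.
ns = {"data": [{"id": 1, "status": 200}, {"id": 2, "status": 500}, {"id": 3, "status": 404}]}
for code in extract_actions(reply):
    run_action(code, ns)

print(ns["retried"])  # [2, 3]
```

The filter and the loop happen inside one action, which is the composability argument made above; the fragility of the regex is the parser cost.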

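#3.5 Model-level hard switch: capability push-down taken to the extreme

Hermes 4 trains the <think> tag into the weights:

Framework era (before Hermes 3)     Model-level era (Hermes 4)
───────────────────────────────     ───────────────────────────────────────
prompt: "Think step by step         weights have learned: hard problems get
         then answer"               a long <think>, easy ones a direct answer

runtime: parses the CoT and         runtime: puts <think> into the
         decides whether to                  reasoning_content field,
         splice it back in                   no coaxing or steering

The engineering payoff is that the runtime becomes thin at once: vLLM --tool-call-parser hermes parses <tool_call> / <think> inside the engine, so the runtime only ever speaks the OpenAI-compatible protocol [hermes §3.2 §7]. This is the natural leverage of a model company building agents: while others coax the model to think over and over in the prompt, Hermes pushes that work down into the training pipeline once and for all.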
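What "parsed inside the engine" buys the runtime can be illustrated with a toy splitter that turns raw text containing `<think>` and `<tool_call>` tags into OpenAI-style fields. This is a sketch under assumptions, not vLLM's hermes parser: `split_completion` is a hypothetical name, the JSON shape inside `<tool_call>` is assumed to be `{"name": ..., "arguments": ...}`, and streaming and malformed output are ignored.

```python
import json
import re

THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def split_completion(raw: str) -> dict:
    """Split raw model text into reasoning, visible content, and tool calls."""
    reasoning = "\n".join(part.strip() for part in THINK.findall(raw))
    tool_calls = [json.loads(body) for body in TOOL_CALL.findall(raw)]
    # Whatever is left after removing both tag types is the visible answer.
    content = TOOL_CALL.sub("", THINK.sub("", raw)).strip()
    return {
        "reasoning_content": reasoning or None,
        "content": content or None,
        "tool_calls": tool_calls,
    }


raw = (
    "<think>Need the file first.</think>\n"
    '<tool_call>{"name": "read_file", "arguments": {"path": "a.txt"}}</tool_call>'
)
msg = split_completion(raw)
print(msg["reasoning_content"])      # Need the file first.
print(msg["tool_calls"][0]["name"])  # read_file
```

Once this step lives in the inference engine, the runtime above it only sees structured fields and never touches the tags.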

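#4. Four "isomorphism" philosophies of multi-agent collaboration

"A parent agent calls a child agent" has at least 4 different physical meanings across the 10 projects.

#4.1 The four isomorphisms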

Tool isomorphism (APC)
  ┌──────────┐
  │ System   │ ─── tool_call(content_verify) ─── looks just like read_file
  │   Agent  │                                    │
  └──────────┘                                    ▼
                                          ┌───────────────────────┐
                                          │ FastAgent             │
                                          │   ✓ own sandbox       │
                                          │   ✓ own LLM instance  │
                                          │   ✓ full ReAct loop   │
                                          └───────────────────────┘
  → one tool_call may equal 30 LLM calls
  → the framework code on top is minimal: the parent agent needs no "sub-agent protocol"

Run isomorphism (OpenClaw / OpenHands)
  ┌────────────┐    /subagents spawn ops "..."     ┌────────────┐
  │ parent Run │ ────────────────────────────────▶│ child Run  │
  │ session    │                                   │ session    │
  └────────────┘    announce on completion         └────────────┘
        ▲                                                │
        └── child transcript path + summary ─────────────┘
  → a sub-agent is a new session.jsonl + its own lane lock     [openclaw §3.3]
  → OpenHands' AgentDelegateAction is a typed event;
    the child controller runs its own EventStream               [oh §3.5]
  → easy to debug (separate transcript), at the cost of a cold start
  → isolated context by default (OpenClaw "context: isolated")

Actor isomorphism (AutoGen)
  ┌────────────┐ publish_message  ┌────────────┐
  │ Agent A    │ ──────────────▶ │ Runtime    │ ──── dispatched by subscription
  │ (actor)    │                  │ Queue      │           │
  └────────────┘                  └────────────┘           ▼
                                                       ┌──────────────┐
                                                       │ Agent B      │
                                                       │ Agent C      │
                                                       └──────────────┘
  → no synchronous call stack, everything is async messages    [autogen §2.1–2.3]
  → naturally cross-process and cross-language (GrpcWorkerAgentRuntime)
  → the cost is that the protocol must be expressed as protobuf schemas

Node isomorphism (LangGraph)
  ┌─────┐   conditional edge    ┌─────┐
  │  A  │ ────────────────────▶│  B  │
  └─────┘  Send / Command       └─────┘
            (dynamic fan-out)      │
                                   ▼
                              ┌──────────┐
                              │ subgraph │  ← Supervisor / Swarm are both subgraphs
                              └──────────┘
  → agents communicate explicitly through graph edges          [lg §1.4 §3.3 §3.4]
  → scheduling logic is fully explicit, and so are tokens
  → multi-agent is a first-class citizen

#4.2 Trade-off matrix of the four isomorphisms

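| Dimension | Tool isomorphism | Run isomorphism | Actor isomorphism | Node isomorphism |
|---|---|---|---|---|
| Spawn cost | one tool_call | start a new session | publish a message | enter a subgraph |
| Isolation granularity | sandbox / LLM instance | session lane / lane lock | actor + topic | state subset |
| Debuggability | medium (trace goes into the parent ctx) | high (separate transcript) | medium (actor msg sequence) | high (state checkpoint) |
| Token exposure | least (only a summary returns) | low (announce summary) | high (one ctx per actor) | medium (reducer merge) |
| Concurrent spawn | asyncio.gather | parallel sessions (parent lock released) | naturally concurrent | dynamic fan-out via Send |
| Typical framework code on top | almost none | a set of slash commands | Team class + selector | a few dozen lines of graph |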

Quote: "Choosing among the four isomorphisms really asks: do I want multi-agent to be a protocol, a process, an actor, or state?"

#4.3 Four styles of spawning

Implicit tool_call           Explicit spawn command    message-driven           Graph edge
───────────────              ──────────────           ────────────             ──────
Codex spawn_agents_*        /subagents spawn          Team(participants=)      add_node + Send
Claude Code Task tool        Cursor BG agent           handoff tool             create_supervisor
                             OpenHands                                          create_swarm
                             AgentDelegateAction

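Notably, OpenClaw, Codex, and Claude Code all make "explicit opt-in" the default:

Codex: max_threads = 6 / max_depth = 1; it only spawns when the prompt says "spawn one agent per ..." [codex §1.3 §8.6]
Claude Code v2.1.88 refines "subagents cannot nest" into: an ordinary subagent can still asynchronously spawn background subagents, but a fork-child cannot fork again and a teammate cannot spawn another teammate. Recursion protection sits on the fork and team paths, the two that blow up most easily, instead of a blanket ban on nesting [cc §3.3]
OpenClaw: a subagent must be started with /subagents spawn or by the model actively calling the sessions_spawn() tool

There is a reason for this, summarized in agent-architectures-comparison.md §11.10:

"Get it working with a single agent and a good prompt first, then decide whether to split. With multi-agent, more is not better."

OpenHands [oh §8.6] says it plainly too: multi-agent delegation is worth it if and only if the sub-task's observation space differs substantially from the main task's (browser axtree vs code is the typical case).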

#4.4 A special case: Claude Code's agent team mailbox

Claude Code experimentally introduced agent teams in v2 (requires CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1):

                ┌────────────────────────────┐
                │  Team Lead session         │
                │  ✓ spawn teammates         │
                │  ✓ synthesize conclusions  │
                └─────┬───────────┬──────────┘
                      │           │
           ┌──────────┘           └──────────┐
           ▼                                 ▼
┌──────────────────────┐          ┌──────────────────────┐
│ Teammate A           │  mailbox │ Teammate B           │
│ full, independent    │◀────────▶│ full, independent    │
│ Claude Code session  │SendMessage│ Claude Code session  │
└──────────────────────┘          └──────────────────────┘
                ▲                                 ▲
                └────────shared task list─────────┘
                  (file lock, avoids task grabbing)

Here Claude Code tries to use two isomorphisms at once: subagents are Run-isomorphic (a one-shot receipt), while agent teams are actor-isomorphic (teammates can SendMessage to each other directly) [cc §3.4]. The cost is an explosion in token spend (every teammate is a full Claude Code instance), which is why it is still experimental.

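#5. The autonomy spectrum

From ghost text to automatically produced PRs, how far an agent is "let off the leash" is a continuum, not a set of discrete options.

#5.1 Spectrum overview

inline completion →  quick edit     →  plan-act        →  autonomous          →  multi-agent
                                                                                 or background
──────────────────────────────────────────────────────────────────────────────────────────────────
 Cursor Tab          Cursor Cmd+K      Cline Plan/Act     Claude Code default    Cursor BG agent
                     Cursor Apply      Cursor Plan        Codex full-auto        Codex Cloud
 Composer inline                       Cursor Composer    Hermes runtime         OpenHands Cloud
                                                          Cline YOLO             Claude Code agent team
──────────────────────────────────────────────────────────────────────────────────────────────────
 user approves every action            user confirms      agent runs to          multiple agents run
                                       per phase          completion on its own  asynchronously in parallel

The far left of the spectrum is Cursor Tab (press Tab and get ghost text, with no agent loop at all); the far right is Codex Cloud / OpenHands Cloud / Cursor BG (a remote VM that returns a PR hours later).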

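#5.2 Dual axis vs single axis: Codex's generational leap

Splitting "autonomy" into two axes, what it can do (sandbox) × when it interrupts a human (approval), is Codex's signature [codex §3.2–3.4]:

                     │ approval=never    │ on-request               │ untrusted
─────────────────────┼───────────────────┼──────────────────────────┼──────────────────────────
sandbox=read-only    │ silent read-only  │ asks when out of bounds  │ write commands always ask
sandbox=workspace-w. │ ★ true yolo-lite  │ ★★ recommended default   │ complex scenarios
sandbox=danger-full  │ ★ true yolo       │ asks beyond commands     │ asks beyond commands

3 × 4 = 12 states cover the whole spectrum from "review-only" to "yolo". --full-auto is equivalent to workspace-write + on-request and is the everyday recommendation. --yolo is an alias for --dangerously-bypass-approvals-and-sandbox; the name is that long precisely to discourage its use [codex §3.4].

Quote: "Never bind 'capability' and 'approval' into a single enum." [comparison §11.4]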
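The two-axis idea is easy to express as two independent enums whose product gives the 12 states. The sketch below is illustrative only: the decision function `decide` is a toy policy invented for this example, not Codex's actual behavior, and the fourth approval value (`on-failure`), which the table above does not show, is an assumption.

```python
from enum import Enum
from itertools import product


class Sandbox(Enum):
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    DANGER_FULL_ACCESS = "danger-full-access"


class Approval(Enum):
    UNTRUSTED = "untrusted"
    ON_FAILURE = "on-failure"  # assumed fourth value; not shown in the table above
    ON_REQUEST = "on-request"
    NEVER = "never"


ALL_STATES = list(product(Sandbox, Approval))  # 3 x 4 = 12 combinations
FULL_AUTO = (Sandbox.WORKSPACE_WRITE, Approval.ON_REQUEST)


def within_sandbox(sandbox: Sandbox, *, writes: bool, inside_workspace: bool) -> bool:
    """Capability axis: is the action allowed without leaving the sandbox?"""
    if sandbox is Sandbox.DANGER_FULL_ACCESS:
        return True
    if sandbox is Sandbox.WORKSPACE_WRITE:
        return (not writes) or inside_workspace
    return not writes  # read-only


def decide(sandbox: Sandbox, approval: Approval, *, writes: bool, inside_workspace: bool) -> str:
    """Toy policy combining both axes into 'allow', 'ask', or 'deny'."""
    if approval is Approval.UNTRUSTED and writes:
        return "ask"  # writes always go through the user
    if within_sandbox(sandbox, writes=writes, inside_workspace=inside_workspace):
        return "allow"
    # Out of bounds: either escalate to the user or refuse silently.
    return "deny" if approval is Approval.NEVER else "ask"


print(len(ALL_STATES))  # 12
print(decide(*FULL_AUTO, writes=True, inside_workspace=False))  # ask
```

Because the two enums never reference each other, adding a sandbox level or an approval policy extends one axis without touching the other.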

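#5.3 Claude Code's 5 gates

Claude Code has no dual-axis matrix; instead it stacks 5 layers of serial checks [cc §2.2]:

The model decides to call Edit
   │
   ▼  ① PreToolUse hook
   │    settings.json hooks.PreToolUse[]  → allow/deny/ask/defer
   │
   ▼  ② permission rule
   │    deny → ask → allow, first match wins
   │
   ▼  ③ permission mode (5 tiers)
   │    default / acceptEdits / plan / auto / dontAsk / bypassPermissions
   │
   ▼  ④ protected paths (always blocked)
   │    .git/.claude/.vscode/.idea/.husky
   │
   ▼  ⑤ classifier (in auto mode)
   │    server-side prompt injection scan
   ▼
The tool executes

The point of this design is that every gate can be configured on its own: hooks can run lint / format, rules can use globs, the mode follows the UX tier the user picks, protected paths are the backstop, and the classifier is on by default in the cloud. Compared with Codex's dual axis, Claude Code's route is "5 knobs" instead of "2 knobs": the finest permission granularity, but also the steepest learning curve.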
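A serial chain of independently configurable gates can be sketched as a list of small functions evaluated in order. Everything here is a simplification made up for illustration: the gate functions, the `call` dictionary shape, and the combination rule (any "deny" stops the chain, any "ask" requires confirmation) are assumptions, not Claude Code's actual semantics.

```python
from typing import Callable, Optional

Verdict = Optional[str]  # "deny", "ask", or None (gate has no objection)
Gate = Callable[[dict], Verdict]

PROTECTED_DIRS = (".git", ".claude", ".vscode", ".idea", ".husky")


def hook_gate(call: dict) -> Verdict:
    return None  # placeholder for a user-defined pre-tool hook


def rule_gate(call: dict) -> Verdict:
    # Hypothetical rule: never touch .env files.
    return "deny" if call["path"].endswith(".env") else None


def mode_gate(call: dict) -> Verdict:
    # Hypothetical reading of modes: edits need confirmation unless acceptEdits is on.
    if call["tool"] == "Edit" and call["mode"] != "acceptEdits":
        return "ask"
    return None


def protected_path_gate(call: dict) -> Verdict:
    return "deny" if call["path"].split("/")[0] in PROTECTED_DIRS else None


def classifier_gate(call: dict) -> Verdict:
    return None  # stand-in for a server-side scan


GATES: list[Gate] = [hook_gate, rule_gate, mode_gate, protected_path_gate, classifier_gate]


def evaluate(call: dict, gates: list[Gate]) -> str:
    """Run every gate in order; the first 'deny' wins, otherwise 'ask' beats 'allow'."""
    needs_confirmation = False
    for gate in gates:
        verdict = gate(call)
        if verdict == "deny":
            return "deny"
        if verdict == "ask":
            needs_confirmation = True
    return "ask" if needs_confirmation else "allow"


print(evaluate({"tool": "Edit", "path": "src/app.py", "mode": "acceptEdits"}, GATES))  # allow
print(evaluate({"tool": "Edit", "path": ".git/config", "mode": "acceptEdits"}, GATES))  # deny
```

Each gate can be replaced or reconfigured without touching the others, which is the "5 knobs" property described above.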

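#5.4 The inventor of the Plan-Act dual mode: Cline

Among the 10, Cline is the representative for splitting "think it through" and "do the work" into two autonomy tiers [cline §3.1–3.2]:

Dimension	Plan Mode	Act Mode
Can do	read / ripgrep / ask clarifying questions / discuss the approach	write / run commands / browser / MCP
Cannot do	any write operation (`strictPlanModeEnabled` enforces the block)	n/a
Typical tools	`read_file` / `search_files` / `plan_mode_respond`	all built-in tools (about 13 in early versions)
Independent model config	yes	yes

The classic play: Plan with Opus to think it through → Act with Sonnet to land it quickly, saving 70% of the cost. Switching tiers does not clear the history: all context read during Plan carries over into Act; "the real value of dual mode is not starting over, it is swapping in a different set of tools" [cline §3.2].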

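#5.5 The autonomy-tier "anti-pattern"

comparison §11.10 points out that many agent frameworks (CrewAI / early AutoGen) let the supervisor spawn workers automatically by default, yet real workflows then become extremely hard to debug and burn tokens fast. Opening up the highest autonomy by default is an anti-pattern. Designs worth copying:

Codex max_threads=6 / max_depth=1: conservative upper bounds
Claude Code pushes recursion protection precisely onto the fork-child and teammate paths, the two that blow up most easily (ordinary subagents can still nest, but a fork cannot fork again and a teammate cannot spawn another teammate) [cc §3.3]
OpenHands stuck.py detects loop patterns (the same action repeated N times) → forces delegation or abort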
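The "same action repeated N times" check from the last item can be sketched with a fixed-size window. This is a minimal illustration and not the contents of OpenHands' stuck.py; the `StuckDetector` class and its default threshold are assumptions.

```python
from collections import deque


class StuckDetector:
    """Flags a loop when the last `max_repeats` actions are all identical."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self.recent: deque[str] = deque(maxlen=max_repeats)

    def record(self, action: str) -> bool:
        """Record one action; return True if the agent looks stuck."""
        self.recent.append(action)
        window_full = len(self.recent) == self.max_repeats
        return window_full and len(set(self.recent)) == 1


detector = StuckDetector(max_repeats=3)
flags = [detector.record(a) for a in ["ls", "cat a.txt", "cat a.txt", "cat a.txt"]]
print(flags)  # [False, False, False, True]
```

A controller would react to a `True` result by aborting or handing the task to another agent, as the item above describes.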

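#6. Concurrency vs consistency

"The harder you parallelize, the heavier the responsibility for consistency": the 10 projects illustrate this iron law vividly.

#6.1 Concurrency strategy and consistency protection side by side

Project	Default concurrency strategy	Consistency protection	Physical carrier
OpenClaw	Strictly serial: session lane lock	append-only `session.jsonl` + file lock	single process + fs
Hermes	ThreadPoolExecutor max=8, path-aware (same path runs serially)	SQLite `BEGIN IMMEDIATE` + retry	state.db
Claude Code	sub-agents in parallel; serial inside main	file lock; agent team claims from a shared task list under a file lock	local fs
Codex	sub-agents in parallel (max=6); local sandbox serial; `multi_tool_use.parallel` batched reads	git diff stays visible throughout	local + cloud sandbox
Cursor	BG agents in parallel (git worktree / VM); race pattern runs multiple models at once	the user picks one to merge at the end	worktree + remote VM
Cline	single-threaded ReAct, approval at every step	shadow git per-step commit	`.cline/checkpoints/`
OpenHands	single agent; delegation is short and serial	EventStream deterministic replay	event log
AutoGen	asynchronous message-driven actors; naturally concurrent	explicit message envelope; GraphFlow follows a DAG	actor runtime
LangGraph	Pregel BSP superstep parallelism	reducer merge; checkpointer slices	checkpointer + Store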

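#6.2 Five "the log is the truth" consistency techniques

1. Append-only transcript (OpenClaw session.jsonl)
   → writes are protected by the session lane lock; other Runs queue
   → after a process restart, lane state is recovered from the fs
   → trades concurrency for consistency [openclaw §3.2]

2. Event-sourced + deterministic replay (OpenHands)
   → every Action / Observation is a typed event
   → state is determined entirely by the event log
   → replaying a given log always yields the same state
   → the v1 SDK makes this first-class [oh §8.2]

3. Pregel BSP + Reducer (LangGraph)
   → each superstep collects all updates, then merges them in a batch
   → the reducer turns "multiple nodes writing the same field in parallel" into "gather first, then merge"
   → the checkpointer serializes state at the end of every superstep [lg §1.2 §2.1]

4. Shadow git per-step commit (Cline)
   → after every tool execution the shadow git commits automatically
   → any rollback is a git revert
   → three checkpoint granularities (task / message / tool) [cline §3.3]

   → the backend echoes first so the frontend UI responds immediately
   → the courier persists only after collecting finish_chunk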
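The event-sourced technique (item 2) comes down to state being a pure fold over the log. The sketch below illustrates that property with a made-up event type; `Event`, `apply_event`, and `replay` are hypothetical names and not part of any project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str  # "file_written" or "file_deleted" in this toy model
    path: str
    content: str = ""


def apply_event(state: dict, event: Event) -> dict:
    """Pure transition: returns a new state and never mutates the old one."""
    new_state = dict(state)
    if event.kind == "file_written":
        new_state[event.path] = event.content
    elif event.kind == "file_deleted":
        new_state.pop(event.path, None)
    return new_state


def replay(log: list[Event]) -> dict:
    """Rebuild state from scratch by folding the whole log."""
    state: dict = {}
    for event in log:
        state = apply_event(state, event)
    return state


log = [
    Event("file_written", "a.txt", "v1"),
    Event("file_written", "b.txt", "x"),
    Event("file_written", "a.txt", "v2"),
    Event("file_deleted", "b.txt"),
]
print(replay(log))      # {'a.txt': 'v2'}
print(replay(log[:2]))  # {'a.txt': 'v1', 'b.txt': 'x'}
```

Replaying a prefix of the log reconstructs any earlier state, which is what makes debugging by replay possible.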

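#6.3 A contrarian reminder

Quote: "Serial Runs are not backward; they show respect for consistency." [comparison §11.5]

OpenClaw made an explicit choice: no concurrency in the main loop, with sub-agents as the only concurrency escape hatch [openclaw §3.3]. This looks conservative, but the judgment that **"the transcript is the foundation of agent memory, and concurrent writes wreck it"** is close to unassailable in the multi-agent era.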

Conversely, pursuing concurrency and consistency at the same time requires heavyweight machinery: Pregel BSP (LangGraph) or event-sourced replay (OpenHands).
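The "gather first, then merge" behavior of a BSP superstep with reducers can be shown in plain Python. This is a sketch of the idea only and does not use LangGraph's API; `superstep`, the node functions, and the `reducers` mapping are hypothetical.

```python
import operator
from typing import Callable


def superstep(state: dict, nodes: list[Callable[[dict], dict]], reducers: dict) -> dict:
    """Run all nodes against the same snapshot, then merge their updates."""
    # Phase 1: every node reads an identical copy of the state.
    updates = [node(dict(state)) for node in nodes]
    # Phase 2: merge in a fixed order; reducers combine writes to the same key.
    new_state = dict(state)
    for update in updates:
        for key, value in update.items():
            if key in reducers and key in new_state:
                new_state[key] = reducers[key](new_state[key], value)
            else:
                new_state[key] = value
    return new_state


def researcher(state: dict) -> dict:
    return {"notes": ["found 3 sources"]}


def critic(state: dict) -> dict:
    # Sees only the snapshot, not the researcher's update from this superstep.
    return {"notes": [f"reviewed {len(state['notes'])} prior notes"]}


state = {"notes": ["start"]}
state = superstep(state, [researcher, critic], reducers={"notes": operator.add})
print(state["notes"])  # ['start', 'found 3 sources', 'reviewed 1 prior notes']
```

Both nodes write the same field, yet neither overwrites the other, because the merge happens once at the superstep boundary.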

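#7. The 5 agent-abstraction innovations most worth copying

Designs of "generational significance" picked from across the 10 projects: not performance optimizations or UX polish, but designs that changed how people answer "what is an agent".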

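#7.1 Tool isomorphism (sub-agent as a tool)

What to copy: wrap the sub-agent as a tool, so that a parent agent calling a child agent is no different from calling read_file.

Why it is an innovation: it removes the "multi-agent protocol" layer entirely. Adding a new kind of sub-agent requires no change to the main loop, no change to the streaming protocol, and no new spawn mechanism; it is just a new tool. Of the 10, this is the multi-agent implementation with the least framework code on top.

Cost: the parent agent cannot see the child agent's intermediate steps; debugging means digging through the FA's trace separately.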
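Wrapping a sub-agent as a tool can be sketched as a closure that runs an inner loop and returns only its final summary. All names here are hypothetical (`make_subagent_tool`, `fake_verifier_loop`, the `TOOLS` registry), and the inner loop is a stub standing in for what would be many LLM calls.

```python
from typing import Callable

ToolFn = Callable[[str], str]


def read_file(path: str) -> str:
    """An ordinary tool, stubbed for the example."""
    return f"<contents of {path}>"


def make_subagent_tool(name: str, inner_loop: Callable[[str], list[str]]) -> ToolFn:
    """Expose a whole agent loop behind the same signature as a plain tool."""

    def tool(task: str) -> str:
        trace = inner_loop(task)  # may hide dozens of LLM calls
        return trace[-1]          # the parent only ever sees the summary

    tool.__name__ = name
    return tool


def fake_verifier_loop(task: str) -> list[str]:
    # Stub for a full ReAct loop running in its own sandbox.
    return [
        f"thought: inspect {task}",
        "action: run checks",
        f"summary: {task} looks consistent",
    ]


TOOLS: dict[str, ToolFn] = {
    "read_file": read_file,
    "content_verify": make_subagent_tool("content_verify", fake_verifier_loop),
}


def dispatch(tool_name: str, argument: str) -> str:
    """The parent's dispatcher treats both entries identically."""
    return TOOLS[tool_name](argument)


print(dispatch("content_verify", "draft.md"))  # summary: draft.md looks consistent
```

The dispatcher has no branch for sub-agents, which is the point; the discarded `trace` is also exactly the debugging cost named above.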

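#7.2 Dual-axis autonomy: Codex sandbox × approval

What to copy: decouple "what it can do" (sandbox) and "when to ask a human" (approval) into two independent enums [codex §3.4 §8.2].

Why it is an innovation: a single-axis trust level always loses something across the review / default development / yolo tiers ("I want to read the whole disk but not push anything myself" cannot be expressed on one axis). The 12 dual-axis states cover real-world scenarios well. comparison §11.4 calls it a "generational leap".

Cost: users have to understand two concepts, so the mental load for newcomers is slightly higher.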

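#7.3 CodeAct: OpenHands

What to copy: when a task is strongly compositional, let the LLM write Python code directly as the action and hand the whole block to an IPython kernel [oh §3.4 §8.1].

Why it is an innovation: the tool_call protocol is "the language the model knows least (JSON schema)", while CodeAct uses "the language the model knows best (code)". SWE-bench shows a measured +10pp pass rate, a gap that prompt tuning cannot buy.

Cost: the parser must tolerate XML nesting and escaping; the sandbox must be strong (OpenHands' Runtime abstraction and the ActionExecutionServer unified HTTP protocol exist to serve CodeAct; there are three native runtimes, Docker / Local / RemoteRuntime, and E2B / Daytona / Modal / Runloop plug in through RemoteRuntime).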

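#7.4 Model-level hard reasoning switch: Hermes 4

What to copy: train <think> into the weights, have the runtime neither coax nor steer, and let the vLLM tool-call-parser turn <think> / <tool_call> into reasoning_content / tool_calls right at the inference-engine layer [hermes §3.2 §7].

Why it is an innovation: pushing capability down into the model makes the harness thin at once. comparison §11.2 sums it up: if a capability can be pushed into the weights, do not keep coaxing it out through the prompt. Cursor's Composer-Model takes the same road (training an in-house mid-tier model so that "calling our own tools" becomes a weight-level skill).

Cost: exclusive to model companies. The advantage disappears as soon as the runtime team wants to switch models.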

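#7.5 EventStream as the single bus: OpenHands

What to copy: every Action / Observation is a typed event, and every component (agent / runtime / UI / resolver / telemetry) is a publisher or subscriber of the EventStream [oh §3.2 §3.3 §8.2].

Why it is an innovation:

Multi-client reuse: CLI / WebUI / GitHub Action / Cloud are four entry points that are just different drivers of the stream; the agent logic does not change at all.
Deterministic replay: state is rebuilt entirely from the event log, and a given log always produces the same state.
A new client is just a subscriber; adding telemetry, permission review, or A/B testing does not touch the core.

comparison §11.11 sums this up as: "Anything that needs multiple clients, multiple surfaces, or debug replay ends up as a different implementation of the same idea."

Cost: it demands strict engineering discipline. A side effect only "counts" if it is emitted as an event; the agent must not quietly change external state inside a step without leaving an event.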
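The single-bus idea can be reduced to a log plus a list of subscribers. The class below is a toy written for this article, not OpenHands' EventStream: the `EventStream` name is reused only for readability, and the dictionary event shape and both subscribers are assumptions.

```python
from typing import Callable


class EventStream:
    """Toy bus: every published event is logged, then fanned out to subscribers."""

    def __init__(self) -> None:
        self.log: list[dict] = []
        self.subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, event: dict) -> None:
        self.log.append(event)  # the log is the source of truth
        for callback in list(self.subscribers):
            callback(event)


stream = EventStream()

# Subscriber 1: a "UI" that renders each event as a line of text.
ui_lines: list[str] = []
stream.subscribe(lambda e: ui_lines.append(f"{e['type']}: {e['payload']}"))

# Subscriber 2: telemetry added later without touching the publisher.
counts: dict[str, int] = {}


def telemetry(event: dict) -> None:
    counts[event["type"]] = counts.get(event["type"], 0) + 1


stream.subscribe(telemetry)

stream.publish({"type": "action", "payload": "run tests"})
stream.publish({"type": "observation", "payload": "2 passed"})

print(ui_lines)  # ['action: run tests', 'observation: 2 passed']
print(counts)    # {'action': 1, 'observation': 1}
```

The publisher never learns who is listening, so a new surface or an audit hook is one more `subscribe` call.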

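#8. Trend observations and one-line summaries

As of April 2026, the 10 projects' answers show 4 clear trends, followed by 10 one-line summaries.

#8.1 Four clear trends

Trend 1: identity carriers converge on markdown. 7 of the 10 projects choose markdown. A new agent product should at least be compatible with the AGENTS.md / CLAUDE.md file-name conventions; going through git review and joining the open ecosystem is an almost irreversible direction.

Trend 2: capabilities sink into the model layer. Hermes 4 trains <think> into the weights, Cursor Composer-Model trains tool calling for its own tools, and the OpenAI Responses API natively supports reasoning + tools: the harness is getting thinner. Model companies have a structural advantage; runtime teams will either have to start training models themselves or move down into layers that models handle poorly (multi-agent coordination / sandboxing / multi-client).

Trend 3: multi-agent is opt-in, not the default. Codex max_threads=6 / max_depth=1, Claude Code hard-blocks recursion on the fork / teammate paths, OpenHands leads with single-agent CodeAct, and the AutoGen team says plainly in section 8.6 "don't reach for multi-agent yet". Stacking hierarchies by default is an anti-pattern; multi-agent is only worth it when the sub-task's observation space differs substantially from the main task's [oh §8.6] [comparison §11.10].

Trend 4: dual-axis / multi-tier autonomy replaces single-axis trust. Codex sandbox × approval, Cline Plan/Act/YOLO, Cursor's 6 tiers, Claude Code's 5 gates: the single trust enum has been retired. The consensus for complex permission systems: never bind "capability" and "approval" into a single enum [comparison §11.4].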

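#8.2 One-line summaries (on the "agent abstraction" dimension only)

Project	One-line summary
OpenClaw	Uses the session lane lock to declare openly that it "trades concurrency for consistency"; SOUL.md turns personality into the smallest distributable Lego brick.
Hermes	The only one that trains "agent self-evolution" into a closed loop; model weights + SKILL.md + MEMORY.md form a trio, and user traces are the next generation's training data.
Claude Code	A "centerless agent" where configuration is identity; subagents, skills, and commands are all markdown, and the 5 gates are the best practice of the single-tier ReAct era.
Codex	The sandbox × approval dual axis is the permission design most worth copying in this review; the stacked layers of Profile + Subagent + AGENTS.md take the engineering of a "runtime persona" to the extreme.
Cursor	No persistent agent entity; the Cartesian product of mode × Rules × Agent Skills × Memories × model stands in for it. Counterintuitive, but the most natural fit for an IDE; the harness is the product.
Cline	Inventor of the Plan/Act dual mode; shadow git per-step commit makes "agent rollback" a local default capability.
OpenHands	CodeAct lets the model write Python directly; the single EventStream bus + deterministic replay is the best solution for multi-client reuse.
AutoGen	Actor isomorphism + 5 GroupChat selectors; naturally cross-process and cross-language, but it burns tokens fast and depends on engineering discipline.
LangGraph	Models "control flow" explicitly as graph nodes + conditional edges; Pregel BSP + Reducer is the heaviest-weight way to get both concurrency and consistency.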

#9. Appendix: the 4 side-by-side tables this article keeps reusing

Placed at the end for quick review.

#9.1 Identity carrier × instance form × long lifecycle

Project	Physical form of the identity carrier	Long-lived instance?	Reuse mechanism
OpenClaw	Workspace directory + json	Partially (the Gateway process is resident; agents run serially within a lane)	session naming + lane lock
Hermes	`~/.hermes/` directory + state.db	✗ single-turn process	profile directory isolation; CLI / TUI / Gateway / Cron / ACP share one AIAgent `[hermes §2.2]`
Claude Code	`.claude/agents/*.md`	single turn / agent team experimental	harness process; agent team shares a task list across sessions
Codex	`.codex/agents/*.toml` + AGENTS.md	✓ `codex app-server` background process	TUI + IDE plugin share one server; opening several frontends does not require logging in again `[codex §2.1]`
Cursor	`.cursor/rules/*.mdc` + `SKILL.md` + Memories	✗ (assembled ad hoc on every Cmd+L)	5 entry points share one harness (Tab/Cmd+K/Composer/CLI/Cloud)
Cline	`.clinerules/` + `~/.cline/data/`	task-level (one Task instance per task)	StateManager + checkpoint across tasks
OpenHands	Python class + microagents	server reused across clients	AgentSession + EventStream subscriber
AutoGen	Python class / YAML	actors created lazily	runtime `agent_factories`
LangGraph	Python StateGraph	thread_id selects the checkpoint	checkpointer + Store + Platform task queue

#9.2 Main-loop paradigm × multi-agent isomorphism

Project	Main loop	Multi-agent isomorphism
OpenClaw	Plain ReAct	Run isomorphism (sub-session)
Hermes	ReAct + `<think>` model-level hard switch + `/goal` Ralph loop	Kanban workers collaborating across processes (new in v0.13)
Claude Code	ReAct (gather → act → verify) + plan mode	Run isomorphism (subagent) + Actor isomorphism (agent team)
Codex	ReAct + sandbox×approval dual axis	Run isomorphism (subagent, explicit opt-in)
Cursor	ReAct + Agent / Ask / Custom modes	Run isomorphism (BG agent worktree / VM)
Cline	ReAct + Plan/Act dual mode	None
OpenHands	CodeAct + AgentController step loop	Run isomorphism (Delegation)
AutoGen	Single-agent ReAct + 5 GroupChat types	Actor isomorphism
LangGraph	Graph orchestration (Pregel BSP) + create_react_agent	Node isomorphism (Supervisor / Swarm / GraphFlow)

#9.3 Autonomy tiers

Project	Tiers / mechanism	Number of tiers
OpenClaw	Single-tier ReAct + hook interception	1
Hermes	Single-tier ReAct	1
Claude Code	hook → rule → mode (5 tiers) → protected paths → classifier	5 gates
Codex	sandbox (3) × approval (4) = 12 states	2 dimensions, 12 states
Cursor	Tab / Cmd+K / Composer / Agent / Custom/Plan / BG	~6
Cline	Plan / Act / YOLO + per-tool Auto-approve	3 + matrix
OpenHands	confirmation_mode + EventStream replay	2
AutoGen	termination_condition (11 kinds + combinations)	controlled by termination
LangGraph	interrupt() + interrupt_before / human-in-loop nodes	controlled by graph nodes

#9.4 Consistency protection techniques

Technique	Representative	Physical carrier
Append-only transcript + lane lock	OpenClaw	`session.jsonl` + file lock
Event-sourced + deterministic replay	OpenHands v1 SDK	event log
Pregel BSP + Reducer	LangGraph	checkpointer
Shadow git per-step commit	Cline	`.cline/checkpoints/`
SQLite `BEGIN IMMEDIATE` + retry	Hermes	state.db
AGENTS.md hard-codes "prompt cache integrity"	Hermes	hard rule
File lock + shared task list	Claude Code agent team	local fs

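Data sources: 10 single-project architecture documents (800–1500+ lines each) + the consolidated comparison document, all in agents/docs/, researched 2026-04.

The two points this article is least certain about:

Cursor's instantiation model can only be inferred from third-party observation (the harness is not open source); the "no persistent agent entity" claim above is an inference from the official docs and the Forum, and Cursor may well keep session-scoped objects on its backend.

How far Hermes 4's <think> is really trained into the weights: no complete training recipe is public. The model card documents how to switch it, but how far "the runtime does no coaxing" holds depends on each inference engine's tool-call-parser implementation, and vLLM's hermes parser still has open issues under concurrency / streaming (#31871, #34932).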