This paper proposes externalization—relocating cognitive burdens into persistent external structures—as the unifying logic behind LLM agent advances. Memory converts recall to retrieval, skills convert generation to composition, and protocols convert ad-hoc interaction to structured exchange, coord…
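To make "structured exchange" concrete, here is a minimal sketch that assumes a JSON message with hypothetical `tool` and `arguments` fields, plus a hand-written table of required arguments. It does not reproduce any specific agent protocol. The point it illustrates is that a malformed request is rejected at the interface boundary instead of being interpreted ad hoc.

```python
import json
from dataclasses import dataclass, field

# Hypothetical protocol: the tool names, the "tool"/"arguments" fields, and
# REQUIRED_ARGS are illustrative assumptions, not part of any real standard.
REQUIRED_ARGS = {
    "search": {"query"},
    "read_file": {"path"},
}


@dataclass(frozen=True)
class ToolRequest:
    tool: str
    arguments: dict = field(default_factory=dict)


def parse_request(raw: str) -> ToolRequest:
    """Parse a JSON tool request and reject anything outside the schema."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("request must be a JSON object")
    tool = payload.get("tool")
    if not isinstance(tool, str) or tool not in REQUIRED_ARGS:
        raise ValueError(f"unknown tool: {tool!r}")
    args = payload.get("arguments", {})
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    # Every required argument must be present before the call is accepted.
    missing = REQUIRED_ARGS[tool] - set(args)
    if missing:
        raise ValueError(f"missing arguments for {tool}: {sorted(missing)}")
    return ToolRequest(tool=tool, arguments=args)
```

With this sketch, `parse_request('{"tool": "search", "arguments": {"query": "distributed cognition"}}')` yields a `ToolRequest`. A request such as `'{"tool": "read_file", "arguments": {}}'` raises a `ValueError` that names the missing `path` argument.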
Core Question
What transition logic unifies recent advances in LLM agents across memory, skills, protocols, and harness engineering?
Core Method
Approach: The paper provides a systems-level review that synthesizes existing work on LLM agents through the lens of cognitive externalization. It draws on cognitive-science frameworks, including Norman's cognitive artifacts, Kirsh's complementary strategies, and Hutchins' distributed cognition, to analyze how memory, skills, protocols, and harnesses reorganize cognitive work.
Key components:
- The central design question is how aggressively active reasoning should be separated from stored state.
- Memory architecture design follows a taxonomy that guides how state is externalized and accessed.
Section IDs: sec_8
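One way to picture separating active reasoning from stored state is a store that sits outside the model and is queried on demand. The sketch below is hypothetical. The four memory kinds follow the taxonomy the paper develops in Section 3.1: working context, episodic experience, semantic knowledge, and personalized memory. However, `ExternalMemory`, its methods, and the word-overlap scoring are all assumptions for illustration. The scoring is a crude stand-in for embedding-based retrieval.

```python
from dataclasses import dataclass

# Memory kinds mirror the Section 3.1 taxonomy; all class and method names
# below are hypothetical illustrations, not an API from the paper.
KINDS = ("working", "episodic", "semantic", "personalized")


@dataclass
class MemoryItem:
    kind: str
    key: str
    text: str


class ExternalMemory:
    """State lives outside the model; the agent retrieves instead of recalling."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, MemoryItem]] = {k: {} for k in KINDS}

    def write(self, kind: str, key: str, text: str) -> None:
        if kind not in KINDS:
            raise ValueError(f"unknown memory kind: {kind}")
        # Writing under an existing key overwrites it: updating one fact is a
        # single, inspectable operation rather than a retraining run.
        self._items[kind][key] = MemoryItem(kind, key, text)

    def retrieve(self, query: str, kinds=KINDS, k: int = 3) -> list[MemoryItem]:
        # Naive word-overlap scoring stands in for embedding-based retrieval.
        terms = set(query.lower().split())
        scored = []
        for kind in kinds:
            for item in self._items.get(kind, {}).values():
                overlap = len(terms & set(item.text.lower().split()))
                if overlap:
                    scored.append((overlap, item))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:k]]
```

For example, after `mem.write("semantic", "capital_fr", "the capital of france is paris")`, a query such as `mem.retrieve("what is the capital of france")` returns that entry. Writing to the same key again replaces the fact in one place. This is the property the paper contrasts with updating parametric knowledge.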
Claim Verification
This is the central theoretical contribution of the paper. The paper systematically develops this thesis across Sections 2-7, providing extensive conceptual analysis, literature synthesis, and a coherent framework connecting memory, skills, protocols
The paper is explicitly organized around these four claims, devoting Sections 3, 4, 5, and 6 to each respectively. Each section provides detailed analysis, taxonomies, examples, and citations. The organizational structure itself demonstrates this con
The paper provides a well-documented historical analysis with Figure 2 visualizing the trajectory, specific timeline (2022-2026), and concrete examples at each stage (GPT-4, prompting techniques, RAG, Auto-GPT, BabyAGI, etc.). The three-layer model (
Section 3.1 develops this taxonomy with conceptual definitions and examples for each dimension: working context (immediate state), episodic experience (prior runs), semantic knowledge (abstractions), and personalized memory (user-specific information
Section 4.1 develops this three-component framework with conceptual analysis and citations. Operational procedures address process stability (citing Hsiao et al. 2025, Nandi et al. 2026), decision heuristics address branching choices (citing Gigerenz
Section 4.4 develops four acquisition pathways with concrete examples: authored (SKILL.md, AGENTS.md), distilled (Skill Set Optimization, MemSkill), discovered (Voyager, PolySkill), and composed (hierarchical skill repertoires). Each pathway has spec
Section 4.2 develops this three-stage evolution with specific examples: Stage 1 (Toolformer for atomic execution), Stage 2 (Gorilla, ToolLLM, ToolNet for large-scale selection), Stage 3 (program-based skill induction, web skill libraries, computer-us
Section 4.5 develops four boundary conditions with citations: semantic alignment (SkillProbe, Ross et al. 2025), portability and staleness (Wang et al. 2025c, SkillsBench), unsafe composition (Liu et al. 2026, Wang et al. 2026c), and context-dependen
Section 4.6 develops four couplings with conceptual analysis: conditioning on memory (retrieved state informs skill selection), binding through protocols (skills grounded via protocolized interfaces), runtime governance (permission checks, approval g
This is a claim about external research findings (Liu et al. 2024a). The paper cites this finding but does not reproduce the original data, methodology, or experimental setup. The claim cannot be verified from this paper alone - it requires accessing
This is a claim about external research findings (Liu et al. 2026). The paper reports this finding from another study but does not provide the original empirical data, methodology, or specific vulnerability rates. Cannot be verified from this paper a
This is a claim about external research findings (Wang et al. 2026c). The paper cites this attack-oriented study but does not reproduce the experiments or provide the original evidence. Cannot be verified from this paper alone.
This is a claim about external research findings (SkillsBench, Li et al. 2026c). The paper reports this finding but does not provide the benchmark methodology, specific domains tested, or quantitative variation data. Cannot be verified from this pape
This is a claim about external research findings (SkillProbe, Guo et al. 2026). The paper cites this study but does not provide the original methodology, specific inconsistencies identified, or empirical evidence. Cannot be verified from this paper a
This is a conceptual limitation claim that the paper develops with reasoning and citations. The paper explains why parametric knowledge is difficult to selectively update (requires retraining), compose (coupled in weights), and govern (distributed ac
This is a well-established limitation that the paper discusses with conceptual analysis. The paper explains the finiteness (token limits), cost at scale (computational overhead), and noise issues (marginally relevant material degrading performance).
This is a conceptual limitation that the paper develops clearly. The paper explains that without explicit externalization, each new session starts fresh ('partial amnesia'), which is a fundamental architectural constraint of current LLM systems.
This is a conceptual claim about the persistence of curation needs despite capacity expansion. The paper provides reasoning: expanded context (2K to 100K+ tokens) doesn't eliminate the need for selective curation because the fundamental tension betwe
This is a conceptual limitation with supporting citations. The paper explains that updating a single fact requires retraining, knowledge editing, or alignment patches, with citations to knowledge editing literature (Meng et al. 2022, Mitchell et al.
This is a conceptual limitation with supporting citation. The paper explains that auditing is difficult because knowledge is distributed across billions of parameters rather than in inspectable modules, citing Zhao et al. 2024 on explainability.
... 41 claims in total
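Two of the skill-side claims above can be illustrated with a small registry. The first is the set of four acquisition pathways from Section 4.4. The second is runtime governance from Section 4.6, such as permission checks and approval gates. This is a minimal sketch under stated assumptions. `Skill`, `SkillRegistry`, and the approval model are hypothetical, and skills are modeled as plain string-to-string functions. Composition chains existing skills rather than generating new behavior, and every step passes through the same gate.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Provenance labels follow the four acquisition pathways named in the paper;
# Skill, SkillRegistry, and the approval model are hypothetical.
PATHWAYS = {"authored", "distilled", "discovered", "composed"}


@dataclass(frozen=True)
class Skill:
    name: str
    pathway: str
    run: Callable[[str], str]
    needs_approval: bool = False


class SkillRegistry:
    def __init__(self, approved: Optional[set] = None) -> None:
        self._skills = {}
        # Names of gated skills that a human or policy has approved.
        self._approved = set(approved or ())

    def register(self, skill: Skill) -> None:
        if skill.pathway not in PATHWAYS:
            raise ValueError(f"unknown pathway: {skill.pathway}")
        self._skills[skill.name] = skill

    def invoke(self, name: str, text: str) -> str:
        skill = self._skills[name]
        # Runtime governance: a gated skill runs only if explicitly approved.
        if skill.needs_approval and name not in self._approved:
            raise PermissionError(f"skill {name!r} requires approval")
        return skill.run(text)

    def compose(self, name: str, steps: list) -> Skill:
        """Register a new skill that chains existing ones in order."""
        parts = [self._skills[s] for s in steps]  # KeyError for unknown steps

        def run(text: str) -> str:
            for part in parts:
                # Each step goes back through invoke(), so gates still apply.
                text = self.invoke(part.name, text)
            return text

        skill = Skill(name, "composed", run,
                      needs_approval=any(p.needs_approval for p in parts))
        self.register(skill)
        return skill
```

Here is an example. First, register `str.strip` as an authored skill and `str.upper` as a distilled one. Then compose them as `clean_upper`. With those two steps in place, `reg.invoke("clean_upper", "  hello ")` returns `"HELLO"`. A skill registered with `needs_approval=True` raises `PermissionError` unless its name was passed in `approved`.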
Reproducibility Assessment
Low reproducibility (0%)
Missing Reproducibility Details
- No code repository available
- No data or supplementary materials available
- Unclear availability statements: the fragmented text does not indicate whether any resources are accessible
- No methodological details provided for the review/survey process
- No criteria for paper selection or inclusion in the review
- No details on how the taxonomy/framework was developed
- No information about systematic review methodology
- No details on analysis procedures or evaluation criteria
- Reference to Du [2026a] taxonomy but no implementation details provided
Limitations (Stated by the Authors)
- A central limitation of parametric knowledge is that it is difficult to selectively update, compose, and govern.
- Context windows are finite, costly at scale, and often noisy when overloaded with marginally relevant material.
- Context is also ephemeral: unless state is explicitly externalized elsewhere, every new session begins with partial amnesia.
- Even as context lengths have expanded dramatically (from 2K tokens to over 100K and beyond), the fundamental tension persists: more capacity does not eliminate the need for selective curation.
- Updating a single fact (say, the current head of state of a country) requires retraining, knowledge editing, or patching through additional alignment layers, all of which risk unintended side effects on other capabilities.
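The curation point in these limitations can be illustrated with a minimal sketch: a finite, costly window, and noise from marginally relevant material. The sketch rests on several assumptions. Relevance scores are taken as given from some upstream scorer, and tokens are crudely approximated as whitespace-separated words. The `min_relevance` cutoff and all names are hypothetical. Greedy selection by score is a simple heuristic here, not an optimal budgeted (knapsack) solution.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    text: str
    relevance: float  # assumed to come from an upstream scorer (hypothetical)


def approx_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def curate_context(snippets: list[Snippet], budget: int,
                   min_relevance: float = 0.2) -> list[Snippet]:
    """Greedily keep the most relevant snippets that fit in a token budget.

    Snippets below min_relevance are dropped even when space remains, since
    marginally relevant material can degrade performance rather than help.
    """
    chosen, used = [], 0
    for snip in sorted(snippets, key=lambda s: s.relevance, reverse=True):
        if snip.relevance < min_relevance:
            break  # everything after this point is even less relevant
        cost = approx_tokens(snip.text)
        if used + cost <= budget:
            chosen.append(snip)
            used += cost
    return chosen
```

Raising `budget` admits more material, but the `min_relevance` cutoff still excludes noise. This mirrors the observation that more capacity does not remove the need to curate.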
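This analysis was generated automatically by PDF Reading Assistant (PDF 阅读助手) and is provided for reference only; it does not constitute an academic peer review. The claim verifications and reproducibility assessment are based on automated analysis of the paper text and may contain errors. For the original paper, see arXiv.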
Analysis time: 2026-04-27T01:08:50+00:00 · Data source: Paper Collector