This paper introduces Direct Corpus Interaction (DCI), where agents search raw corpora using terminal tools instead of semantic retrievers. DCI achieves 80.0% accuracy on BrowseComp-Plus (+11.0 over baseline with 29.4% cost reduction), 83.0% on QA (+30.7 over baseline), and 68.
Core Question
Can retrieval for agentic search be improved by replacing conventional retriever-mediated interfaces with direct corpus interaction via terminal tools, and how does the retrieval interface itself shape agent capabilities?
Core Method
The authors implement two DCI agents (DCI-Agent-Lite and DCI-Agent-CC) that search raw corpora using terminal tools such as grep, file reads, and shell commands instead of embedding-based retrievers. They evaluate across three benchmark families: BrowseComp-Plus for agentic search, six knowledge-intensive QA datasets, and IR ranking benchmarks (BRIGHT and BEIR), comparing against retrieval agents and sparse/dense retrievers.
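As a rough illustration of what searching a raw corpus with lexical tools looks like, the sketch below mimics a recursive `grep -n` in pure Python. It is not the authors' implementation: the function name `grep_corpus`, the `*.txt` corpus layout, and the `max_hits` cap are all assumptions made for this example.

```python
import re
import tempfile
from pathlib import Path


def grep_corpus(root, pattern, context=0, max_hits=20):
    """Return (file name, 1-based line number, text) for regex matches.

    Hypothetical helper: assumes the corpus is a directory tree of .txt files.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for i, line in enumerate(lines):
            if regex.search(line):
                # Optionally include surrounding lines, like `grep -C`.
                lo, hi = max(0, i - context), min(len(lines), i + context + 1)
                hits.append((path.name, i + 1, "\n".join(lines[lo:hi])))
                if len(hits) >= max_hits:
                    return hits
    return hits


def demo():
    """Build a two-file toy corpus and search it for a year mention."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "doc_a.txt").write_text(
            "Intro\nThe bridge opened in 1932.\nEnd\n", encoding="utf-8"
        )
        Path(tmp, "doc_b.txt").write_text("Unrelated text\n", encoding="utf-8")
        return grep_corpus(tmp, r"opened in \d{4}")


if __name__ == "__main__":
    for name, line_no, text in demo():
        print(f"{name}:{line_no}: {text}")
```

The search returns individual matching lines rather than whole documents, and it needs no embedding or index to be built before the first query.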
Claim Verification
The paper clearly defines and formalizes DCI in paragraphs 4, 13-15. It provides concrete details about how DCI works: the agent bypasses embedding models and vector indexes, instead using terminal tools (grep, rg, find, glob, file reads) to search t
The paper explicitly states this as a key contribution in paragraph 9. DCI is formalized in paragraphs 13-15, and systematically evaluated across three benchmark families: BrowseComp-Plus (agentic search), six knowledge-intensive QA datasets, and six
DCI-Agent-Lite is described in detail in paragraph 17. The paper specifies it is adapted from Pi, restricted to raw terminal interaction, uses bash and file reads, with grep/rg for lexical matching, find/glob for file discovery, and lightweight scrip
DCI-Agent-CC is described in paragraph 18. The paper specifies it uses Claude Code as an off-the-shelf CLI agent, provides stronger prompting, more robust tool orchestration, and built-in context handling compared to DCI-Agent-Lite. It still operates
The context-management layer is described in detail in paragraphs 18-21. All three mechanisms are fully specified: Truncation (p_19), Compaction (p_20), and Summarization (p_21), with specific thresholds and behaviors for each.
The concept of 'retrieval interface resolution' is introduced in paragraph 6 and listed as a contribution in paragraph 9. The paper defines it as 'the ability to operate on units smaller and more precise than entire documents or passages.' This conce
The two trajectory-level metrics are formally defined in paragraphs 24-31. Coverage is defined in paragraphs 25-27 with three aggregate measures (coverage_any, coverage_mean, coverage_all). Localization is formally defined in paragraphs 28-32 with ma
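The three coverage aggregates can be sketched as follows. The definitions are cut off in this report, so the code assumes one natural reading: per question, coverage is the fraction of gold documents the trajectory touched, and `coverage_any`, `coverage_mean`, and `coverage_all` summarise that fraction across questions. The input format (dicts of document-ID sets) is also an assumption, and the paper's formal definitions may differ.

```python
def coverage_metrics(gold, touched):
    """Aggregate gold-document coverage over questions.

    gold:    dict question_id -> set of gold document ids
    touched: dict question_id -> set of document ids seen in the trajectory
    Assumed definitions (not taken from the paper):
      coverage_any  = share of questions with at least one gold doc touched
      coverage_mean = mean per-question fraction of gold docs touched
      coverage_all  = share of questions with every gold doc touched
    """
    fractions = []
    for qid, gold_docs in gold.items():
        if not gold_docs:
            continue  # skip questions without gold annotations
        found = gold_docs & touched.get(qid, set())
        fractions.append(len(found) / len(gold_docs))
    if not fractions:
        raise ValueError("no questions with gold documents")
    n = len(fractions)
    return {
        "coverage_any": sum(f > 0 for f in fractions) / n,
        "coverage_mean": sum(fractions) / n,
        "coverage_all": sum(f == 1.0 for f in fractions) / n,
    }


example = coverage_metrics(
    gold={"q1": {"d1", "d2"}, "q2": {"d3"}, "q3": {"d4", "d5"}},
    touched={"q1": {"d1", "d9"}, "q2": {"d3"}, "q3": set()},
)
```

In the toy example the per-question fractions are 0.5, 1.0, and 0.0, so two of three questions count toward `coverage_any` and one toward `coverage_all`.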
Specific quantitative results are provided in paragraphs 5 and 42: accuracy improves from 69.0% to 80.0% (+11.0 points) and cost reduces from $1,440 to $1,016 (-29.4%). These are concrete, verifiable numbers from controlled experiments with the same
Specific quantitative results are provided in paragraphs 5 and 43: DCI-Agent-CC achieves 83.0% average accuracy on multi-hop QA, surpassing ASearcher-Local-14B (52.3%) by 30.7 points. The paper provides a table (Table 2) with results across six datas
Specific quantitative results are provided in paragraphs 5 and 44: DCI-Agent-CC achieves 68.5 average NDCG@10 on IR ranking, outperforming ReasonRank-32B (47.0) by 21.5 points. Results are shown in Table 3 across six datasets.
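For readers unfamiliar with the metric, a minimal NDCG@10 sketch is given below. It uses linear gain with a log2 rank discount, which is one common variant; whether the paper's evaluation uses exactly this variant is an assumption. The function returns a value in [0, 1], whereas the figures quoted above appear to be on a 0-100 scale.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2(rank + 1) discount."""
    return sum(r / math.log2(rank + 1) for rank, r in enumerate(relevances, start=1))


def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k for one query.

    ranked_ids: document ids in the order the system returned them
    relevance:  dict doc id -> graded relevance (unjudged docs count as 0)
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


judgments = {"a": 1, "b": 1}
perfect = ndcg_at_k(["a", "b", "x"], judgments)
imperfect = ndcg_at_k(["a", "x", "b"], judgments)
```

A benchmark-level score such as an "average NDCG@10" is then the mean of the per-query values, typically multiplied by 100.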
Specific quantitative results are provided in paragraph 42: DCI-Agent-CC achieves 80.0% accuracy, surpassing GPT-5 + Qwen3-Embedding-8B (71.7%) by 8.3 points. The comparison is explicit and the math is verifiable (80.0 - 71.7 = 8.3).
Specific quantitative results are provided in paragraph 42: DCI-Agent-Lite achieves 62.9% accuracy at a cost of $93, compared with 66.0% for o3 + Qwen3-Embedding-8B, a cost reduction of $647. The numbers are concrete and verifiable.
Specific quantitative results are provided in paragraph 43 with exact numbers: DCI-Agent-CC at 83.0%, ASearcher-Local-14B at 52.3% (difference of 30.7 points), and DCI-Agent-Lite at 68.0%.
Specific quantitative results are provided in paragraph 43 with per-dataset breakdowns: 30 points on HotpotQA, 26 on 2Wiki, and 50 on MuSiQue relative to ASearcher-Local-14B.
Specific quantitative results are provided in paragraph 44: DCI-Agent-CC achieves 68.5 average NDCG@10, best on all six datasets, exceeding ReasonRank-32B (47.0) by 21.5 points.
Specific quantitative results are provided in paragraph 44: DCI-Agent-Lite ranks second with 56.7 average NDCG@10, 9.7 points above ReasonRank-32B (47.0).
Specific quantitative results from trajectory analysis are provided in paragraph 46: DCI-Agent-CC correctly answers 176 questions that the matched retrieval agent misses, while only 76 show the reverse pattern. This is a direct comparison with concre
Specific quantitative analysis is provided in paragraph 46: among the 176 CC-win cases, only 34 contain no gold documents retrieved by the retrieval agent, meaning that in 142 of 176 (81%) the retrieval agent had already surfaced some or all gold
Specific quantitative results from controlled ablation are provided in paragraph 47: with only 'read + grep', the agent achieves 61% accuracy on BrowseComp-Plus, outperforming Qwen3-Embedding-8B baseline (45%) by 16 points.
Specific quantitative results from controlled ablation are provided in paragraph 47: enabling the bash command set adds a further 12-point gain. The resulting accuracy is not stated explicitly; 61% plus 12 points would imply about 73%. The paper n
... 36 claims in total
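The headline deltas quoted in the verification notes above can be rechecked with a few lines of arithmetic. The sketch below only uses numbers that appear in this report; the helper names are made up for the example, and the 73% figure is the implied value flagged above, not a number stated by the paper.

```python
def point_gain(new, old):
    """Absolute difference in points, rounded to one decimal place."""
    return round(new - old, 1)


def relative_reduction(old, new):
    """Percentage reduction from old to new, rounded to one decimal place."""
    return round(100 * (old - new) / old, 1)


checks = {
    "browsecomp_gain": point_gain(80.0, 69.0),        # reported: +11.0
    "browsecomp_cost_cut": relative_reduction(1440, 1016),  # reported: 29.4%
    "qa_gain": point_gain(83.0, 52.3),                # reported: +30.7
    "ir_gain_cc": point_gain(68.5, 47.0),             # reported: +21.5
    "ir_gain_lite": point_gain(56.7, 47.0),           # reported: +9.7
    "vs_gpt5_gain": point_gain(80.0, 71.7),           # reported: +8.3
    "grep_vs_dense": point_gain(61, 45),              # reported: +16
    "cc_wins_with_gold": 176 - 34,                    # reported: 142
    "cc_wins_with_gold_pct": round(100 * 142 / 176),  # reported: 81%
    "implied_bash_accuracy": 61 + 12,                 # implied, not stated: 73
}

if __name__ == "__main__":
    for name, value in checks.items():
        print(f"{name}: {value}")
```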
Reproducibility Assessment
Low reproducibility (0%)
Missing reproduction details
- No code available - implementations of DCI-Agent-Lite and DCI-Agent-CC are not provided
- No data available - datasets used are not released
- Exact prompts and prompt templates used for the agents are not provided
- LLM hyperparameters not specified (temperature, max tokens, top-p, etc.)
- Random seeds not reported for sampling 50 examples per dataset
- Exact model versions/API endpoints not specified (Claude Code version unclear)
- Runtime context management layer implementation details not provided
- Hardware and computational environment specifications not mentioned
- Number of experimental runs and statistical significance testing not reported
- Evaluation metrics implementation details not provided
Limitations (as stated by the authors)
- In real agentic workspaces where corpora can be local, heterogeneous, and continually evolving, DCI via a standard bash terminal requires no offline embedding or indexing, adapts naturally to changing files, and lets the agent operate directly within the environment it is reasoning over.
- Dense and sparse retrieval remain scalable and effective for large, static corpora, but they occupy only one point in the broader design space of corpus interfaces.
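This analysis was generated automatically by PDF Reading Assistant (PDF 阅读助手) and is for reference only; it does not constitute an academic review opinion. The verification conclusions and the reproducibility assessment are based on automated analysis of the paper's text and may be inaccurate. For the original paper, see arXiv.
Analysis time: 2026-05-09T07:11:20+00:00 · Data source: Paper Collector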