Hierarchical Deep Research with Local-Web RAG: Toward Automated System-Level Materials Discovery

Rui Ding, Rodrigo Pires Ferreira, Yuxin Chen, Junhong Chen

TL;DR

To address the challenge of system-level materials discovery (S3–S4), the paper proposes a long-horizon, hierarchical Deep Research (DR) agent that combines local retrieval-augmented generation with web augmentation, guided by a Deep Tree of Research (DToR). The method instantiates open, on-prem DR instances and orchestrates multiple Research Nodes through gap-driven expansion, diversity-aware querying, and provenance-rich synthesis. Across 27 topics and 41 agents, with LLM-as-judge rubrics and dry-lab validations, DToR consistently improves synthesis quality and often surpasses commercial DR systems at lower cost; ablations demonstrate the primacy of orchestration and web-augmented search. Dry-lab validations on PFAS, LIB binders, OER catalysts, and CO2 sensing show candidates competitive with or better than baselines, supporting practical deployment.

Abstract

We present a long-horizon, hierarchical deep research (DR) agent designed for complex materials and device discovery problems that exceed the scope of existing machine learning (ML) surrogates and closed-source commercial agents. Our framework instantiates a locally deployable DR instance that integrates local retrieval-augmented generation with large language model (LLM) reasoners, enhanced by a Deep Tree of Research (DToR) mechanism that adaptively expands and prunes research branches to maximize coverage, depth, and coherence. We systematically evaluate across 27 nanomaterials/device topics using an LLM-as-judge rubric with five web-enabled state-of-the-art models as jurors. In addition, we conduct dry-lab validations on five representative tasks, where human experts use domain simulations (e.g., density functional theory, DFT) to verify whether DR-agent proposals are actionable. Results show that our DR agent produces reports whose quality is comparable to, and often exceeds, that of commercial systems (ChatGPT-5-thinking/o3/o4-mini-high Deep Research) at a substantially lower cost, while enabling on-prem integration with local data and tools.
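
The expand-and-prune behaviour attributed to DToR can be illustrated with a minimal sketch. This is not the paper's implementation: the names `ResearchNode`, `expand_tree`, `find_gaps`, `score_node`, `max_depth`, and `beam_width` are all hypothetical, and the gap finder and scorer are stand-ins for whatever LLM-backed components an actual system would use. The sketch only shows one plausible shape for gap-driven expansion with duplicate-query filtering and score-based pruning.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResearchNode:
    """One node of a hypothetical research tree (illustrative only)."""
    query: str
    depth: int
    score: float = 0.0
    children: List["ResearchNode"] = field(default_factory=list)


def expand_tree(
    root_query: str,
    find_gaps: Callable[[str], List[str]],   # assumed: returns follow-up queries
    score_node: Callable[[str], float],      # assumed: higher means more promising
    max_depth: int = 2,
    beam_width: int = 2,
) -> ResearchNode:
    """Grow a tree level by level, expanding only the best-scoring nodes."""
    root = ResearchNode(query=root_query, depth=0, score=score_node(root_query))
    frontier = [root]
    for depth in range(1, max_depth + 1):
        candidates: List[ResearchNode] = []
        for node in frontier:
            seen = set()
            for gap in find_gaps(node.query):
                if gap in seen:
                    continue  # skip duplicate follow-up queries under one parent
                seen.add(gap)
                child = ResearchNode(query=gap, depth=depth, score=score_node(gap))
                node.children.append(child)
                candidates.append(child)
        # Prune: only the top `beam_width` candidates are expanded further.
        candidates.sort(key=lambda n: n.score, reverse=True)
        frontier = candidates[:beam_width]
        if not frontier:
            break
    return root


def count_nodes(node: ResearchNode) -> int:
    """Total number of nodes in the tree rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children)
```

In this sketch, pruned nodes stay in the tree as leaves and are simply not expanded, so their content would still be available to a later synthesis step. Whether the actual system keeps or discards pruned branches is not stated in the abstract.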

Paper Structure

This paper contains 174 sections, 11 equations, 58 figures, 1 table.

Figures (58)

  • Figure 1: The DToR Depth-Breadth Workflow
  • Figure 2: (a) Overall five-dimension rubric scores across 41 agents (27 topics; 5 judges; 3 trials). (b) Full-factorial ablation (Method $\times$ LLM $\times$ Local retrieval); the heatmap shows overall means; side bars show each factor's impact.
  • Figure 3: Violin plot of win rates, showing their distribution across agents and topics together with summary statistics and performance variation.
  • Figure 4: Dry-lab validation on five tasks: radar of 10 simulation metrics comparing the best local vs. the best commercial DR agents. The bar plots at bottom right show, respectively: the number of times agent-proposed candidates scored above the domain prior (>100); the number of times agent-proposed candidates achieved the highest score among the three groups; and the overall average score across the 10 metrics.
  • Figure 5: Visual representations of selected candidates, drawn from specific descriptions in the DR reports. The top panel illustrates candidates for PFAS sensing (A1, B1, C3), while the bottom panel shows candidates for PFAS degradation (A3, C4, C5), accompanied by the original text excerpts generated by the agents. Dry-lab qualitative studies are available in Appendix \ref{dry_lab_PFAS_sens} and Appendix \ref{dry_lab_PFAS_degra}.
  • ...and 53 more figures