Hierarchical Deep Research with Local-Web RAG: Toward Automated System-Level Materials Discovery
Rui Ding, Rodrigo Pires Ferreira, Yuxin Chen, Junhong Chen
TL;DR
To address the challenge of system-level materials discovery (S3–S4), the paper proposes a long-horizon, hierarchical Deep Research (DR) agent that combines local retrieval-augmented generation (RAG) with web augmentation and is guided by a Deep Tree of Research (DToR). The method instantiates open, on-prem DR instances and orchestrates multiple Research Nodes through gap-driven expansion, diversity-aware querying, and provenance-rich synthesis. Evaluated across 27 topics and 41 agents with LLM-as-judge rubrics, DToR consistently improves synthesis quality and often surpasses commercial DR systems at lower cost; ablations show that orchestration and web-augmented search contribute most to these gains. Dry-lab validations on PFAS, lithium-ion battery (LIB) binders, oxygen evolution reaction (OER) catalysts, and CO₂ sensing yield candidates that are competitive with or better than baselines, supporting practical deployment.
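The TL;DR names three orchestration ideas (gap-driven expansion, diversity-aware querying, and provenance-rich synthesis) without specifying how they work. The Python sketch below is a minimal illustration under stated assumptions, not the authors' DToR implementation. `research_fn` is a hypothetical callable standing in for one Research Node's local-RAG-plus-web step; it returns findings and open gaps. Each gap becomes a child query unless it is a near-duplicate of an earlier query (token Jaccard similarity, an assumed diversity test) or a depth or node budget is exhausted. Findings are tagged with the query that produced them as a simple form of provenance.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical interface: research_fn(query) -> (findings, open_gaps).
# In a real agent this would wrap local RAG, web search, and an LLM.
ResearchFn = Callable[[str], tuple[list[str], list[str]]]


@dataclass
class ResearchNode:
    query: str
    depth: int
    findings: list[str] = field(default_factory=list)
    children: list[ResearchNode] = field(default_factory=list)


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity, used here as a cheap diversity check."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def build_tree(root_query: str, research_fn: ResearchFn,
               max_depth: int = 2, max_nodes: int = 10,
               sim_threshold: float = 0.6) -> ResearchNode:
    """Breadth-first, gap-driven expansion with depth, node, and diversity pruning."""
    root = ResearchNode(root_query, depth=0)
    issued = [root_query]        # every query issued so far
    frontier = deque([root])     # nodes waiting to be researched
    count = 1
    while frontier:
        node = frontier.popleft()
        findings, gaps = research_fn(node.query)
        node.findings = findings
        if node.depth >= max_depth:
            continue             # prune: depth budget exhausted
        for gap in gaps:
            if count >= max_nodes:
                break            # prune: node budget exhausted
            if any(jaccard(gap, q) >= sim_threshold for q in issued):
                continue         # prune: near-duplicate of an issued query
            child = ResearchNode(gap, depth=node.depth + 1)
            node.children.append(child)
            issued.append(gap)
            frontier.append(child)
            count += 1
    return root


def collect(node: ResearchNode) -> list[str]:
    """Depth-first synthesis input: each finding tagged with its source query."""
    out = [f"[{node.query}] {f}" for f in node.findings]
    for child in node.children:
        out.extend(collect(child))
    return out
```

Breadth-first expansion with hard budgets is only one simple choice; ranking gaps by an estimated importance score before expanding them would be a natural alternative.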
Abstract
We present a long-horizon, hierarchical deep research (DR) agent for complex materials and device discovery problems that exceed the scope of existing machine learning (ML) surrogates and closed-source commercial agents. Our framework instantiates a locally deployable DR instance that integrates local retrieval-augmented generation with large language model (LLM) reasoners, enhanced by a Deep Tree of Research (DToR) mechanism that adaptively expands and prunes research branches to maximize coverage, depth, and coherence. We systematically evaluate the agent on 27 nanomaterials and device topics using an LLM-as-judge rubric, with five web-enabled state-of-the-art models serving as jurors. In addition, we conduct dry-lab validations on five representative tasks, in which human experts use domain simulations such as density functional theory (DFT) to verify whether the DR agent's proposals are actionable. Results show that our DR agent produces reports whose quality is comparable to, and often exceeds, that of commercial systems (ChatGPT-5-thinking/o3/o4-mini-high Deep Research) at substantially lower cost, while enabling on-prem integration with local data and tools.
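The abstract does not state how the five jurors' rubric scores are combined. As a hedged illustration only, the sketch below assumes each juror returns a numeric score per criterion. The criterion names are hypothetical and simply reuse the abstract's coverage, depth, and coherence goals; the paper's actual rubric may differ. The sketch averages each criterion across jurors, takes an unweighted overall mean, and reports the population standard deviation as a rough measure of juror disagreement.

```python
from statistics import mean, pstdev

# Assumed rubric criteria (hypothetical; borrowed from the abstract's stated goals).
CRITERIA = ("coverage", "depth", "coherence")


def aggregate(juror_scores: list[dict[str, float]]) -> dict[str, float]:
    """Mean score per criterion across jurors, plus an unweighted overall mean."""
    if not juror_scores:
        raise ValueError("at least one juror score is required")
    result = {c: mean([s[c] for s in juror_scores]) for c in CRITERIA}
    result["overall"] = mean([result[c] for c in CRITERIA])
    return result


def disagreement(juror_scores: list[dict[str, float]], criterion: str) -> float:
    """Population standard deviation of one criterion across jurors."""
    return pstdev([s[criterion] for s in juror_scores])
```

Averaging treats every juror as equally reliable; a median or a trimmed mean would reduce the influence of a single outlier juror, at the cost of discarding some information when jurors genuinely disagree.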
