DualResearch: Entropy-Gated Dual-Graph Retrieval for Answer Reconstruction

Jinxin Shi, Zongsheng Cao, Runmin Ma, Yusong Hu, Jie Zhou, Xin Li, Lei Bai, Liang He, Bo Zhang

TL;DR

DualResearch addresses noise and transient uncertainty in tool-intensive scientific reasoning by modeling two complementary knowledge channels: a Breadth Semantic Graph for stable background concepts and a Depth Causal Graph for executable reasoning traces. It encodes per-layer relevance, derives channel-specific posteriors $P_B(a|q)$ and $P_D(a|q)$, and fuses them via a log-space entropy gate with calibration to produce a robust, auditable answer distribution $\tilde{P}(a|q)$. Empirical results on HLE and GPQA show consistent improvements over baselines and competitive performance against state-of-the-art approaches, with substantial gains when reusing existing problem-solving logs. The framework yields compact, verifiable reasoning graphs that enhance reliability and reproducibility, with potential for future multimodal extensions to broaden applicability in scientific inquiry.

Abstract

The deep-research framework orchestrates external tools to perform complex, multi-step scientific reasoning that exceeds the native limits of a single large language model. However, it still suffers from context pollution, weak evidentiary support, and brittle execution paths. To address these issues, we propose DualResearch, a retrieval and fusion framework that matches the epistemic structure of tool-intensive reasoning by jointly modeling two complementary graphs: a breadth semantic graph that encodes stable background knowledge, and a depth causal graph that captures execution provenance. Each graph has its own layer-native relevance function: seed-anchored semantic diffusion for breadth, and causal-semantic path matching with reliability weighting for depth. To reconcile their heterogeneity and query-dependent uncertainty, DualResearch converts per-layer path evidence into answer distributions and fuses them in log space via an entropy-gated rule with global calibration. The fusion up-weights the more certain channel and amplifies agreement. As a complement to deep-research systems, DualResearch compresses lengthy multi-tool execution logs into a concise reasoning graph, and we show that it can reconstruct answers stably and effectively. On the scientific reasoning benchmarks HLE and GPQA, DualResearch achieves competitive performance. Using log files from the open-source system InternAgent, its accuracy improves by 7.7% on HLE and 6.06% on GPQA.
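The entropy-gated log-space fusion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The sigmoid form of the gate, the `temperature` parameter (standing in for the calibration step), the `eps` smoothing, and the function names are all assumptions made for illustration. The only properties taken from the text are that the two channel distributions are combined in log space and that the lower-entropy channel receives the larger weight.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution stored as a dict."""
    return -sum(v * math.log(v) for v in p.values() if v > 0.0)


def entropy_gate(h_b, h_d, temperature=1.0):
    """Hypothetical gate: weight on the depth channel, in (0, 1).

    The weight exceeds 0.5 when the depth channel has lower entropy
    (is more certain) than the breadth channel. The sigmoid form is an
    assumption, not taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-(h_b - h_d) / temperature))


def fuse(p_b, p_d, temperature=1.0, eps=1e-12):
    """Fuse two answer distributions by a gated weighted sum of log-probabilities."""
    answers = sorted(set(p_b) | set(p_d))
    alpha = entropy_gate(entropy(p_b), entropy(p_d), temperature)
    log_scores = {
        a: (1.0 - alpha) * math.log(p_b.get(a, 0.0) + eps)
        + alpha * math.log(p_d.get(a, 0.0) + eps)
        for a in answers
    }
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_scores.values())
    unnorm = {a: math.exp(s - m) for a, s in log_scores.items()}
    z = sum(unnorm.values())
    return alpha, {a: u / z for a, u in unnorm.items()}


# Toy example: the breadth channel is unsure, the depth channel is confident.
p_breadth = {"A": 0.40, "B": 0.35, "C": 0.25}
p_depth = {"A": 0.10, "B": 0.85, "C": 0.05}
alpha, fused = fuse(p_breadth, p_depth)
best = max(fused, key=fused.get)
```

In this toy case the depth channel has the lower entropy, so the gate puts more than half of the weight on it and the fused distribution favors answer `B`. Because the combination is multiplicative in probability space, an answer that both channels rate highly is amplified, which matches the "amplifies agreement" behavior the abstract describes.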

Paper Structure

This paper contains 23 sections, 1 theorem, 16 equations, 6 figures, 5 tables.

Key Result

Theorem 1

Under the setting above, for any $(q,y)$ and any fixed $\alpha\in[0,1]$, ... Consequently, for the entropy gate $\alpha(H)$, ... where $H=(H_{B},H_{D})$, $\Delta(H)=\mathbb{E}[\ell_{D}\mid H]-\mathbb{E}[\ell_{B}\mid H]$, and $\alpha^\star(H)=\mathbb{I}\{\Delta(H)<0\}$ is the oracle gate. If both channels are entropy–loss calibrated and the sign-consistency condition holds almost surely, then $\alpha(H)$ ...
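The two quantities that the theorem statement does define, the loss gap $\Delta(H)$ and the oracle gate $\alpha^\star(H)$, can be estimated empirically. The sketch below is illustrative only: the function names and the idea of averaging per-example losses within one entropy bin are assumptions, and it does not reproduce the bound itself. Reading the gate value 1 as "select the depth channel" is an inference from the sign convention $\Delta(H)<0$, meaning the depth channel has the lower expected loss.

```python
from statistics import mean


def loss_gap(losses_b, losses_d):
    """Empirical Delta(H) = E[l_D | H] - E[l_B | H] for examples sharing one entropy bin."""
    return mean(losses_d) - mean(losses_b)


def oracle_gate(losses_b, losses_d):
    """Empirical alpha*(H) = 1 if Delta(H) < 0, else 0."""
    return 1 if loss_gap(losses_b, losses_d) < 0 else 0


# Hypothetical per-example losses for two entropy bins.
gate_depth_better = oracle_gate(losses_b=[1.2, 0.9], losses_d=[0.3, 0.4])
gate_breadth_better = oracle_gate(losses_b=[0.2, 0.3], losses_d=[0.9, 1.1])
```

With these made-up losses the first bin yields a gate of 1 (the depth channel has the lower mean loss) and the second yields 0. A tie gives 0, because the indicator uses a strict inequality.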

Figures (6)

  • Figure 1: Overview of DualResearch. Left: Performance comparison on the HLE benchmark, where DualResearch consistently outperforms strong baselines across diverse scientific domains. Right: A case study on the “Turing Machine halting steps” problem, where deep research reaches an incorrect conclusion because of noisy retrieval and missing causal constraints, while DualResearch leverages structured process graphs and entropy-gated aggregation to derive the correct answer.
  • Figure 2: Workflow of DualResearch. Stage 1: scientific tasks are executed with Deep Research tools to produce raw logs. Stage 2: logs are structured into stepwise traces with intermediate artifacts. Stage 3: evidence is organized into a Breadth Semantic Graph and a Depth Causal Graph, whose outputs are fused by an entropy-gated aggregator to yield the final answer.
  • Figure 3: Case study of a historical query comparing InternAgent and DualResearch.
  • Figure 4: Accuracy comparison between two graph construction strategies across disciplines. Signal denotes single-sample graph construction, where each instance independently forms a knowledge graph. Subject denotes subject-level multi-sample graph aggregation, where multiple instances within the same discipline are merged into a unified graph.
  • Figure 5: The prompt used to call the LLM to answer scientific questions in HLE and GPQA.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Definition 1: Breadth Semantic Graph by Static Background
  • Definition 2: Depth Causal Graph by Procedural Background
  • Definition 3: Dual-graph posteriors and entropy-gated fusion
  • Theorem 1: Generalization advantage of entropy-gated dual-graph fusion
  • proof