RhinoInsight: Improving Deep Research through Control Mechanisms for Model Behavior and Context

Yu Lei; Shuzheng Si; Wei Wang; Yifei Wu; Gang Chen; Fanchao Qi; Maosong Sun

TL;DR

RhinoInsight addresses two failure modes of long-horizon deep research, error propagation and context rot, by introducing two control mechanisms that regulate model behavior and context. The Verifiable Checklist constrains planning to executable, verifiable sub-goals, while the Evidence Audit structures and audits context so that high-quality evidence is bound to claims and visuals, all without updating model parameters. The framework extends ReAct with a five-component loop and a memory-reconstruction strategy. It achieves state-of-the-art performance on deep research benchmarks (e.g., $R=50.92$ on DeepResearch Bench) and competitive results on deep search tasks, including the text-only subset of GAIA. These results indicate that principled control over actions and context can substantially improve the robustness, traceability, and accuracy of AI-assisted deep research systems, making them better suited to real-world deployment. Future work includes adaptive control policies and human-in-the-loop refinement to further improve reliability and efficiency.

Abstract

Large language models are evolving from single-turn responders into tool-using agents capable of sustained reasoning and decision-making for deep research. Prevailing systems adopt a linear pipeline (plan, search, then write a report) that suffers from error accumulation and context rot because it lacks explicit control over both model behavior and context. We introduce RhinoInsight, a deep research framework that adds two control mechanisms to enhance robustness, traceability, and overall quality without parameter updates. First, a Verifiable Checklist module transforms user requirements into traceable and verifiable sub-goals, incorporates human or LLM critics for refinement, and compiles a hierarchical outline that anchors subsequent actions and prevents non-executable planning. Second, an Evidence Audit module structures search content, iteratively updates the outline, and prunes noisy context, while a critic ranks high-quality evidence and binds it to drafted content to ensure verifiability and reduce hallucinations. Our experiments demonstrate that RhinoInsight achieves state-of-the-art performance on deep research tasks while remaining competitive on deep search tasks.
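To make the two mechanisms more concrete, the following is a minimal Python sketch of how a checklist of verifiable sub-goals and an evidence-binding step could fit together. It is an illustration under stated assumptions, not the authors' implementation: the names `SubGoal`, `Evidence`, `build_checklist`, `audit_evidence`, and `unverified` are hypothetical, the critic is replaced by a precomputed numeric `score`, and the thresholds `min_score` and `top_k` are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str   # where the snippet came from (hypothetical field)
    text: str     # the retrieved content
    score: float  # stand-in for a critic's quality judgment (assumption)


@dataclass
class SubGoal:
    goal_id: str
    description: str
    check: str  # plain-language criterion for verifying the sub-goal
    evidence: list = field(default_factory=list)


def build_checklist(requirements):
    """Turn each user requirement into a traceable sub-goal with a check."""
    return [
        SubGoal(
            goal_id=f"G{i + 1}",
            description=req,
            check=f"at least one retained source supports: {req}",
        )
        for i, req in enumerate(requirements)
    ]


def audit_evidence(goal, candidates, min_score=0.5, top_k=2):
    """Prune low-scoring candidates, rank the rest, bind the best to the goal."""
    kept = [e for e in candidates if e.score >= min_score]
    kept.sort(key=lambda e: e.score, reverse=True)
    goal.evidence = kept[:top_k]
    return goal


def unverified(checklist):
    """Return ids of sub-goals that still have no bound evidence."""
    return [g.goal_id for g in checklist if not g.evidence]


checklist = build_checklist(["market size", "key competitors"])
audit_evidence(
    checklist[0],
    [Evidence("a.example", "snippet A", 0.9), Evidence("b.example", "snippet B", 0.2)],
)
print(unverified(checklist))  # ['G2']
```

In this sketch, a sub-goal without bound evidence stays flagged, which mirrors the idea that drafted content should remain traceable to audited sources; a real system would replace the numeric score with a human or LLM critic.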
