DeepRead: Document Structure-Aware Reasoning to Enhance Agentic Search

Zhanli Li, Huiwen Tian, Lvzhou Luo, Yixuan Cao, Ping Luo

TL;DR

DeepRead is a structure-aware, multi-turn document reasoning agent that explicitly operationalizes document-native priors for long-document question answering, achieving significant improvements over Search-o1-style agentic search.

Abstract

With the rapid progress of tool-using and agentic large language models (LLMs), Retrieval-Augmented Generation (RAG) is evolving from one-shot, passive retrieval into multi-turn, decision-driven evidence acquisition. Despite strong results in open-domain settings, existing agentic search frameworks commonly treat long documents as flat collections of chunks, underutilizing document-native priors such as hierarchical organization and sequential discourse structure. We introduce DeepRead, a structure-aware, multi-turn document reasoning agent that explicitly operationalizes these priors for long-document question answering. DeepRead leverages an LLM-based OCR model to convert PDFs into structured Markdown that preserves headings and paragraph boundaries. It then indexes documents at the paragraph level and assigns each paragraph a coordinate-style metadata key encoding its section identity and in-section order. Building on this representation, DeepRead equips the LLM with two complementary tools: a Retrieve tool that localizes relevant paragraphs while exposing their structural coordinates (with lightweight scanning context), and a ReadSection tool that enables contiguous, order-preserving reading within a specified section and paragraph range. Our experiments demonstrate that DeepRead achieves significant improvements over Search-o1-style agentic search in document question answering. The synergistic effect between the retrieval and reading tools is also validated. Our fine-grained behavioral analysis reveals a reading and reasoning paradigm resembling human-like "locate then read" behavior.
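
The paragraph-level coordinates and the two tools described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the names `Paragraph`, `index_markdown`, `retrieve`, and `read_section` are hypothetical, headings are assumed to be separated from body text by blank lines, and the token-overlap scoring is a simple stand-in for whatever retriever the paper actually uses.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Paragraph:
    section_id: int  # index of the enclosing heading, in document order
    para_idx: int    # position of the paragraph within its section
    heading: str
    text: str


def index_markdown(markdown: str) -> list[Paragraph]:
    """Split Markdown into paragraphs tagged with (section_id, para_idx)."""
    paragraphs = []
    section_id, para_idx, heading = 0, 0, "(preamble)"
    for block in re.split(r"\n\s*\n", markdown.strip()):
        block = block.strip()
        if not block:
            continue
        match = re.match(r"#+\s+(.*)", block)
        if match:
            # A heading opens a new section and resets the in-section counter.
            section_id += 1
            para_idx = 0
            heading = match.group(1).strip()
            continue
        paragraphs.append(Paragraph(section_id, para_idx, heading, block))
        para_idx += 1
    return paragraphs


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(index: list[Paragraph], query: str, k: int = 2) -> list[Paragraph]:
    """Return up to k paragraphs by token overlap (placeholder scoring)."""
    query_tokens = _tokens(query)
    ranked = sorted(
        index, key=lambda p: len(query_tokens & _tokens(p.text)), reverse=True
    )
    # Each hit carries its coordinates, so a follow-up read can target them.
    return [p for p in ranked[:k] if query_tokens & _tokens(p.text)]


def read_section(
    index: list[Paragraph], section_id: int, start: int, end: int
) -> list[Paragraph]:
    """Return paragraphs start..end (inclusive) of one section, in order."""
    return [
        p for p in index
        if p.section_id == section_id and start <= p.para_idx <= end
    ]
```

In this sketch, a "locate then read" interaction would first call `retrieve` to obtain a hit with its `(section_id, para_idx)` coordinates, then call `read_section` on the surrounding range to read the neighboring paragraphs in their original order.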

Paper Structure

This paper contains 24 sections, 15 equations, 7 figures, 12 tables, and 1 algorithm.

Figures (7)

  • Figure 1: A Comparison of Search-o1-style Agentic Search and DeepRead on a Toy Case
  • Figure 2: The DeepRead framework. It takes user questions and documents parsed into a Doc Schema as input. The LLM uses the retrieval and reading tools to reason over the documents.
  • Figure 3: Fine-grained behavioral comparison between DeepRead and Search-o1 baselines. The panels illustrate the distribution of (a) the probability that the first action is a search, (b) the total number of tool calls per query, (c) input token consumption, and (d) output token generation across four benchmarks.
  • Figure 4: Impact of Retrieved Chunk Count ($k$) on Performance. We compare DeepRead against Search-o1 across four benchmarks with $k \in \{2, 3, 5, 7\}$. DeepRead exhibits consistent robustness, outperforming the baseline particularly in low-resource settings ($k=2$), validating the efficacy of structure-aware reading over flat retrieval expansion.
  • Figure 5: The system prompt used in DeepRead. It injects the hierarchical document skeleton (Directory Structure).
  • ...and 2 more figures