DeepRead: Document Structure-Aware Reasoning to Enhance Agentic Search

Zhanli Li; Huiwen Tian; Lvzhou Luo; Yixuan Cao; Ping Luo

TL;DR

DeepRead is a structure-aware, multi-turn document reasoning agent that explicitly operationalizes document-native priors for long-document question answering. It achieves significant improvements over Search-o1-style agentic search on this task.

Abstract

With the rapid progress of tool-using and agentic large language models (LLMs), Retrieval-Augmented Generation (RAG) is evolving from one-shot, passive retrieval into multi-turn, decision-driven evidence acquisition. Despite strong results in open-domain settings, existing agentic search frameworks commonly treat long documents as flat collections of chunks, underutilizing document-native priors such as hierarchical organization and sequential discourse structure. We introduce DeepRead, a structure-aware, multi-turn document reasoning agent that explicitly operationalizes these priors for long-document question answering. DeepRead uses an LLM-based OCR model to convert PDFs into structured Markdown that preserves headings and paragraph boundaries. It then indexes documents at the paragraph level and assigns each paragraph a coordinate-style metadata key encoding its section identity and in-section order. Building on this representation, DeepRead equips the LLM with two complementary tools: a Retrieve tool that localizes relevant paragraphs while exposing their structural coordinates (with lightweight scanning context), and a ReadSection tool that enables contiguous, order-preserving reading within a specified section and paragraph range. Our experiments demonstrate that DeepRead achieves significant improvements over Search-o1-style agentic search in document question answering. They also validate the synergy between the retrieval and reading tools. Our fine-grained behavioral analysis reveals a reading and reasoning paradigm resembling human-like "locate then read" behavior.
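The coordinate scheme and the two tools described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration, not the authors' implementation: the `Paragraph` record, the `(section_id, para_idx)` key layout, the function names `index_markdown`, `retrieve`, and `read_section`, and the word-overlap scoring, which is only a toy stand-in for whatever retriever DeepRead actually uses. The "lightweight scanning context" returned by the Retrieve tool is omitted.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.+)$")


@dataclass(frozen=True)
class Paragraph:
    section_id: int  # hypothetical key part: heading index in document order
    para_idx: int    # hypothetical key part: paragraph position in its section
    heading: str
    text: str


def index_markdown(markdown: str) -> list[Paragraph]:
    """Split Markdown into paragraphs and tag each with (section_id, para_idx)."""
    paragraphs = []
    section_id, para_idx, heading = 0, 0, "(preamble)"
    for block in re.split(r"\n\s*\n", markdown.strip()):
        lines = block.strip().splitlines()
        match = HEADING.match(lines[0]) if lines else None
        if match:
            # A heading opens a new section and resets the in-section counter.
            section_id += 1
            para_idx = 0
            heading = match.group(2).strip()
            lines = lines[1:]
        text = " ".join(line.strip() for line in lines).strip()
        if text:
            paragraphs.append(Paragraph(section_id, para_idx, heading, text))
            para_idx += 1
    return paragraphs


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(index: list[Paragraph], query: str, k: int = 2) -> list[Paragraph]:
    """Toy retriever (assumption): rank paragraphs by word overlap with the query."""
    query_tokens = _tokens(query)
    ranked = sorted(
        index, key=lambda p: len(query_tokens & _tokens(p.text)), reverse=True
    )
    # Drop paragraphs that share no word with the query.
    return [p for p in ranked[:k] if query_tokens & _tokens(p.text)]


def read_section(
    index: list[Paragraph], section_id: int, start: int, end: int
) -> list[Paragraph]:
    """Return paragraphs start..end (inclusive) of one section, in document order."""
    return [
        p for p in index
        if p.section_id == section_id and start <= p.para_idx <= end
    ]


if __name__ == "__main__":
    doc = (
        "# Methods\n\nWe fine-tune the encoder on contracts.\n\n"
        "Training runs for three epochs.\n\n"
        "# Results\n\nAccuracy improves by four points."
    )
    idx = index_markdown(doc)
    hit = retrieve(idx, "how many epochs", k=1)[0]  # locate
    for p in read_section(idx, hit.section_id, 0, hit.para_idx):  # then read
        print((p.section_id, p.para_idx), p.text)
```

In this sketch the agent first locates a paragraph through `retrieve`, reads its coordinates, and then calls `read_section` on the enclosing section to recover the preceding context in order, which mirrors the "locate then read" pattern the abstract reports.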

Paper Structure (24 sections, 15 equations, 7 figures, 12 tables, 1 algorithm)

Introduction
Related Work
Document QA
Document Parsing
Agentic Search
Methodology
Preliminaries: Agentic Search
Structured Document Modeling
Paragraph-Level Indexing and Metadata
System Prompt: Hierarchical Skeleton
Tools: Coordinate-Based Interaction
Experiment
Benchmark Details
Baseline and DeepRead Settings
Main Result
...and 9 more sections

Figures (7)

Figure 1: A Comparison of Search-o1-style Agentic Search and DeepRead on a Toy Case
Figure 2: The DeepRead framework. It takes user questions and documents parsed into a Doc Schema as input. The LLM uses retrieval and reading tools to reason over the documents.
Figure 3: Fine-grained behavioral comparison between DeepRead and Search-o1 baselines. The panels illustrate the distribution of (a) the probability that the first action is a search, (b) the total number of tool calls per query, (c) input token consumption, and (d) output token generation across four benchmarks.
Figure 4: Impact of Retrieved Chunk Count ($k$) on Performance. We compare DeepRead against Search-o1 across four benchmarks with $k \in \{2, 3, 5, 7\}$. DeepRead exhibits consistent robustness, outperforming the baseline particularly in low-resource settings ($k=2$), validating the efficacy of structure-aware reading over flat retrieval expansion.
Figure 5: The system prompt used in DeepRead. It injects the hierarchical document skeleton (Directory Structure).
...and 2 more figures
