DTBench: A Synthetic Benchmark for Document-to-Table Extraction

Yuxiang Guo; Zhuoran Du; Nan Tang; Kezheng Tang; Congcong Ge; Yunjun Gao

TL;DR

DTBench introduces a capability-aware benchmark for document-to-table (Doc2Table) extraction. It reverses the generation process, using Table2Doc synthesis to create synthetic documents from ground-truth tables. It defines a two-level taxonomy of Doc2Table capabilities with 13 subcategories grouped into five major categories: Transformative Alignment (TA), Reasoning & Inference (RI), Distractor Robustness (DR), Evidence Faithfulness (EF), and Conflict Resolution (CR). The benchmark comprises 120 cases with 8,811 cell-level instances. Evaluations of eight mainstream LLMs reveal substantial gaps in indirect extraction, with multi-hop reasoning, faithfulness, and implicit conflict resolution posing the largest challenges. The dataset provides a controllable, scalable testbed for evaluating and improving reliable Doc2Table extraction and downstream SQL-based analytics.
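To make cell-level, capability-aware scoring concrete, here is a minimal Python sketch under stated assumptions. It assumes each cell-level instance carries exactly one capability label (TA, RI, DR, EF, or CR), a gold value, and a predicted value, and it scores exact match after a simple normalization. The `CellInstance` type, the `normalize` rule, and `accuracy_by_capability` are hypothetical illustrations; they are not the paper's CSSR metric (reported in Figure 5), whose definition is not given here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# The five major capability categories named in the taxonomy.
CAPABILITIES = ("TA", "RI", "DR", "EF", "CR")


@dataclass(frozen=True)
class CellInstance:
    """Hypothetical record for one cell-level instance."""
    capability: str            # assumed: a single label from CAPABILITIES
    gold: str                  # ground-truth value from the source table
    predicted: Optional[str]   # value extracted by the model; None if missing


def normalize(value: Optional[str]) -> str:
    """Assumed normalization: a missing value becomes '', else trim and lowercase."""
    return "" if value is None else value.strip().lower()


def accuracy_by_capability(cells: Iterable[CellInstance]) -> Dict[str, float]:
    """Exact-match accuracy per capability (illustrative only, not CSSR)."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for cell in cells:
        if cell.capability not in CAPABILITIES:
            raise ValueError(f"unknown capability: {cell.capability}")
        total[cell.capability] += 1
        if normalize(cell.predicted) == normalize(cell.gold):
            correct[cell.capability] += 1
    # Report only capabilities that actually occur, in taxonomy order.
    return {cap: correct[cap] / total[cap] for cap in CAPABILITIES if total[cap]}


if __name__ == "__main__":
    toy_cells = [
        CellInstance("TA", "2021", "2021"),
        CellInstance("RI", "42", "40"),
        CellInstance("RI", "7", "7"),
        CellInstance("CR", "Berlin", " berlin "),
        CellInstance("EF", "N/A", None),
    ]
    print(accuracy_by_capability(toy_cells))
```

On these toy cells the script prints `{'TA': 1.0, 'RI': 0.5, 'EF': 0.0, 'CR': 1.0}`; DR is absent because no DR cell was supplied. Grouping by capability rather than averaging over all cells is what lets weaknesses such as multi-hop reasoning show up separately.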

Abstract

Document-to-table (Doc2Table) extraction derives structured tables from unstructured documents under a target schema, enabling reliable and verifiable SQL-based data analytics. Although large language models (LLMs) have shown promise in flexible information extraction, their ability to produce precisely structured tables remains insufficiently understood, particularly for indirect extraction that requires complex capabilities such as reasoning and conflict resolution. Existing benchmarks neither explicitly distinguish nor comprehensively cover the diverse capabilities required in Doc2Table extraction. We argue that a capability-aware benchmark is essential for systematic evaluation. However, constructing such benchmarks from human-annotated document-table pairs is costly, difficult to scale, and limited in capability coverage. To address this, we adopt a reverse Table2Doc paradigm and design a multi-agent synthesis workflow to generate documents from ground-truth tables. Based on this approach, we present DTBench, a synthetic benchmark that adopts a proposed two-level taxonomy of Doc2Table capabilities, covering 5 major categories and 13 subcategories. We evaluate several mainstream LLMs on DTBench and observe substantial performance gaps across models, as well as persistent challenges in reasoning, faithfulness, and conflict resolution. DTBench provides a comprehensive testbed for data generation and evaluation, facilitating future research on Doc2Table extraction. The benchmark is publicly available at https://github.com/ZJU-DAILY/DTBench.
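The reverse Table2Doc idea can be illustrated with a small Python sketch. It is a toy stand-in under stated assumptions: the described workflow uses multiple LLM agents, and its outline lists capability annotation and refinement & evidence generation as the first two steps, whereas here capability annotation is a caller-supplied label map (defaulting to "TA", an arbitrary assumption) and each cell is verbalized with one fixed sentence template. The point shown is only the inversion: because the document is generated from a known table, every cell keeps a traceable ground-truth value and label. The names `GoldTable`, `verbalize`, and `table_to_document`, and the "Acme" example, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Cell = Tuple[int, str]  # (row index, column name)


@dataclass
class GoldTable:
    """Ground-truth table under a target schema (hypothetical structure)."""
    schema: List[str]            # column names; schema[0] is assumed to be the row key
    rows: List[Dict[str, str]]   # one dict per row, keyed by column name


def verbalize(key: str, column: str, value: str) -> str:
    """Placeholder for an LLM agent: state one cell as a plain sentence.

    A real synthesis step would rewrite this evidence so that recovering the
    value requires the labeled capability; this template does not.
    """
    return f"The {column} of {key} is {value}."


def table_to_document(
    table: GoldTable, labels: Dict[Cell, str], default_label: str = "TA"
) -> Tuple[str, Dict[Cell, Tuple[str, str]]]:
    """Table2Doc direction: build a document from a gold table.

    Returns the document text and an answer key mapping each non-key cell
    to (gold value, capability label) for later cell-by-cell scoring.
    """
    key_col = table.schema[0]
    sentences: List[str] = []
    answer_key: Dict[Cell, Tuple[str, str]] = {}
    for i, row in enumerate(table.rows):
        for col in table.schema[1:]:
            sentences.append(verbalize(row[key_col], col, row[col]))
            answer_key[(i, col)] = (row[col], labels.get((i, col), default_label))
    return " ".join(sentences), answer_key


if __name__ == "__main__":
    gold = GoldTable(
        schema=["company", "founding year", "headcount"],
        rows=[{"company": "Acme", "founding year": "1990", "headcount": "250"}],
    )
    doc, key = table_to_document(gold, labels={(0, "headcount"): "RI"})
    print(doc)  # The founding year of Acme is 1990. The headcount of Acme is 250.
    print(key)
```

Scoring a Doc2Table extraction on such a document then reduces to comparing predicted cells against the returned answer key, which is the main practical advantage of generating documents from tables rather than annotating tables from documents.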

Paper Structure (35 sections, 3 equations, 5 figures, 6 tables)

Introduction
Task Definition: Doc2Table Extraction
Problem Statement
Doc2Table Capabilities Taxonomy
Transformative Alignment (TA)
Reasoning & Inference (RI)
Distractor Robustness (DR)
Evidence Faithfulness (EF)
Conflict Resolution (CR)
Dataset Construction: Table2Doc Synthesis
Definition
Challenges & Overview
Multi-Agent Workflow
Step 1: Capability Annotation
Step 2: Refinement & Evidence Generation
...and 20 more sections

Figures (5)

Figure 1: Challenging examples of Doc2Table extraction that require different capabilities.
Figure 2: A Two-level Taxonomy of Doc2Table Extraction Capabilities.
Figure 2: Statistics of documents and tables in DTBench.
Figure 3: Overview of the proposed multi-agent workflow for Table2Doc synthesis.
Figure 5: Performance of LLMs across five capabilities (CSSR).

Theorems & Definitions (3)

Definition 1
Definition 2
Definition 3