Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs

Fengbin Zhu, Chao Wang, Fuli Feng, Zifeng Ren, Moxin Li, Tat-Seng Chua

TL;DR

Doc2SoarGraph tackles the challenging problem of discrete reasoning over visually-rich table-text documents in the TAT-DQA setting by modeling element-level semantics with semantic-oriented hierarchical graphs. It defines four node types (Question, Block, Quantity, Date), constructs four graphs ($G_{QC}$, $G_{DC}$, $G_{TR}$, $G_{SD}$), and learns representations with per-graph GCNs, followed by evidence-based node selection and multi-type answer generation, including a tree-based Arithmetic decoder. The training objective combines multiple losses ($\mathcal{L}=\mathcal{L}_{node}+\mathcal{L}_{tree}+\mathcal{L}_{start}+\mathcal{L}_{end}+\mathcal{L}_{type}+\mathcal{L}_{token}+\mathcal{L}_{scale}$) to jointly supervise evidence selection, reasoning, and answer synthesis. Empirical results on the TAT-DQA dataset show substantial gains over MHST and zero-shot LLMs, with notable improvements on Arithmetic questions and evidence extraction, establishing a new state-of-the-art and highlighting the practical value for real-world finance document QA. The work advances robust, document-centric discrete reasoning and provides open-source code to promote reproducibility and broader adoption.
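
The training objective above is stated as a plain, unweighted sum of seven terms. A minimal sketch of that aggregation follows. The function name `total_loss`, the dictionary keys, and the example values are hypothetical and are not taken from the authors' released code; the sketch only shows how the per-component losses combine.

```python
from typing import Mapping

# Names mirror the subscripts in the summary's objective:
# L = L_node + L_tree + L_start + L_end + L_type + L_token + L_scale
# (the key strings themselves are an assumption made for this illustration).
LOSS_TERMS = ("node", "tree", "start", "end", "type", "token", "scale")


def total_loss(losses: Mapping[str, float]) -> float:
    """Return the unweighted sum of the seven component losses."""
    missing = [name for name in LOSS_TERMS if name not in losses]
    if missing:
        raise KeyError(f"missing loss terms: {missing}")
    return sum(losses[name] for name in LOSS_TERMS)


if __name__ == "__main__":
    # Made-up values, chosen only to demonstrate the call.
    example = {
        "node": 0.5, "tree": 1.0, "start": 0.25, "end": 0.25,
        "type": 0.5, "token": 0.25, "scale": 0.25,
    }
    print(total_loss(example))  # 3.0
```

Because no term carries a weighting coefficient in the stated formula, each component contributes equally to the gradient signal in this sketch; any per-term weighting would be a departure from the objective as summarized here.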

Abstract

Discrete reasoning over table-text documents (e.g., financial reports) has gained increasing attention over the past two years. Existing works mostly simplify this challenge by manually selecting and transforming document pages into structured tables and paragraphs, which hinders their practical application. In this work, we explore a more realistic problem setting in the form of TAT-DQA, i.e., answering a question over a visually-rich table-text document. Specifically, we propose a novel Doc2SoarGraph framework with enhanced discrete reasoning capability that harnesses the differences and correlations among different elements (e.g., quantities, dates) of the given question and document through Semantic-oriented hierarchical Graph structures. We conduct extensive experiments on the TAT-DQA dataset, and the results show that our proposed framework outperforms the best baseline model by 17.73% and 16.91% in terms of Exact Match (EM) and F1 score, respectively, on the test set, achieving a new state of the art.

Paper Structure (18 sections, 7 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: An example from TAT-DQA. We leverage four types of semantic elements from the question and document to facilitate discrete reasoning, i.e., Date, Quantity, Question and Block, marked with red, purple, yellow and blue rectangles, respectively. The quantities with a yellow background are supporting evidence for the question. The "million" with a green background is the scale of the answer.
  • Figure 2: An overview of the proposed Doc2SoarGraph model, using the sample in Figure 1 as an example.
  • Figure 3: Comparison of the evidence extraction power of our Doc2SoarGraph and MHST on Arithmetic questions on the dev set.
  • Figure 4: Performance comparison in F1 score on one- and multi-page documents on the test set.