Decomposition-Driven Multi-Table Retrieval and Reasoning for Numerical Question Answering

Feng Luo, Hai Lan, Hui Luo, Zhifeng Bao, Xiaoli Wang, J. Shane Culpepper, Shazia Sadiq

TL;DR

DMRAL is a Decomposition-driven Multi-table Retrieval and Answering framework for MTQA over large-scale table collections. It constructs a table relationship graph to capture complex relationships among tables, and it produces answers by progressively generating and refining a reasoning program based on sub-questions.

Abstract

In this paper, we study the problem of numerical multi-table question answering (MTQA) over large-scale table collections (e.g., online data repositories). This task is essential in many analytical applications. Existing MTQA solutions, such as text-to-SQL or open-domain MTQA methods, are designed for databases and struggle when applied to large-scale table collections. Their key limitations are: (1) limited support for complex table relationships; (2) ineffective retrieval of relevant tables at scale; and (3) inaccurate answer generation. To overcome these limitations, we propose DMRAL, a Decomposition-driven Multi-table Retrieval and Answering framework for MTQA over large-scale table collections. DMRAL consists of: (1) a table relationship graph that captures complex relationships among tables; (2) a Table-Aligned Question Decomposer and a Coverage-Aware Retriever, which jointly enable the effective identification of relevant tables from large-scale corpora by improving question decomposition quality and maximizing the question coverage of the retrieved tables; and (3) a Sub-question Guided Reasoner, which produces correct answers by progressively generating and refining the reasoning program based on sub-questions. Experiments on two MTQA datasets demonstrate that DMRAL significantly outperforms existing state-of-the-art MTQA methods, with an average improvement of 24% in table retrieval and 55% in answer accuracy.
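
To make the idea of "maximizing the question coverage of retrieved tables" concrete, here is a minimal Python sketch. It is not the paper's Coverage-Aware Retriever: it swaps in a plain greedy set-cover heuristic, and the function name `greedy_cover`, the table names, and the sub-question labels are all hypothetical. It assumes some earlier step has already decided which sub-questions each candidate table can help answer.

```python
from typing import Dict, List, Set


def greedy_cover(
    sub_questions: List[str],
    table_matches: Dict[str, Set[str]],
    k: int,
) -> List[str]:
    """Pick up to k tables, each time taking the table that covers
    the most sub-questions that are still uncovered.

    Illustrative greedy set-cover heuristic only; the matching between
    tables and sub-questions is assumed to be given.
    """
    uncovered = set(sub_questions)
    chosen: List[str] = []
    while uncovered and len(chosen) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(
            sorted(table_matches),
            key=lambda t: len(table_matches[t] & uncovered),
        )
        gain = table_matches[best] & uncovered
        if not gain:
            break  # no remaining table covers anything new
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical example: three sub-questions, three candidate tables.
sub_questions = ["q1", "q2", "q3"]
table_matches = {
    "sales_2023": {"q1", "q2"},
    "sales_2022": {"q1"},
    "population": {"q3"},
}
selected = greedy_cover(sub_questions, table_matches, k=2)
print(selected)
```

In this toy setup the heuristic prefers a table that covers two sub-questions over a table that covers only one of them, which is the intuition behind retrieving for coverage rather than ranking each table against the full question in isolation.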

Paper Structure (46 sections, 1 equation, 14 figures, 10 tables)

Figures (14)

  • Figure 1: A comparison of problem settings and system capabilities among existing MTQA approaches and our proposed DMRAL.
  • Figure 2: MTQA processing with DMRAL, in which intermediate results at each step can be traced and refined.
  • Figure 3: Efficiency breakdown of table retrieval and answer generation, measured by the average runtime per question.
  • Figure 4: Impact of joinability/unionability thresholds on retrieval effectiveness and answer accuracy.
  • Figure 5: Retrieval effectiveness/efficiency vs. table scale for DMRAL.
  • ...and 9 more figures

Theorems & Definitions (1)

  • Definition 1: Numerical Multi-Table Question Answering