FeTaQA: Free-form Table Question Answering

Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kryściński, Nick Schoelkopf, Riley Kong, Xiangru Tang, Murori Mutuma, Ben Rosand, Isabel Trindade, Renusree Bandaru, Jacob Cunningham, Caiming Xiong, Dragomir Radev

TL;DR

FeTaQA tackles free-form table question answering by introducing a dataset of 10K Wikipedia-based instances, each consisting of a table, a question, a free-form answer, and the supporting table cells. It benchmarks two approaches: a pipeline that combines a weakly supervised table semantic parser (TAPAS) with a data-to-text generator (T5), and a single end-to-end sequence-to-sequence model (T5) that generates answers directly from flattened table inputs. Empirical results show that the end-to-end method delivers substantially better generation quality than the pipeline, but a large gap to human references remains, highlighting the need for better retrieval, reasoning, and faithful generation over semi-structured tables. The work also documents its dataset collection, annotation, and evaluation protocols, positioning FeTaQA as a challenging benchmark for generative table QA and related data-to-text tasks.

Abstract

Existing table question answering datasets contain abundant factual questions that primarily evaluate the query and schema comprehension capability of a system, but they fail to include questions that require complex reasoning and integration of information due to the constraint of the associated short-form answers. To address these issues and to demonstrate the full challenge of table question answering, we introduce FeTaQA, a new dataset with 10K Wikipedia-based {table, question, free-form answer, supporting table cells} pairs. FeTaQA yields a more challenging table question answering setting because it requires generating free-form text answers after retrieval, inference, and integration of multiple discontinuous facts from a structured knowledge source. Unlike datasets of generative QA over text in which answers are prevalent with copies of short text spans from the source, answers in our dataset are human-generated explanations involving entities and their high-level relations. We provide two benchmark methods for the proposed task: a pipeline method based on semantic-parsing-based QA systems and an end-to-end method based on large pretrained text generation models, and show that FeTaQA poses a challenge for both methods.
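
To make the end-to-end setting concrete, the sketch below shows one way a question and a table could be flattened into a single input string for a sequence-to-sequence model. It is a minimal illustration, not the authors' exact format: the function name `flatten_table`, the field prefixes (`question:`, `title:`, `header:`, `row N:`), the `|` cell separator, and the toy table are all assumptions.

```python
from typing import List


def flatten_table(
    question: str,
    title: str,
    header: List[str],
    rows: List[List[str]],
) -> str:
    """Linearize a question and a table into one string.

    Hypothetical format: the prefixes and separators are chosen for
    readability here and are not taken from the paper.
    """
    parts = [f"question: {question}", f"title: {title}"]
    # Column names go first so the model can associate cells with columns.
    parts.append("header: " + " | ".join(header))
    # Each row is emitted in order, with cells separated by " | ".
    for i, row in enumerate(rows, start=1):
        parts.append(f"row {i}: " + " | ".join(row))
    return " ".join(parts)


if __name__ == "__main__":
    # Toy table invented for illustration only.
    text = flatten_table(
        question="Who won in 2002?",
        title="Example race",
        header=["Year", "Winner"],
        rows=[["2001", "A"], ["2002", "B"]],
    )
    print(text)
```

A string like this would then be tokenized and passed to the generator, with the free-form answer as the target sequence. Long tables may exceed the model's input limit, so a real system would also need a truncation or cell-selection policy.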

Paper Structure

This paper contains 28 sections, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Examples of FeTaQA instances. Only part of the original table is shown for better visualization. These examples are referred to as (a), (b), (c), (d) from upper left to bottom right in the paper.
  • Figure 2: FeTaQA Topics Distribution.
  • Figure 3: FeTaQA questions by top 5 most frequent starting words, where box size represents frequency.
  • Figure 4: Pipeline model and End-to-End model diagrams.
  • Figure 5: Weakly supervised fine-tuning of table semantic parser on FeTaQA. We choose a checkpoint of TAPAS-base fine-tuned on WikiTableQuestions to start with. After fine-tuning, the table semantic parser predicts denotations, which are then converted to triples and sent to the Data-to-Text module.
  • ...and 4 more figures