Open Question Answering over Tables and Text

Wenhu Chen, Ming-Wei Chang, Eva Schlinger, William Wang, William W. Cohen

TL;DR

This work tackles open question answering that must integrate both tabular and textual evidence by introducing OTT-QA, a large-scale dataset built from web tables and Wikipedia passages. The authors identify key bottlenecks in retrieval and reasoning across the two modalities and propose two techniques: fusion retrieval, which groups related table segments and passages into fused blocks, and cross-block reading, which uses a long-range Transformer-based reader that jointly attends to multiple retrieved blocks. Together, these methods yield substantial gains over the baseline, raising exact match (EM) from under 10% to around 28% on the OTT-QA dev set, and approach state-of-the-art results on a separate dataset without extra re-ranking. The work also shows the importance of query augmentation, ICT pretraining, and weak distant supervision signals, and analyzes error modes in retrieval and evidence fusion. Overall, the work advances open-domain QA in mixed tabular/textual settings and suggests directions for multi-modal evidence gathering and reasoning.

Abstract

In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question. Most open QA systems have considered only retrieving information from unstructured text. Here we consider for the first time open QA over both tabular and textual data and present a new large-scale dataset, Open Table-and-Text Question Answering (OTT-QA), to evaluate performance on this task. Most questions in OTT-QA require multi-hop inference across tabular data and unstructured text, and the evidence required to answer a question can be distributed in different ways over these two types of input, making evidence retrieval challenging -- our baseline model using an iterative retriever and BERT-based reader achieves an exact match score of less than 10%. We then propose two novel techniques to address the challenge of retrieving and aggregating evidence for OTT-QA. The first technique is to use "early fusion" to group multiple highly relevant tabular and textual units into a fused block, which gives the retriever more context to search over. The second technique is to use a cross-block reader to model the cross-dependency between multiple retrieved pieces of evidence with global-local sparse attention. Combining these two techniques improves the score significantly, to above 27%.
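
The "early fusion" idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the linker here is a toy exact match between cell values and passage titles, the retriever is plain token overlap, and the table, passages, and all names (`FusedBlock`, `build_fused_blocks`, `retrieve`) are hypothetical. It only shows why grouping a table row with its linked passages lets a single retrieval step match a two-hop question.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FusedBlock:
    """A table segment (here: one row) plus the passages linked to it."""
    table_title: str
    row: dict
    passages: list = field(default_factory=list)

    def text(self) -> str:
        cells = " ; ".join(f"{k} is {v}" for k, v in self.row.items())
        return " ".join([self.table_title, cells, *self.passages])


def build_fused_blocks(table_title, rows, passages_by_title):
    """Toy linker (assumption): attach a passage when a cell equals its title."""
    blocks = []
    for row in rows:
        linked = [passages_by_title[v] for v in row.values()
                  if v in passages_by_title]
        blocks.append(FusedBlock(table_title, row, linked))
    return blocks


def tokenize(s):
    return set(re.findall(r"\w+", s.lower()))


def retrieve(question, blocks, k=1):
    """One-step retrieval: rank fused blocks by token overlap with the question."""
    q = tokenize(question)
    ranked = sorted(blocks, key=lambda b: len(q & tokenize(b.text())),
                    reverse=True)
    return ranked[:k]


# Hypothetical data, invented for illustration only.
rows = [
    {"Year": "2001", "Director": "Alice Moreau"},
    {"Year": "2002", "Director": "Bob Ito"},
]
passages_by_title = {
    "Alice Moreau": "Alice Moreau was born in Lyon.",
    "Bob Ito": "Bob Ito was born in Osaka.",
}
blocks = build_fused_blocks("Example Film Festival winners", rows,
                            passages_by_title)
question = "Where was the director who won in 2002 born?"
top = retrieve(question, blocks, k=1)
print(top[0].text())
```

In this toy example the table row alone never mentions a birthplace and the passage alone never mentions the year, so neither unit matches the whole question; the fused block contains both hops and can be found in one retrieval step.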

Paper Structure

This paper contains 43 sections, 13 figures, 2 tables.

Figures (13)

  • Figure 1: The problem setting: An OTT-QA model needs to retrieve from two candidate pools and then perform multi-hop reasoning to find answers.
  • Figure 2: The 'de-contextualization' annotation phase of OTT-QA. In the first step, the annotator is restricted to adding phrases from the context. In the second step, the annotator is specifically requested to make the sentence more concise and natural.
  • Figure 3: Left: Iterative 3-step retrieval over individual blocks (baseline). Right: Fusion 1-step retrieval over fused groups, which greatly lowers the cost of iterative encoding and retrieving.
  • Figure 4: Left: Single-block reader with input shorter than 512 tokens (baseline). Right: Cross-block reader with input longer than 4K tokens, where $\bar{A}$ denotes the global state assigned to local block A. The single-block reader is stuck at a local optimum, while the cross-block reader outputs the global optimum.
  • Figure 5: Entity linker performance (F1).
  • ...and 8 more figures
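
The global-local sparse attention mentioned in the abstract and in the Figure 4 caption (one global state per local block) can be sketched as an attention mask. The exact pattern used by the paper's reader is not given here, so the rules below are assumptions chosen for illustration: global states attend to each other and to the tokens of their own block, while local tokens attend to tokens in the same block and to every global state. The function name `global_local_mask` and the sequence layout are hypothetical.

```python
def global_local_mask(block_lengths):
    """Build a boolean attention mask; mask[i][j] is True if i may attend to j.

    Assumed layout: one global state per block first, then the tokens of
    block 0, block 1, and so on.
    """
    n_blocks = len(block_lengths)
    owner = list(range(n_blocks))      # block id of each position
    is_global = [True] * n_blocks      # True for global-state positions
    for b, length in enumerate(block_lengths):
        owner += [b] * length
        is_global += [False] * length

    n = len(owner)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if is_global[i] and is_global[j]:
                mask[i][j] = True      # global <-> global: cross-block channel
            elif owner[i] == owner[j]:
                mask[i][j] = True      # anything within the same block
            elif not is_global[i] and is_global[j]:
                mask[i][j] = True      # local token -> any global state
    return mask


# Two retrieved blocks with 2 and 3 tokens: 2 global states + 5 local tokens.
mask = global_local_mask([2, 3])
allowed = sum(sum(row) for row in mask)
print(f"{allowed} of {len(mask) ** 2} attention pairs are allowed")
```

Under these assumed rules, tokens in different blocks never attend to each other directly; information crosses blocks only through the global states, which is what keeps the attention cost well below that of dense attention on long inputs.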