Question Answering with Texts and Tables through Deep Reinforcement Learning

Marcos M. José, Flávio N. Cação, Maria F. Ribeiro, Rafael M. Cheang, Paulo Pirozelli, Fabio G. Cozman

TL;DR

This work tackles open-domain multi-hop QA that requires information from both textual and tabular sources. It introduces a deep reinforcement learning (DRL) agent that sequentially selects among three actions (Retrieve Texts, Retrieve Tables, and Generate the Answer) to orchestrate a retriever-reader pipeline on OTT-QA. The best DRL configuration, a BM25 retriever with a PPO Transformer policy, achieves an F1 score of 19.03, closely matching the strong non-DRL Iterative-Retrieval baseline. Dense retriever variants underperform, which highlights the importance of retrieval strategy and module compatibility. The study demonstrates the potential of sequential decision-making to guide modular QA systems and points to future work combining DRL with non-sequential components (e.g., Joint-Reranking, COS) to further improve performance on multi-source QA tasks.

Abstract

This paper proposes a novel architecture for generating multi-hop answers to open-domain questions that require information from both texts and tables, using the Open Table-and-Text Question Answering (OTT-QA) dataset for training and validation. One of the most common ways to generate answers in this setting is to retrieve information sequentially, so that each retrieved piece of data helps in the search for the next. Because different models can behave differently when called during this sequential search, the challenge is deciding which model to call at each step. Our architecture uses reinforcement learning to choose among state-of-the-art tools sequentially until an answer is generated. The system achieves an F1 score of 19.03, comparable to iterative systems in the literature.
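
For readers unfamiliar with the metric, the following sketch shows how a token-level F1 score of the kind reported above is commonly computed for QA. It assumes the SQuAD-style convention (lowercase, strip punctuation and English articles, then compare token multisets). The paper's own evaluation script is not reproduced here, and the names `normalize` and `token_f1` are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall between two answers."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens or not gold_tokens:
        # Two empty answers count as a match; one empty answer counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("Lynda La Plante", "Lynda La Plante"))      # 1.0
print(round(token_f1("La Plante", "Lynda La Plante"), 2))  # 0.8
print(token_f1("Agatha Christie", "Lynda La Plante"))      # 0.0
```

Per-question scores like these are typically averaged over the evaluation set and multiplied by 100, which is the scale on which a figure such as 19.03 is usually reported.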

Paper Structure (21 sections, 1 equation, 3 figures, 2 tables)

This paper contains 21 sections, 1 equation, 3 figures, 2 tables.

Figures (3)

  • Figure 1: QA pair from OTT-QA. The question asks who is the author of the series in which Nonso Anozie played the character Robert. To answer it, one must locate the relevant table, labeled "Nonso Anozie" (highlighted in pink), and use the role "Robert" (highlighted in green) to identify the cell containing "Prime Suspect" (highlighted in yellow). The associated text can then be retrieved, which reveals that the creator of the series is "Lynda La Plante" (highlighted in blue).
  • Figure 2: Proposed architecture. At each time step, the agent selects one of three actions: Retrieve Texts, Retrieve Tables, or Generate the Answer, based on the question and the information gathered so far.
  • Figure 3: Training workflow of the proposed architecture.
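
The decision loop described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The names `Action`, `State`, and `run_episode`, the `max_steps` budget, and the forced answer when that budget runs out are all assumptions. The hand-written `toy_policy` stands in for a trained policy network, and the toy retrievers and reader return hard-coded strings paraphrasing the Figure 1 example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Action(Enum):
    RETRIEVE_TEXTS = 0
    RETRIEVE_TABLES = 1
    GENERATE_ANSWER = 2


@dataclass
class State:
    """What the agent observes: the question plus the evidence gathered so far."""
    question: str
    evidence: list[str] = field(default_factory=list)


def run_episode(
    question: str,
    policy: Callable[[State], Action],
    retrieve_texts: Callable[[State], list[str]],
    retrieve_tables: Callable[[State], list[str]],
    generate_answer: Callable[[State], str],
    max_steps: int = 5,  # hypothetical step budget
) -> tuple[str, list[Action]]:
    """Let the policy pick one action per step until it chooses to answer."""
    state = State(question)
    trace: list[Action] = []
    for _ in range(max_steps - 1):
        action = policy(state)
        trace.append(action)
        if action is Action.GENERATE_ANSWER:
            return generate_answer(state), trace
        retriever = retrieve_texts if action is Action.RETRIEVE_TEXTS else retrieve_tables
        state.evidence.extend(retriever(state))
    # Step budget exhausted: force an answer from whatever was gathered.
    trace.append(Action.GENERATE_ANSWER)
    return generate_answer(state), trace


# Toy stand-ins so the sketch runs; a real system would plug in BM25 or dense
# retrievers, a reader model, and a trained policy here.
def toy_policy(state: State) -> Action:
    if not state.evidence:
        return Action.RETRIEVE_TABLES
    if len(state.evidence) == 1:
        return Action.RETRIEVE_TEXTS
    return Action.GENERATE_ANSWER


def toy_tables(state: State) -> list[str]:
    return ["Nonso Anozie | role: Robert | title: Prime Suspect"]


def toy_texts(state: State) -> list[str]:
    return ["Prime Suspect was created by Lynda La Plante."]


def toy_reader(state: State) -> str:
    found = any("Lynda La Plante" in passage for passage in state.evidence)
    return "Lynda La Plante" if found else ""


answer, trace = run_episode(
    question="Who is the author of the series in which Nonso Anozie played Robert?",
    policy=toy_policy,
    retrieve_texts=toy_texts,
    retrieve_tables=toy_tables,
    generate_answer=toy_reader,
)
print(answer)                   # Lynda La Plante
print([a.name for a in trace])  # ['RETRIEVE_TABLES', 'RETRIEVE_TEXTS', 'GENERATE_ANSWER']
```

In a reinforcement learning setup, a reward derived from the final answer (for example, its F1 against the gold answer) would be used to update the policy. That training step is omitted from this sketch.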