Question Answering with Texts and Tables through Deep Reinforcement Learning
Marcos M. José, Flávio N. Cação, Maria F. Ribeiro, Rafael M. Cheang, Paulo Pirozelli, Fabio G. Cozman
TL;DR
This work tackles open-domain multi-hop QA requiring information from both textual and tabular sources. It introduces a deep reinforcement learning agent that sequentially selects among three actions—Retrieve Texts, Retrieve Tables, and Generate the Answer—to orchestrate a retrieval-reader pipeline on OTT-QA. The best DRL configuration (BM25 retriever with a PPO Transformer policy) achieves an F1 score of 19.03, closely matching the strong non-DRL Iterative-Retrieval baseline, while dense retriever variants underperform, highlighting the importance of retrieval strategy and module compatibility. The study demonstrates the potential of sequential decision-making to guide modular QA systems and points to future work combining DRL with non-sequential components (e.g., Joint-Reranking, COS) to further improve performance on multi-source QA tasks.
Abstract
This paper proposes a novel architecture to generate multi-hop answers to open-domain questions that require information from texts and tables, using the Open Table-and-Text Question Answering (OTT-QA) dataset for training and validation. One of the most common ways to generate answers in this setting is to retrieve information sequentially, where each selected piece of data helps in searching for the next one. As different models can behave differently when called during this sequential search, a key challenge is deciding which model to call at each step. Our architecture employs reinforcement learning to choose among different state-of-the-art tools sequentially until a final answer is generated. This system achieved an F1-score of 19.03, comparable to iterative systems in the literature.
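The sequential decision process described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the step budget `max_steps`, and the toy stand-ins for the retrievers, the reader, and the policy are all assumptions made for illustration. In the paper the policy is learned with reinforcement learning, whereas here a hand-written rule takes its place.

```python
from typing import Callable, List, Tuple

# The three actions named in the paper; the string labels are illustrative.
RETRIEVE_TEXTS = "retrieve_texts"
RETRIEVE_TABLES = "retrieve_tables"
GENERATE_ANSWER = "generate_answer"

Policy = Callable[[str, List[str]], str]
Retriever = Callable[[str, List[str]], List[str]]
Reader = Callable[[str, List[str]], str]


def run_episode(
    question: str,
    policy: Policy,
    retrieve_texts: Retriever,
    retrieve_tables: Retriever,
    generate_answer: Reader,
    max_steps: int = 5,  # assumed budget, not taken from the paper
) -> Tuple[str, List[str]]:
    """Call one module per step until the policy chooses to answer."""
    evidence: List[str] = []
    trace: List[str] = []
    for _ in range(max_steps):
        action = policy(question, evidence)
        trace.append(action)
        if action == RETRIEVE_TEXTS:
            evidence.extend(retrieve_texts(question, evidence))
        elif action == RETRIEVE_TABLES:
            evidence.extend(retrieve_tables(question, evidence))
        elif action == GENERATE_ANSWER:
            return generate_answer(question, evidence), trace
        else:
            raise ValueError(f"unknown action: {action}")
    # Budget exhausted: force an answer from whatever was gathered.
    trace.append(GENERATE_ANSWER)
    return generate_answer(question, evidence), trace


# Toy stand-ins (hypothetical) so the loop can be run end to end.
def toy_tables(question: str, evidence: List[str]) -> List[str]:
    return [f"table:{question}"]


def toy_texts(question: str, evidence: List[str]) -> List[str]:
    return [f"text:{question}"]


def toy_reader(question: str, evidence: List[str]) -> str:
    return " | ".join(evidence)


def toy_policy(question: str, evidence: List[str]) -> str:
    # Fixed rule in place of a trained policy: table, then text, then answer.
    if not evidence:
        return RETRIEVE_TABLES
    if len(evidence) == 1:
        return RETRIEVE_TEXTS
    return GENERATE_ANSWER


answer, trace = run_episode("q", toy_policy, toy_texts, toy_tables, toy_reader)
```

Passing the modules in as plain callables keeps the loop independent of any particular retriever or reader, which mirrors the modular setup the abstract describes.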
