Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval

Peter Baile Chen, Yi Zhang, Dan Roth

TL;DR

Open-domain QA over table corpora often requires retrieving multiple joinable tables rather than a single table. The paper proposes a join-aware multi-table retrieval framework that re-ranks candidate tables with a mixed-integer program (MIP). The program jointly optimizes coarse and fine-grained query-table relevance, table-table compatibility, coverage of sub-queries, and connectivity among tables, with the objective $\arg \max \sum_i r_i b_i + \sum_{q,i,k} r_{qik} d_{qik} + \sum_{i, j, k, l} \omega_{ij}^{kl} c_{ij}^{kl}$ under a set of constraints. Query-table relevance combines coarse table-level similarity with fine-grained sub-query-to-column mappings, using a query decomposition guided by an LLM and a bi-encoder. Table-table relevance is a joinability score that blends schema and instance similarity with the likelihood of a key-foreign-key constraint, estimated via columnUniqueness. The approach is evaluated on Spider and Bird, where it improves table retrieval F1 by up to 9.3% and end-to-end QA accuracy by up to 5.4%, indicating that incorporating join information during retrieval benefits downstream SQL generation and question answering. The work highlights that table retrieval remains an open problem and offers a scalable framework for integrating join relationships into retrieval to improve real-world QA over structured data.
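To make the shape of the objective concrete, here is a minimal sketch that keeps only two of its terms: the coarse relevance term $\sum_i r_i b_i$ and a table-level version of the compatibility term (the column indices $k, l$ and the sub-query term are dropped). It is not the paper's implementation: it enumerates subsets by brute force instead of calling a MIP solver, and the function name `select_tables`, the table names, and all scores are hypothetical values chosen for illustration.

```python
from itertools import combinations


def select_tables(relevance, compatibility, k):
    """Pick k tables maximizing summed relevance plus pairwise compatibility.

    relevance:      dict mapping table name -> query-table score (assumed given)
    compatibility:  dict mapping frozenset({t1, t2}) -> joinability score
    k:              number of tables to select
    Brute-force stand-in for a MIP solver; only viable for tiny candidate sets.
    """
    tables = sorted(relevance)
    best_score, best_subset = float("-inf"), ()
    for subset in combinations(tables, k):
        # Coarse query-table relevance of every selected table.
        score = sum(relevance[t] for t in subset)
        # Table-table compatibility of every selected pair (0 if unknown).
        score += sum(
            compatibility.get(frozenset(pair), 0.0)
            for pair in combinations(subset, 2)
        )
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score


# Hypothetical toy scores: "stadium" looks relevant to the query on its own,
# but the bridge table "singer_in_concert" joins well with two other tables.
relevance = {
    "singer": 0.9,
    "concert": 0.5,
    "stadium": 0.6,
    "singer_in_concert": 0.4,
}
compatibility = {
    frozenset(("singer", "singer_in_concert")): 0.8,
    frozenset(("concert", "singer_in_concert")): 0.7,
    frozenset(("concert", "stadium")): 0.3,
}

chosen, score = select_tables(relevance, compatibility, k=3)
print(chosen, round(score, 2))
```

With these toy numbers, ranking by relevance alone would return `singer`, `stadium`, and `concert`, whereas the joint objective swaps `stadium` for the lower-scoring bridge table `singer_in_concert` because it connects the other two. This is the kind of selection a join-aware objective is meant to favor.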

Abstract

Retrieving relevant tables containing the necessary information to accurately answer a given question over tables is critical to open-domain question-answering (QA) systems. Previous methods assume the answer to such a question can be found either in a single table or multiple tables identified through question decomposition or rewriting. However, neither of these approaches is sufficient, as many questions require retrieving multiple tables and joining them through a join plan that cannot be discerned from the user query itself. If the join plan is not considered in the retrieval stage, the subsequent steps of reasoning and answering based on those retrieved tables are likely to be incorrect. To address this problem, we introduce a method that uncovers useful join relations for any query and database during table retrieval. We use a novel re-ranking method formulated as a mixed-integer program that considers not only table-query relevance but also table-table relevance that requires inferring join relationships. Our method outperforms the state-of-the-art approaches for table retrieval by up to 9.3% in F1 score and for end-to-end QA by up to 5.4% in accuracy.

Paper Structure (32 sections, 17 equations, 1 figure, 4 tables)


Figures (1)

  • Figure 1: Previous work makes simplifying assumptions for table retrieval. Left block assumes there exists a single table that is sufficient to answer the question. Middle block assumes necessary joins can be recovered from query decomposition. However, in practice, the question may be more complicated and the join plan may not be discerned from the user query itself (right block). Therefore it is important to identify the join relationship while conducting table retrieval.