REaR: Retrieve, Expand and Refine for Effective Multitable Retrieval

Rishita Agarwal, Himanshu Singhal, Peter Baile Chen, Manan Roy Choudhury, Dan Roth, Vivek Gupta

TL;DR

REaR introduces a three-stage, LLM-free framework for multi-table retrieval in Text-to-SQL that decouples query–table relevance from table–table joinability. It retrieves a base set of semantically relevant tables, then expands this set with structurally joinable candidates via precomputed column embeddings and FAISS, followed by a refinement stage that jointly scores relevance and joinability to prune noisy candidates. The approach yields improved retrieval quality and end-to-end SQL execution accuracy across MMQA, BIRD, and Spider, while delivering substantial efficiency gains over LLM-based retrieval methods. Across ablations and comparisons to oracle baselines, REaR demonstrates that explicit modeling of table interoperability is crucial for effective multi-table reasoning and practical deployment at scale.

Abstract

Answering natural language queries over relational data often requires retrieving and reasoning over multiple tables, yet most retrievers optimize only for query-table relevance and ignore table-table compatibility. We introduce REaR (Retrieve, Expand and Refine), a three-stage, LLM-free framework that separates semantic relevance from structural joinability for efficient, high-fidelity multi-table retrieval. REaR (i) retrieves query-aligned tables, (ii) expands these with structurally joinable tables via fast, precomputed column-embedding comparisons, and (iii) refines them by pruning noisy or weakly related candidates. Empirically, REaR is retriever-agnostic and consistently improves dense/sparse retrievers on complex table QA datasets (BIRD, MMQA, and Spider) by improving both multi-table retrieval quality and downstream SQL execution. Despite being LLM-free, it delivers performance competitive with state-of-the-art LLM-augmented retrieval systems (e.g., ARM) while achieving much lower latency and cost. Ablations confirm complementary gains from expansion and refinement, underscoring REaR as a practical, scalable building block for table-based downstream tasks (e.g., Text-to-SQL).
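
The three stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It makes several assumptions: brute-force cosine similarity stands in for the FAISS column search, no cross-encoder is used, joinability is taken as the best cosine match between any pair of columns, and the refinement score is a hypothetical weighted sum controlled by `alpha`. The function name `rear_sketch`, its parameters, and all vectors are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def joinability(cols_a, cols_b):
    """Assumed proxy for joinability: best cosine match over all column pairs."""
    return max(cosine(a, b) for a in cols_a for b in cols_b)


def rear_sketch(query_vec, table_vecs, column_vecs, k=2, delta_k=1, alpha=0.5):
    # (i) Retrieve: rank tables by query-table relevance and keep the top k.
    relevance = {t: cosine(query_vec, v) for t, v in table_vecs.items()}
    base = sorted(relevance, key=relevance.get, reverse=True)[:k]

    # (ii) Expand: add the delta_k remaining tables most joinable with the base set.
    rest = [t for t in table_vecs if t not in base]
    join_to_base = {
        t: max(joinability(column_vecs[t], column_vecs[b]) for b in base)
        for t in rest
    }
    added = sorted(join_to_base, key=join_to_base.get, reverse=True)[:delta_k]
    candidates = base + added

    # (iii) Refine: combine relevance and joinability, then prune back to k.
    def score(t):
        others = [c for c in candidates if c != t]
        join = max(
            (joinability(column_vecs[t], column_vecs[o]) for o in others),
            default=0.0,
        )
        return alpha * relevance[t] + (1 - alpha) * join

    return sorted(candidates, key=score, reverse=True)[:k]


# Toy data: "weather" looks relevant to the query but joins with nothing,
# while "customers" looks less relevant but shares a key column with "orders".
query_vec = [1.0, 0.2, 0.0]
table_vecs = {
    "orders": [1.0, 0.0, 0.0],
    "weather": [0.8, 0.6, 0.0],
    "customers": [0.5, 0.0, 0.85],
}
column_vecs = {
    "orders": [[1, 0, 0, 0], [0, 1, 0, 0]],     # order_id, customer_id
    "customers": [[0, 1, 0, 0], [0, 0, 1, 0]],  # customer_id, name
    "weather": [[0, 0, 0, 1]],                  # temperature
}

final_tables = rear_sketch(query_vec, table_vecs, column_vecs, k=2, delta_k=1)
```

In this toy setup, relevance alone would keep `weather` in the top 2. The expansion step brings in `customers`, and the combined score then ranks it above `weather` because it can be joined with `orders`.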

Paper Structure

This paper contains 38 sections, 13 equations, 2 figures, 8 tables.

Figures (2)

  • Figure 1: Overview of the REaR framework. (1) Retrieve: Select top-k relevant tables. (2) Expand: Augment with joinable tables using FAISS column search and cross-encoder reranking ($k \to k' + \Delta k'$ candidates). (3) Refine: Score candidates by query relevance and table joinability, rerank to final top-k. Top boxes: offline preprocessing
  • Figure 2: The average recall and Full Recall of base retrieval (UAE) and our method with modules removed: REaR(-Expansion), REaR(-Refinement), and REaR for the top-5 retrieved objects across BIRD, MMQA, and Spider.
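
The two metrics named in the Figure 2 caption can be sketched as follows. The definitions here are assumptions, since this excerpt does not define them: recall is taken as the fraction of gold tables that appear in the retrieved set, and Full Recall as 1 only when every gold table is retrieved. The function names and example queries are hypothetical.

```python
def recall(retrieved, gold):
    """Fraction of gold tables present in the retrieved set (assumed definition)."""
    gold = set(gold)
    return len(gold & set(retrieved)) / len(gold)


def full_recall(retrieved, gold):
    """1.0 only if every gold table was retrieved, else 0.0 (assumed definition)."""
    return 1.0 if set(gold) <= set(retrieved) else 0.0


def average(metric, runs):
    """Mean of a per-query metric over (retrieved, gold) pairs."""
    return sum(metric(r, g) for r, g in runs) / len(runs)


# Two hypothetical queries, each needing both "orders" and "customers".
runs = [
    (["orders", "customers", "weather"], ["orders", "customers"]),
    (["orders", "weather"], ["orders", "customers"]),
]

avg_recall = average(recall, runs)            # 0.75
avg_full_recall = average(full_recall, runs)  # 0.5
```

Under these definitions, Full Recall is the stricter measure for multi-table queries: a query missing one of the tables needed for a join still earns partial recall but contributes nothing to Full Recall.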