REaR: Retrieve, Expand and Refine for Effective Multitable Retrieval

Rishita Agarwal; Himanshu Singhal; Peter Baile Chen; Manan Roy Choudhury; Dan Roth; Vivek Gupta


TL;DR

REaR introduces a three-stage, LLM-free framework for multi-table retrieval in Text-to-SQL that decouples query–table relevance from table–table joinability. It first retrieves a base set of semantically relevant tables, then expands this set with structurally joinable candidates using precomputed column embeddings and FAISS similarity search, and finally refines the pool by jointly scoring relevance and joinability to prune noisy candidates. The approach improves retrieval quality and end-to-end SQL execution accuracy across MMQA, BIRD, and Spider, while delivering substantial efficiency gains over LLM-based retrieval methods. Across ablations and comparisons to oracle baselines, REaR shows that explicitly modeling table interoperability is crucial for effective multi-table reasoning and for practical deployment at scale.
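The Expand stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: column embeddings are assumed to be precomputed per table, and brute-force NumPy cosine similarity stands in for the FAISS index mentioned above (a `faiss.IndexFlatIP` over L2-normalized vectors computes the same scores). The names `expand`, `tau` (similarity cutoff) and `k` (neighbors per column) are hypothetical.

```python
import numpy as np

def l2_normalize(matrix):
    # Row-wise L2 normalization so that the inner product equals cosine similarity.
    matrix = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.clip(norms, 1e-12, None)

def build_column_index(column_embeddings):
    # Flatten every table's column vectors into one matrix (a stand-in for a
    # FAISS flat inner-product index) and remember which table owns each row.
    owners, rows = [], []
    for table, emb in column_embeddings.items():
        for vec in emb:
            owners.append(table)
            rows.append(vec)
    return owners, l2_normalize(rows)

def expand(base_tables, column_embeddings, tau=0.8, k=5):
    # Hypothetical Expand stage: add tables having a column whose embedding is
    # close (cosine >= tau) to some column of a base table.
    owners, index = build_column_index(column_embeddings)
    base = set(base_tables)
    candidates = {}
    for table in base_tables:
        queries = l2_normalize(column_embeddings[table])
        sims = queries @ index.T  # shape: (columns of this base table, all columns)
        for row in sims:
            for j in np.argsort(-row)[:k]:  # k nearest columns across all tables
                other = owners[int(j)]
                if other in base or row[j] < tau:
                    continue  # skip base tables and weak matches
                # Keep the strongest column match seen for each candidate table.
                candidates[other] = max(candidates.get(other, 0.0), float(row[j]))
    return candidates
```

For instance, with `customers` as the only base table, an `orders` table whose `customer_id` embedding nearly matches `customers.id` would be added as a candidate, while an unrelated `weather` table would not.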

Abstract

Answering natural language queries over relational data often requires retrieving and reasoning over multiple tables, yet most retrievers optimize only for query-table relevance and ignore table-table compatibility. We introduce REaR (Retrieve, Expand and Refine), a three-stage, LLM-free framework that separates semantic relevance from structural joinability for efficient, high-fidelity multi-table retrieval. REaR (i) retrieves query-aligned tables, (ii) expands these with structurally joinable tables via fast, precomputed column-embedding comparisons, and (iii) refines them by pruning noisy or weakly related candidates. Empirically, REaR is retriever-agnostic and consistently improves dense and sparse retrievers on complex table QA datasets (BIRD, MMQA, and Spider), raising both multi-table retrieval quality and downstream SQL execution accuracy. Despite being LLM-free, it delivers performance competitive with state-of-the-art LLM-augmented retrieval systems (e.g., ARM) while achieving much lower latency and cost. Ablations confirm complementary gains from expansion and refinement, underscoring REaR as a practical, scalable building block for table-based downstream tasks (e.g., Text-to-SQL).
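The Refine stage is described only as jointly scoring relevance and joinability to prune weak candidates. One plausible reading is sketched below; the linear blend with weight `alpha`, the cutoff `threshold`, and the size cap `budget` are hypothetical choices for illustration, not the scoring function used in the paper.

```python
def refine(relevance, joinability, alpha=0.6, threshold=0.5, budget=4):
    # Hypothetical Refine stage.
    # relevance:   table -> query-table relevance score in [0, 1]
    # joinability: table -> best table-table joinability score in [0, 1]
    tables = set(relevance) | set(joinability)
    scores = {
        t: alpha * relevance.get(t, 0.0) + (1 - alpha) * joinability.get(t, 0.0)
        for t in tables
    }
    # Prune weak candidates, then keep the highest-scoring tables (ties broken by name).
    kept = [t for t in tables if scores[t] >= threshold]
    kept.sort(key=lambda t: (-scores[t], t))
    return kept[:budget]
```

With `relevance = {"customers": 0.9, "orders": 0.7, "weather": 0.4}` and `joinability = {"customers": 0.95, "orders": 0.99, "weather": 0.1}`, the blended scores are about 0.92, 0.82 and 0.28, so `weather` is pruned and the call returns `["customers", "orders"]`. A linear blend lets a moderately relevant but highly joinable bridge table survive; a multiplicative combination would instead require both signals to be high.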
