MURRE: Multi-Hop Table Retrieval with Removal for Open-Domain Text-to-SQL
Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Qingfu Zhu, Wanxiang Che
TL;DR
The paper tackles open-domain text-to-SQL, where a system must retrieve the relevant database tables and then generate a SQL query from a natural-language question. It argues that conventional multi-hop retrieval designed for open-domain QA is ill-suited to this setting: that approach supplements the question with retrieved documents, whereas text-to-SQL questions typically already contain all the necessary information, so later hops drift toward tables similar to those already retrieved rather than tables relevant to the question. MURRE instead uses removal-based multi-hop retrieval: information covered by previously retrieved tables is removed from the question, which guides the next hop toward relevant tables that have not yet been retrieved. The method combines beam-search retrieval with this removal step, using an embedding-based cosine probability for retrieval and an LLM to perform the removal, followed by a two-part scoring mechanism that selects the top-N tables for SQL generation. On SpiderUnion and BirdUnion, MURRE achieves an average 5.7% improvement over the prior state of the art, with larger gains on BirdUnion and controlled trade-offs as the number of retrieved tables grows. The work advances open-domain text-to-SQL by prioritizing question-relevant, non-redundant tables during multi-hop retrieval, which improves execution accuracy, and it offers insight into retrieval strategy and efficiency trade-offs in practical applications.
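The retrieve-then-remove loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed`, the sum-to-one normalization in `retrieve`, the word-dropping `remove_retrieved` (a stand-in for the LLM removal step), the best-path table scoring, and all function and parameter names are assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (assumed stand-in for a dense encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question, tables, k):
    """Top-k (table, probability) pairs; probabilities are cosine
    similarities normalized to sum to 1 (an assumed normalization)."""
    q = embed(question)
    sims = {name: cosine(q, embed(desc)) for name, desc in tables.items()}
    total = sum(sims.values()) or 1.0
    ranked = sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return [(name, sim / total) for name, sim in ranked]

def remove_retrieved(question, table_desc):
    """Hypothetical stand-in for LLM-driven removal: drop question
    words that the retrieved table already covers."""
    covered = set(table_desc.lower().split())
    return " ".join(w for w in question.split() if w.lower() not in covered)

def murre_sketch(question, tables, hops=2, beam=2, top_n=2):
    # Each beam entry: (current question, retrieved path, path score).
    beams = [(question, [], 1.0)]
    for _ in range(hops):
        candidates = []
        for q, path, score in beams:
            remaining = {n: d for n, d in tables.items() if n not in path}
            hits = [(n, p) for n, p in retrieve(q, remaining, beam) if p > 0.0]
            if not hits:
                # Nothing relevant left: carry the path forward unchanged.
                candidates.append((q, path, score))
                continue
            for name, prob in hits:
                new_q = remove_retrieved(q, tables[name])
                candidates.append((new_q, path + [name], score * prob))
        beams = sorted(candidates, key=lambda c: c[2], reverse=True)[:beam]
    # Assumed aggregation: score a table by its best surviving path.
    table_scores = {}
    for _, path, score in beams:
        for name in path:
            table_scores[name] = max(table_scores.get(name, 0.0), score)
    return sorted(table_scores, key=table_scores.get, reverse=True)[:top_n]

# Illustrative data only; table descriptions are made up for the example.
tables = {
    "singer": "singer name country age",
    "concert": "concert stadium year theme",
    "pets": "pet type weight owner",
}
print(murre_sketch("name of singer in concert year 2014", tables))
```

In this toy run, once the first hop retrieves `singer`, the words it covers are dropped from the question, so the second hop is driven only by what remains ("concert", "year") and does not drift back toward tables that merely resemble `singer`.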
Abstract
The open-domain text-to-SQL task aims to retrieve question-relevant tables from massive databases and generate SQL. However, the performance of current methods is constrained by single-hop retrieval, and existing multi-hop retrieval methods for open-domain question answering are not directly applicable because they tend to retrieve tables that are similar to those already retrieved but irrelevant to the question. This happens because questions in text-to-SQL usually contain all the required information, whereas previous multi-hop retrieval supplements the question with retrieved documents. We therefore propose multi-hop table retrieval with removal (MURRE), which removes previously retrieved information from the question to guide the retriever towards unretrieved relevant tables. Our experiments on two open-domain text-to-SQL datasets demonstrate an average improvement of 5.7% over the previous state-of-the-art results.
