CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL
Mohammadreza Pourreza, Hailong Li, Ruoxi Sun, Yeounoh Chung, Shayan Talaei, Gaurav Tarlok Kakkar, Yu Gan, Amin Saberi, Fatma Ozcan, Sercan O. Arik
TL;DR
CHASE-SQL introduces a test-time, multi-agent framework for Text-to-SQL that generates diverse SQL candidates using three reasoning strategies (Divide-and-Conquer CoT, Query Plan CoT, and Online Synthetic Example Generation) and selects the best candidate with a fine-tuned binary Selection Agent trained on pairwise comparisons. The approach is complemented by value retrieval via LSH-based keyword extraction and a query fixer for iterative corrections, forming an ensemble that outperforms previous methods on the BIRD (about 73% EX) and Spider (87.6% EX) benchmarks. Key contributions include a robust candidate-generation suite, an effective fixer, and a pairwise selection mechanism that surpasses self-consistency baselines, achieving state-of-the-art results at the time of submission. The results demonstrate the value of test-time computation and ensemble reasoning for complex Text-to-SQL tasks, with strong generalization to unseen domains. CHASE-SQL thus provides a practical framework for deploying high-accuracy Text-to-SQL systems in real-world databases.
Abstract
In tackling the challenges of large language model (LLM) performance for Text-to-SQL tasks, we introduce CHASE-SQL, a new framework that employs innovative strategies, using test-time compute in multi-agent modeling to improve candidate generation and selection. CHASE-SQL leverages LLMs' intrinsic knowledge to generate diverse and high-quality SQL candidates using different LLM generators with: (1) a divide-and-conquer method that decomposes complex queries into manageable sub-queries in a single LLM call; (2) chain-of-thought reasoning based on query execution plans, reflecting the steps a database engine takes during execution; and (3) a unique instance-aware synthetic example generation technique, which offers specific few-shot demonstrations tailored to test questions. To identify the best candidate, a selection agent is employed to rank the candidates through pairwise comparisons with a fine-tuned binary-candidates selection LLM. This selection approach has been demonstrated to be more robust than alternatives. The proposed generators-selector framework not only enhances the quality and diversity of SQL queries but also outperforms previous methods. Overall, our proposed CHASE-SQL achieves the state-of-the-art execution accuracy of 73.0% and 73.01% on the test set and development set, respectively, of the notable BIRD Text-to-SQL dataset benchmark, rendering CHASE-SQL the top submission on the leaderboard (at the time of paper submission).
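The pairwise selection step described in the abstract can be illustrated with a minimal sketch. It assumes a round-robin tournament in which each pair of candidates is compared once and the candidate with the most wins is returned. The function name `select_by_pairwise_comparison` and the `prefer_first` callback are hypothetical; the callback stands in for the fine-tuned binary selection LLM, and the exact scoring and tie-breaking used by the authors may differ.

```python
from itertools import combinations
from typing import Callable, Sequence


def select_by_pairwise_comparison(
    candidates: Sequence[str],
    prefer_first: Callable[[str, str], bool],
) -> str:
    """Return the candidate that wins the most pairwise comparisons.

    `prefer_first(a, b)` is a hypothetical stand-in for the binary
    selection model: it returns True if candidate `a` is judged better
    than candidate `b`, and False otherwise.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")

    # One win counter per candidate.
    wins = [0] * len(candidates)

    # Compare every unordered pair once (round-robin tournament).
    for i, j in combinations(range(len(candidates)), 2):
        if prefer_first(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1

    # Ties are broken in favor of the earliest candidate (an assumption).
    best_index = max(range(len(candidates)), key=lambda k: (wins[k], -k))
    return candidates[best_index]


if __name__ == "__main__":
    # Toy comparator for illustration only: prefers the longer query string.
    sql_candidates = [
        "SELECT name FROM t",
        "SELECT name FROM t WHERE age > 30",
        "SELECT name FROM t WHERE age > 30 ORDER BY name",
    ]
    print(select_by_pairwise_comparison(
        sql_candidates, lambda a, b: len(a) >= len(b)
    ))
```

In a real system the comparator would call the selection model with the question, the schema, and the two candidate queries. The number of comparisons grows quadratically with the number of candidates, which is one reason to keep the candidate pool small.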
