HI-SQL: Optimizing Text-to-SQL Systems through Dynamic Hint Integration
Ganesh Parab, Zishan Ahmad, Dagnachew Birru
TL;DR
The paper tackles the inefficiency and error-proneness of multi-step Text-to-SQL pipelines by introducing HI-SQL, a hint-driven approach that generates contextual SQL-generation hints from historical query logs. The method combines three stages (Hint Curation, SQL Generation, and SQL Verification) to produce accurate queries with fewer LLM calls. Across BIRD, SPIDER-COMPLEX, ACME-INSURANCE, and BIRD SDS, HI-SQL improves Execution Accuracy while reducing latency, outperforming both baselines and competing methods. The work demonstrates the value of automated, data-driven hint generation for scalable, efficient natural-language-to-SQL systems, and the approach has practical potential for real-world deployments with diverse schemas and datasets.
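Execution Accuracy counts a generated query as correct when it returns the same result as the reference (gold) query on the same database. The sketch below computes it with Python's standard `sqlite3` module. It assumes an order-insensitive, set-based comparison of result rows and treats a query that fails to execute as incorrect. These are common conventions, not necessarily the exact evaluation script used in the paper. The function names and the toy `customer`/`policy` tables are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same set of rows (row order and duplicates ignored)."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A query that does not execute (syntax error, unknown column, ...) counts as wrong.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()  # gold queries are assumed to be valid
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose execution results match."""
    if not pairs:
        return 0.0
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


def demo_database() -> sqlite3.Connection:
    # Hypothetical toy schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE policy (id INTEGER PRIMARY KEY, customer_id INTEGER, premium REAL);
        INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben');
        INSERT INTO policy VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 50.0);
        """
    )
    return conn


DEMO_PAIRS = [
    # Same rows in a different order: counted as correct.
    ("SELECT name FROM customer ORDER BY name DESC", "SELECT name FROM customer"),
    # Join condition missing, so every customer pairs with every policy: wrong totals.
    ("SELECT c.name, SUM(p.premium) FROM customer c, policy p GROUP BY c.name",
     "SELECT c.name, SUM(p.premium) FROM customer c JOIN policy p "
     "ON p.customer_id = c.id GROUP BY c.id"),
    # Unknown column raises sqlite3.OperationalError: counted as wrong.
    ("SELECT nme FROM customer", "SELECT name FROM customer"),
]

print(execution_accuracy(demo_database(), DEMO_PAIRS))  # 1 of 3 pairs match
```

Comparing execution results rather than SQL strings is what lets two differently written but equivalent queries, such as the reordered pair in the demo, both count as correct.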
Abstract
Text-to-SQL generation bridges the gap between natural language and databases, enabling users to query data without requiring SQL expertise. While large language models (LLMs) have significantly advanced the field, challenges remain in handling complex queries that involve multi-table joins, nested conditions, and intricate operations. Existing methods often rely on multi-step pipelines that incur high computational costs, increase latency, and are prone to error propagation. To address these limitations, we propose HI-SQL, a pipeline that incorporates a novel hint generation mechanism that uses historical query logs to guide SQL generation. By analyzing prior queries, our method generates contextual hints that focus on handling the complexities of multi-table and nested operations. These hints are integrated directly into the SQL generation process, eliminating the need for costly multi-step approaches and reducing reliance on human-crafted prompts. Experimental evaluations on multiple benchmark datasets demonstrate that our approach significantly improves the accuracy of LLM-generated queries while remaining efficient in terms of LLM calls and latency, offering a robust and practical solution for enhancing Text-to-SQL systems.
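The abstract does not spell out how hints are derived from the logs. As a purely illustrative stand-in, not the HI-SQL mechanism, the sketch below mines one kind of hint, recurring join conditions, from a log of past SQL queries using regular expressions. It then inlines the hints relevant to a new question into a single generation prompt. All function names, the regexes, the prompt layout, and the toy schema are hypothetical. The parser handles only simple `FROM`/`JOIN ... ON a.x = b.y` patterns.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical hint miner; words that may follow a table name but are never an alias.
_CLAUSE_WORDS = r"(?:ON|JOIN|INNER|LEFT|RIGHT|CROSS|WHERE|GROUP|ORDER|LIMIT|USING)\b"
# "FROM|JOIN <table> [[AS] <alias>]"
TABLE_RE = re.compile(
    rf"\b(?:FROM|JOIN)\s+(\w+)(?:\s+(?:AS\s+)?(?!{_CLAUSE_WORDS})(\w+))?", re.IGNORECASE
)
# "ON <a>.<col> = <b>.<col>" (equi-joins only; USING and subqueries are out of scope)
ON_RE = re.compile(r"\bON\s+(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)", re.IGNORECASE)


def mine_join_patterns(query_log: Iterable[str]) -> Counter:
    """Count equi-join conditions, keyed by ((table, column), (table, column))."""
    counts: Counter = Counter()
    for sql in query_log:
        # Resolve aliases so "c.id" and "customer.id" count as the same column.
        aliases: Dict[str, str] = {}
        for table, alias in TABLE_RE.findall(sql):
            aliases[table.lower()] = table.lower()
            if alias:
                aliases[alias.lower()] = table.lower()
        for a, col_a, b, col_b in ON_RE.findall(sql):
            left = (aliases.get(a.lower(), a.lower()), col_a.lower())
            right = (aliases.get(b.lower(), b.lower()), col_b.lower())
            # Sort so "a.x = b.y" and "b.y = a.x" map to the same key.
            counts[tuple(sorted((left, right)))] += 1
    return counts


def join_hints(patterns: Counter, tables: Iterable[str], min_count: int = 1) -> List[str]:
    """Render textual hints for join patterns whose two tables are both relevant."""
    wanted = {t.lower() for t in tables}
    hints = []
    for ((t1, c1), (t2, c2)), n in patterns.most_common():
        if n >= min_count and t1 in wanted and t2 in wanted:
            hints.append(f"Join {t1} and {t2} on {t1}.{c1} = {t2}.{c2} (seen {n}x in past queries).")
    return hints


def build_prompt(schema: str, hints: List[str], question: str) -> str:
    """Assemble a single SQL-generation prompt with the hints inlined."""
    hint_block = "\n".join(f"- {h}" for h in hints) or "- (no hints)"
    return f"Schema:\n{schema}\n\nHints:\n{hint_block}\n\nQuestion: {question}\nSQL:"


QUERY_LOG = [
    "SELECT c.name FROM customer AS c JOIN policy p ON c.id = p.customer_id WHERE p.premium > 100",
    "SELECT COUNT(*) FROM policy JOIN customer ON policy.customer_id = customer.id",
    "SELECT cl.amount FROM claim cl INNER JOIN policy p ON cl.policy_id = p.id",
]
patterns = mine_join_patterns(QUERY_LOG)
prompt = build_prompt(
    "customer(id, name); policy(id, customer_id, premium)",
    join_hints(patterns, ["customer", "policy"]),
    "Which customers hold a policy with a premium above 100?",
)
print(prompt)
```

The regex miner only keeps the example self-contained. A production system would more likely use a proper SQL parser, and hints for nested operations would need richer analysis than counting joins.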
