CrackSQL: A Hybrid SQL Dialect Translation System Powered by Large Language Models
Wei Zhou, Yuyang Gao, Xuanhe Zhou, Guoliang Li
TL;DR
CrackSQL tackles the problem of translating SQL across dialects by integrating rule-based methods with large language models to reduce manual effort and mitigate hallucinations. It introduces a cross-dialect embedding model and a functionality-based, local-to-global translation strategy to align dialect syntax and manage interdependent operations. The system employs four modules with three translation modes and offers multiple deployment options (web console, PyPI package, and CLI) for real-world adoption. Evaluation on a real-world benchmark demonstrates improved translation accuracy and robustness against dialect-specific quirks, illustrating practical impact for cross-database analytics and migrations.
Abstract
Dialect translation plays a key role in enabling seamless interaction across heterogeneous database systems. However, translating SQL queries between different dialects (e.g., from PostgreSQL to MySQL) remains a challenging task due to syntactic discrepancies and subtle semantic variations. Existing approaches, including manual rewriting, rule-based systems, and large language model (LLM)-based techniques, often involve high maintenance effort (e.g., crafting custom translation rules) or produce unreliable results (e.g., LLMs generating non-existent functions), especially when handling complex queries. In this demonstration, we present CrackSQL, the first hybrid SQL dialect translation system that combines rule-based and LLM-based methods to overcome these limitations. CrackSQL leverages the adaptability of LLMs to minimize manual intervention, while enhancing translation accuracy by segmenting lengthy, complex SQL queries via functionality-based query processing. To further improve robustness, it incorporates a novel cross-dialect syntax embedding model for precise syntax alignment, as well as an adaptive local-to-global translation strategy that effectively resolves interdependent query operations. CrackSQL supports three translation modes and offers multiple deployment and access options, including a web console interface, a PyPI package, and a command-line prompt, facilitating adoption across a variety of real-world use cases.
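To make the hybrid idea concrete, the following is a minimal Python sketch of one possible rule-first, LLM-fallback flow for function names only. It is not CrackSQL's implementation or API: the rule table, the `KNOWN_TARGET_FUNCTIONS` allow-list, the `translate` function, and the `llm` callable are all hypothetical names introduced for illustration. The sketch assumes a PostgreSQL-to-MySQL direction, uses a simple regular expression instead of a real SQL parser (so it does not handle string literals or keywords written directly before a parenthesis), and does not attempt the embedding model or the local-to-global strategy described above.

```python
import re
from typing import Callable, Dict, Optional, Set

# Illustrative PostgreSQL -> MySQL rules (assumed for this sketch).
FUNCTION_RULES: Dict[str, str] = {
    "RANDOM": "RAND",         # random() -> RAND()
    "LENGTH": "CHAR_LENGTH",  # PostgreSQL length() counts characters; MySQL LENGTH() counts bytes
}

# Allow-list of target-dialect functions, used to reject hallucinated names.
KNOWN_TARGET_FUNCTIONS: Set[str] = {"RAND", "CHAR_LENGTH", "TRIM", "CONCAT", "NOW", "COUNT"}

# Matches an identifier immediately followed by "(" (a crude stand-in for parsing).
CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\(")


def translate(sql: str, llm: Optional[Callable[[str], Optional[str]]] = None) -> str:
    """Rewrite function names: rules first, then a validated LLM suggestion."""

    def rewrite(match: "re.Match[str]") -> str:
        name = match.group(1).upper()
        if name in FUNCTION_RULES:            # 1. deterministic rule
            return FUNCTION_RULES[name] + "("
        if name in KNOWN_TARGET_FUNCTIONS:    # 2. already valid in the target dialect
            return name + "("
        if llm is not None:                   # 3. ask the (hypothetical) LLM
            candidate = llm(name)
            # Accept the suggestion only if the target dialect actually has it.
            if candidate and candidate.upper() in KNOWN_TARGET_FUNCTIONS:
                return candidate.upper() + "("
        raise ValueError(f"no verified translation for function {name}")

    return CALL.sub(rewrite, sql)
```

With this sketch, `translate("SELECT length(name), random() FROM users")` returns `SELECT CHAR_LENGTH(name), RAND() FROM users` using rules alone. A function outside the rule table is passed to the `llm` callable, and a suggestion that is not in the allow-list raises `ValueError` instead of being emitted, which is one simple way to guard against non-existent functions.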
