EPI-SQL: Enhancing Text-to-SQL Translation with Error-Prevention Instructions
Xiping Liu, Zhao Tan
TL;DR
The paper tackles Text-to-SQL by introducing Error-Prevention Instructions (EPIs), which are derived from error-prone instances and tailored to the current task. It proposes EPI-SQL, a zero-shot framework with four steps: collecting error-prone instances, generating general EPIs, generating contextualized EPIs, and generating SQL with the EPIs placed in the prompt, without demonstrations. On Spider, the approach with GPT-4 reaches 85.1% execution accuracy and 77.9% test-suite accuracy, rivaling advanced few-shot methods and surpassing several zero-shot baselines. Ablation studies show that EPI-verification and question similarity are crucial to performance, while biases in data and schema influence error patterns. Overall, the work demonstrates that task-specific, contextualized instructions can substantially improve LLM performance on NLP tasks and suggests avenues for broader application of instruction-based enhancements.
Abstract
The conversion of natural language queries into SQL queries, known as Text-to-SQL, is a critical yet challenging task. This paper introduces EPI-SQL, a novel methodological framework leveraging Large Language Models (LLMs) to enhance the performance of Text-to-SQL tasks. EPI-SQL operates through a four-step process. First, the method gathers instances from the Spider dataset on which LLMs are prone to failure. These instances are then used to generate general error-prevention instructions (EPIs). Subsequently, LLMs craft contextualized EPIs tailored to the specific context of the current task. Finally, these context-specific EPIs are incorporated into the prompt used for SQL generation. EPI-SQL is distinguished in that it provides task-specific guidance, enabling the model to circumvent potential errors for the task at hand. Notably, the methodology rivals the performance of advanced few-shot methods despite being a zero-shot approach. An empirical assessment using the Spider benchmark reveals that EPI-SQL achieves an execution accuracy of 85.1%, underscoring its effectiveness in generating accurate SQL queries through LLMs. The findings indicate a promising direction for future research, i.e., enhancing instructions with task-specific and contextualized rules to boost LLMs' performance in NLP tasks.
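The four-step process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All names here (`LLM`, `ErrorProneInstance`, `collect_error_prone`, `add_general_epis`, `most_similar`, `contextualized_epi`, `build_sql_prompt`) and all prompt wordings are hypothetical. The `llm` argument is assumed to be any function mapping a prompt string to a completion string, and string-ratio matching from `difflib` is only a stand-in for whatever question-similarity measure the paper uses. The EPI-verification step mentioned in the TL;DR is omitted.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

# Hypothetical LLM interface: prompt string in, completion string out.
LLM = Callable[[str], str]


@dataclass
class ErrorProneInstance:
    question: str
    schema: str
    gold_sql: str
    wrong_sql: str
    general_epi: str = ""


def collect_error_prone(
    examples: List[Tuple[str, str, str]],
    llm: LLM,
    is_correct: Callable[[str, str], bool],
) -> List[ErrorProneInstance]:
    """Step 1: keep (question, schema, gold SQL) examples the LLM gets wrong."""
    instances = []
    for question, schema, gold_sql in examples:
        predicted = llm(f"Schema:\n{schema}\nQuestion: {question}\nSQL:")
        if not is_correct(predicted, gold_sql):
            instances.append(
                ErrorProneInstance(question, schema, gold_sql, predicted)
            )
    return instances


def add_general_epis(instances: List[ErrorProneInstance], llm: LLM) -> None:
    """Step 2: turn each failure into a general error-prevention instruction."""
    for inst in instances:
        inst.general_epi = llm(
            "Write a general instruction that prevents this error.\n"
            f"Question: {inst.question}\nWrong SQL: {inst.wrong_sql}\n"
            f"Correct SQL: {inst.gold_sql}"
        )


def most_similar(
    question: str, instances: List[ErrorProneInstance], k: int = 2
) -> List[ErrorProneInstance]:
    """Rank stored instances by question similarity (string-ratio stand-in)."""
    def score(inst: ErrorProneInstance) -> float:
        return SequenceMatcher(
            None, question.lower(), inst.question.lower()
        ).ratio()
    return sorted(instances, key=score, reverse=True)[:k]


def contextualized_epi(
    question: str,
    schema: str,
    instances: List[ErrorProneInstance],
    llm: LLM,
    k: int = 2,
) -> str:
    """Step 3: adapt the retrieved general EPIs to the current task."""
    general = "\n".join(
        f"- {inst.general_epi}" for inst in most_similar(question, instances, k)
    )
    return llm(
        "Adapt these general instructions to the task below.\n"
        f"{general}\nSchema:\n{schema}\nQuestion: {question}"
    )


def build_sql_prompt(question: str, schema: str, epi: str) -> str:
    """Step 4: zero-shot prompt that carries the EPI and no demonstrations."""
    return f"Schema:\n{schema}\nInstruction: {epi}\nQuestion: {question}\nSQL:"
```

In this sketch, steps 1 and 2 would run once over training data, while steps 3 and 4 run per incoming question; the final prompt returned by `build_sql_prompt` is what would be sent to the LLM to obtain the SQL query.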
