SQL-to-Text Generation with Weighted-AST Few-Shot Prompting
Sriom Chakrabarti, Chuangtao Ma, Arijit Khan, Sebastian Link
TL;DR
The paper addresses faithful SQL-to-Text translation with two components: Weighted-AST retrieval, which selects semantically relevant few-shot demonstrations, and a Chain-of-Thought prompting scheme intended to preserve semantic fidelity. Similarity is computed from surface and AST-based features weighted by IDF and query attention, and the top-$k$ retrieved examples are placed in a structured prompt with four-step reasoning. Experiments on Spider-S2T, SParC-S2T, and CoSQL-S2T with Mistral-7B, Code Llama-7B, and GPT-J-6B show consistent improvements in execution accuracy and exact match, along with strong human-rated semantic fidelity and robust round-trip evaluation results. The results indicate that structure-aware few-shot prompting is a scalable and effective way to translate complex SQL queries into natural language descriptions, with potential extensions to multilingual settings and larger datasets.
Abstract
SQL-to-Text generation aims to translate structured SQL queries into natural language descriptions, thereby helping non-technical users understand complex database operations. Although large language models (LLMs) have recently demonstrated promising results, current methods often fail to preserve the exact semantics of SQL queries, particularly when multiple correct phrasings are possible. To address this problem, we propose Weighted-AST retrieval with prompting, an architecture that integrates structural query representations with LLM prompting. The method retrieves semantically relevant examples as few-shot prompts using a similarity metric based on an Abstract Syntax Tree (AST) with learned feature weights. Our structure-aware prompting technique ensures that generated descriptions are both fluent and faithful to the original query logic. Extensive experiments on three benchmark datasets (Spider, SParC, and CoSQL) show that our method outperforms current baselines by up to 17.24% in Execution Accuracy (EX), achieves superior Exact Match (EM), and provides more consistent semantic fidelity in human evaluation, all while preserving competitive runtime performance. These results demonstrate that Weighted-AST prompting is a scalable and effective method for deriving natural language explanations from structured database queries.
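The retrieval idea can be illustrated with a minimal sketch. It is not the authors' implementation: SQL keyword counts stand in for AST features, a smoothed IDF stands in for the learned feature weights, and the names `features`, `idf_weights`, `weighted_cosine`, `retrieve`, and `build_prompt`, the example pool, and the prompt layout are all hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: SQL keywords serve as a crude proxy for AST node types.
KEYWORDS = ("select", "from", "where", "join", "group by", "order by",
            "having", "limit", "count", "avg", "sum", "max", "min", "distinct")


def features(sql):
    """Count keyword occurrences in a query (stand-in for AST features)."""
    text = " ".join(sql.lower().split())  # normalize case and whitespace
    counts = Counter()
    for kw in KEYWORDS:
        n = len(re.findall(r"\b" + re.escape(kw) + r"\b", text))
        if n:
            counts[kw] = n
    return counts


def idf_weights(pool_features):
    """Smoothed IDF per feature: rarer structures receive larger weights."""
    n = len(pool_features)
    df = Counter()
    for feats in pool_features:
        df.update(feats.keys())
    return {kw: math.log((1 + n) / (1 + df[kw])) + 1.0 for kw in KEYWORDS}


def weighted_cosine(a, b, w):
    """Cosine similarity between two feature vectors after IDF weighting."""
    dot = sum((w[k] * a[k]) * (w[k] * b[k]) for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum((w[k] * v) ** 2 for k, v in a.items()))
    norm_b = math.sqrt(sum((w[k] * v) ** 2 for k, v in b.items()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_sql, pool, k=2):
    """Return the k (sql, description) pairs most similar to the query."""
    pool_feats = [features(sql) for sql, _ in pool]
    w = idf_weights(pool_feats)
    q = features(query_sql)
    ranked = sorted(zip(pool, pool_feats),
                    key=lambda pair: weighted_cosine(q, pair[1], w),
                    reverse=True)
    return [example for example, _ in ranked[:k]]


def build_prompt(query_sql, demos):
    """Hypothetical prompt layout: demonstrations first, then the target query."""
    parts = [f"SQL: {sql}\nDescription: {text}" for sql, text in demos]
    parts.append(f"SQL: {query_sql}\nDescription:")
    return "\n\n".join(parts)


# Hypothetical demonstration pool of (SQL, description) pairs.
POOL = [
    ("SELECT name FROM singer WHERE age > 30",
     "List the names of singers older than 30."),
    ("SELECT country, COUNT(*) FROM singer GROUP BY country",
     "Count the singers in each country."),
    ("SELECT title FROM song ORDER BY year LIMIT 1",
     "Give the title of the earliest song."),
]

query = "SELECT city, COUNT(*) FROM airport GROUP BY city"
demos = retrieve(query, POOL, k=1)
print(build_prompt(query, demos))
```

In this toy pool the grouping-and-counting query retrieves the structurally matching demonstration, even though the two queries share no table or column names. A faithful reproduction would swap in a real SQL parser, the paper's feature set, and its learned weights.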
