SQL-to-Text Generation with Weighted-AST Few-Shot Prompting

Sriom Chakrabarti, Chuangtao Ma, Arijit Khan, Sebastian Link

TL;DR

The paper tackles faithful SQL-to-Text translation by introducing Weighted-AST retrieval, which selects semantically relevant few-shot demonstrations, together with a Chain-of-Thought prompting scheme designed to ensure semantic fidelity. It combines surface and AST-based features with IDF and query-attention weighting to compute similarities and retrieve the top-$k$ examples, which are inserted into a structured prompt with four-step reasoning. Experiments on Spider-S2T, SParC-S2T, and CoSQL-S2T with Mistral-7B, Code Llama-7B, and GPT-J-6B show consistent improvements in execution accuracy and exact match, strong semantic fidelity in human evaluation, and robust results under round-trip evaluation. The results indicate that structure-aware few-shot prompting is scalable and effective for translating complex SQL queries into natural language descriptions, with potential extensions to multiple languages and larger datasets.
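The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the regex-based `structural_features` is a crude stand-in for walking a real SQL AST, the fixed blend weight `alpha` replaces the paper's learned feature weights, and query-attention is approximated by boosting the IDF weight of features that occur in the input query by a hypothetical factor `gamma`. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

# Keywords used to approximate AST node types (illustrative, not exhaustive).
SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "having", "limit",
    "join", "on", "union", "intersect", "except", "count", "sum", "avg",
    "min", "max", "distinct", "not", "in", "like", "between", "and", "or",
}


def surface_features(sql: str) -> Counter:
    """Surface features: lower-cased identifier and keyword tokens."""
    return Counter(re.findall(r"[a-z_][a-z0-9_]*", sql.lower()))


def structural_features(sql: str) -> Counter:
    """Crude AST stand-in: SQL keyword counts plus keyword-to-keyword 'paths'.
    A real implementation would walk a parsed AST instead."""
    kws = [t for t in re.findall(r"[a-z_]+", sql.lower()) if t in SQL_KEYWORDS]
    feats = Counter(f"kw:{k}" for k in kws)
    feats.update(f"path:{a}>{b}" for a, b in zip(kws, kws[1:]))
    return feats


def idf_table(docs: list[Counter]) -> dict[str, float]:
    """Smoothed IDF over the demonstration pool (every value is >= 1)."""
    n = len(docs)
    df = Counter(f for d in docs for f in d)
    return {f: math.log((1 + n) / (1 + c)) + 1.0 for f, c in df.items()}


def weighted_jaccard(q: Counter, c: Counter, idf: dict[str, float],
                     gamma: float = 2.0) -> float:
    """IDF-weighted Jaccard similarity; features present in the query are
    boosted by gamma as a simple stand-in for query-attention weighting."""
    unseen = max(idf.values(), default=1.0)  # treat unseen features as rare
    num = den = 0.0
    for f in q.keys() | c.keys():
        w = idf.get(f, unseen) * (gamma if f in q else 1.0)
        num += w * min(q[f], c[f])
        den += w * max(q[f], c[f])
    return num / den if den else 0.0


def weighted_ast_similarity(q_sql: str, c_sql: str, idf_s: dict[str, float],
                            idf_t: dict[str, float], alpha: float = 0.3) -> float:
    """Blend surface and structural similarity; alpha is an assumed fixed weight."""
    s = weighted_jaccard(surface_features(q_sql), surface_features(c_sql), idf_s)
    t = weighted_jaccard(structural_features(q_sql), structural_features(c_sql), idf_t)
    return alpha * s + (1 - alpha) * t


def retrieve_top_k(query_sql: str, pool: list[tuple[str, str]], k: int = 3,
                   alpha: float = 0.3) -> list[tuple[float, str, str]]:
    """Return the k (score, sql, description) demonstrations most similar to query_sql."""
    idf_s = idf_table([surface_features(s) for s, _ in pool])
    idf_t = idf_table([structural_features(s) for s, _ in pool])
    scored = [(weighted_ast_similarity(query_sql, s, idf_s, idf_t, alpha), s, d)
              for s, d in pool]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


pool = [
    ("SELECT count(*) FROM singer", "How many singers are there?"),
    ("SELECT name FROM singer WHERE age > 30", "List the names of singers older than 30."),
    ("SELECT country, count(*) FROM singer GROUP BY country",
     "How many singers are from each country?"),
]
top = retrieve_top_k("SELECT count(*) FROM concert", pool, k=2)
for score, sql, desc in top:
    print(f"{score:.3f}  {sql}  ->  {desc}")
```

In this toy pool, the two counting queries share the `SELECT count(*) FROM` structure with the input, so they rank above the filtering query; the query-only feature `concert` lowers every score slightly but does not change the order. Replacing the regex features with a real parser would mainly change `structural_features`.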

Abstract

SQL-to-Text generation aims to translate structured SQL queries into natural language descriptions, making complex database operations easier for non-technical users to understand. Although large language models (LLMs) have recently shown promising results, current methods often fail to preserve the exact semantics of SQL queries, particularly when several correct phrasings are possible. To address this problem, we propose Weighted-AST retrieval with prompting, an architecture that integrates structural query representations with LLM prompting. The method retrieves semantically relevant examples as few-shot demonstrations using a similarity metric based on an Abstract Syntax Tree (AST) with learned feature weights. Our structure-aware prompting technique ensures that generated descriptions are both fluent and faithful to the original query logic. Extensive experiments on three benchmark datasets (Spider, SParC, and CoSQL) show that our method outperforms current baselines by up to +17.24% in execution accuracy (EX), performs better in exact match (EM), and achieves more consistent semantic fidelity in human evaluation, all while maintaining competitive runtime performance. These results demonstrate that Weighted-AST prompting is a scalable and effective method for deriving natural language explanations from structured database queries.

Paper Structure

This paper contains 24 sections, 3 equations, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Overall architecture of the proposed SQL-to-Text generation framework. a) Preprocessing: The input SQL query undergoes a series of preprocessing steps to normalize its structure and simplify complex constructs. b) Weighted AST Retrieval with Few-Shot Prompting: Our main contribution introduces a Weighted Abstract Syntax Tree (AST) Retrieval mechanism that retrieves the most semantically relevant examples based on a weighted similarity score. These top-$k$ examples are seamlessly integrated into the LLM's prompt, allowing the model to better capture the intent of the SQL query and produce more precise and fluent translations. c) Generation: After preprocessing and retrieval, the structured prompt is passed to the LLM, which generates a high-quality natural language description of the SQL query, ensuring improved semantic fidelity and readability.
  • Figure 2: Prompt template for SQL-to-Text translation showing reasoning steps, quality checks, and few-shot demonstrations.
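The preprocessing and prompt-assembly stages in Figure 1, and the template structure in Figure 2, can be sketched as follows. This is a hypothetical illustration: the normalization rules (trimming, dropping a trailing semicolon, collapsing whitespace, and upper-casing a small keyword list outside quoted literals) are assumptions, the paper's simplification of complex constructs is not reproduced, and the four reasoning steps and two quality checks are placeholder wording rather than the paper's actual template. The `demos` argument stands for the top-$k$ (SQL, description) pairs returned by retrieval; `normalize_sql` and `build_prompt` are hypothetical names.

```python
import re

# Assumed keyword list for normalization (illustrative only).
_KW = re.compile(
    r"\b(select|from|where|group\s+by|order\s+by|having|limit|join|on|and|or|"
    r"not|in|like|count|sum|avg|min|max|distinct|as|desc|asc)\b",
    re.IGNORECASE,
)
_LITERAL = re.compile(r"('(?:[^']|'')*')")  # single-quoted SQL string literals


def normalize_sql(sql: str) -> str:
    """Hypothetical preprocessing: trim, drop a trailing ';', collapse whitespace,
    and upper-case known keywords, leaving quoted literals untouched."""
    parts = _LITERAL.split(sql.strip().rstrip(";"))
    for i in range(0, len(parts), 2):  # even indices lie outside literals
        text = re.sub(r"\s+", " ", parts[i])
        parts[i] = _KW.sub(lambda m: re.sub(r"\s+", " ", m.group(0).upper()), text)
    return "".join(parts).strip()


def build_prompt(sql: str, demos: list[tuple[str, str]]) -> str:
    """Assemble a structured prompt: reasoning steps, quality checks,
    few-shot demonstrations, then the target query (placeholder wording)."""
    steps = [
        "Identify the tables and columns the query reads.",
        "Describe the filtering, joining, and grouping conditions.",
        "State what is selected, aggregated, or ordered.",
        "Write one fluent sentence that preserves all of the above.",
    ]
    checks = ["Every condition in the SQL is mentioned.",
              "No information absent from the SQL is added."]
    lines = ["Translate the SQL query into a natural language description.",
             "Reason in these steps:"]
    lines += [f"{i}. {s}" for i, s in enumerate(steps, 1)]
    lines.append("Before answering, check:")
    lines += [f"- {c}" for c in checks]
    for demo_sql, demo_text in demos:  # few-shot demonstrations
        lines += ["", f"SQL: {normalize_sql(demo_sql)}", f"Description: {demo_text}"]
    lines += ["", f"SQL: {normalize_sql(sql)}", "Description:"]  # target query
    return "\n".join(lines)


demos = [("select count(*) from singer", "How many singers are there?")]
prompt = build_prompt("select count(*) from concert;", demos)
print(prompt)
```

The prompt ends with an open `Description:` slot so the model completes only the target description; keeping literals untouched during normalization avoids altering values that the description should repeat.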