Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL
Dingzirui Wang, Longxu Dou, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che
TL;DR
This work tackles the limited diversity and high labeling cost of demonstrations in text-to-SQL using in-context learning. It defines Diversity Measurement (DM) to quantify demonstration pool diversity and proposes Fused, a human-free, iterative synthesis method that samples demonstrations from clusters and fuses them with LLMs to produce highly diverse demonstrations. Empirical results on Spider and KaggleDBQA show average improvements of $3.2\%$ with existing labeling and $5.0\%$ without labeling, validating both the DM metric and the effectiveness of Fused. The approach demonstrates potential for reducing labeling overhead while enhancing cross-domain adaptability of LLM-driven text-to-SQL systems. The work also provides insights into how diversity, iteration, and synthesis scale impact performance, highlighting practical guidance for deploying diverse demonstrations in real-world scenarios.
Abstract
In-context learning with large language models (LLMs) has become the mainstream approach to text-to-SQL research. Previous work has studied how to select demonstrations relevant to the user question from a human-labeled demonstration pool. However, human labeling suffers from insufficient diversity and high labeling overhead. In this paper, we therefore discuss how to measure and improve the diversity of demonstrations for text-to-SQL. We present a metric for measuring demonstration diversity and show experimentally that existing labeled data is insufficiently diverse. Based on this finding, we propose fusing iteratively for demonstrations (Fused), which builds a high-diversity demonstration pool through human-free, multi-iteration synthesis, improving diversity and lowering labeling cost. Our method achieves average improvements of 3.2% and 5.0% with and without human labeling, respectively, on several mainstream datasets, demonstrating the effectiveness of Fused.
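The iterative loop summarized above (cluster the pool, sample demonstrations from clusters, fuse them, add the results back) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the keyword-based `sql_signature` clustering key, the `fuse` callback standing in for an LLM call, and the parameters `iterations` and `per_iteration` are all assumptions introduced here for illustration.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Tuple

Demo = Tuple[str, str]  # (natural-language question, SQL query)

# Hypothetical clustering key: the set of SQL keywords a query uses.
# The paper's actual clustering method is not specified in this section.
_KEYWORDS = {"SELECT", "WHERE", "GROUP", "ORDER", "JOIN", "HAVING", "LIMIT", "COUNT"}


def sql_signature(demo: Demo) -> FrozenSet[str]:
    tokens = demo[1].upper().replace("(", " ").replace(")", " ").split()
    return frozenset(t for t in tokens if t in _KEYWORDS)


def cluster(pool: List[Demo]) -> List[List[Demo]]:
    """Group demonstrations that share the same keyword signature."""
    groups: Dict[FrozenSet[str], List[Demo]] = {}
    for demo in pool:
        groups.setdefault(sql_signature(demo), []).append(demo)
    return list(groups.values())


def fuse_iteratively(
    pool: List[Demo],
    fuse: Callable[[Demo, Demo], Demo],  # assumed stand-in for an LLM call
    iterations: int,
    per_iteration: int,
    seed: int = 0,
) -> List[Demo]:
    """Grow the pool by repeatedly fusing demonstrations from different clusters."""
    if not pool:
        raise ValueError("the initial demonstration pool must not be empty")
    rng = random.Random(seed)
    pool = list(pool)  # work on a copy; the caller's list is left unchanged
    for _ in range(iterations):
        clusters = cluster(pool)  # re-cluster so new demonstrations are included
        synthesized = []
        for _ in range(per_iteration):
            if len(clusters) >= 2:
                first, second = rng.sample(clusters, 2)  # two distinct clusters
            else:
                first = second = clusters[0]
            synthesized.append(fuse(rng.choice(first), rng.choice(second)))
        pool.extend(synthesized)
    return pool


def toy_fuse(a: Demo, b: Demo) -> Demo:
    """Placeholder for the LLM fusion step; it only records which pair was fused."""
    return (f"[fused] {a[0]} + {b[0]}", f"-- fused from: {a[1]} | {b[1]}")
```

In practice, `toy_fuse` would be replaced by a prompt that asks an LLM to write a new question and SQL pair combining the two sampled demonstrations, and the keyword signature would likely be replaced by a stronger clustering method.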
