Cheaper, Better, Faster, Stronger: Robust Text-to-SQL without Chain-of-Thought or Fine-Tuning

Yusuf Denizay Dönder, Derek Hommel, Andrea W Wen-Yi, David Mimno, Unso Eun Seo Jo

TL;DR

This work addresses the high inference cost of state-of-the-art text-to-SQL methods that rely on Chain-of-Thought reasoning or fine-tuning. It introduces N-rep, a three-stage framework (Schema Linker, Candidate Generator, Candidate Selector) that uses multiple schema representations to generate diverse candidates and a confidence-aware selector to choose the final SQL, avoiding reasoning and tuning. Through experiments on the BIRD and SPIDER benchmarks, N-rep achieves competitive Execution Accuracy at a fraction of the cost, outperforming several CoT-based and fine-tuned baselines in efficiency and robustness. The approach demonstrates that domain-specific schema representation strategies can dramatically reduce inference cost while maintaining high-quality SQL generation, with open-source release planned on acceptance.

Abstract

LLMs are effective at code generation tasks like text-to-SQL, but are they worth the cost? Many state-of-the-art approaches use non-task-specific LLM techniques including Chain-of-Thought (CoT), self-consistency, and fine-tuning. These methods can be costly at inference time, sometimes requiring over a hundred LLM calls with reasoning and incurring average costs of up to \$0.46 per query, while fine-tuning models can cost thousands of dollars. We introduce "N-rep" consistency, a more cost-efficient text-to-SQL approach that achieves BIRD benchmark scores similar to those of more expensive methods, at only \$0.039 per query. N-rep leverages multiple representations of the same schema input to mitigate weaknesses in any single representation, making the solution more robust and allowing the use of smaller and cheaper models without any reasoning or fine-tuning. To our knowledge, N-rep is the best-performing text-to-SQL approach in its cost range.
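To make the idea of selecting among candidates from several schema representations more concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes each candidate SQL string came from a different (hypothetical) schema representation, and it replaces the paper's confidence-aware selector with plain majority voting over execution results. The helper names `execute` and `select_by_vote`, the toy `singer` table, and the candidate queries are all invented for illustration.

```python
import sqlite3
from collections import defaultdict


def execute(conn, sql):
    """Run one candidate query.

    Returns a hashable, order-insensitive result, or None if the query fails.
    """
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))


def select_by_vote(conn, candidates):
    """Group candidates by execution result and pick the largest group.

    Assumption: simple majority voting stands in for the paper's
    confidence-aware selector, whose details are not given here.
    """
    groups = defaultdict(list)
    for sql in candidates:
        result = execute(conn, sql)
        if result is not None:  # drop candidates that fail to execute
            groups[result].append(sql)
    if not groups:
        return None, 0
    winners = max(groups.values(), key=len)
    return winners[0], len(winners)


# Toy database (hypothetical data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?)",
    [("Ann", 30), ("Bob", 41), ("Cy", 25)],
)

# One candidate per hypothetical schema representation.
candidates = [
    "SELECT name FROM singer WHERE age > 28",
    "SELECT s.name FROM singer AS s WHERE s.age >= 29",  # same rows as above
    "SELECT name FROM singer WHERE age < 28",            # disagrees
    "SELECT nme FROM singer",                            # fails: unknown column
]

best_sql, votes = select_by_vote(conn, candidates)
print(best_sql, votes)
```

In this toy run the first two candidates return the same rows, so they form the largest group and the first of them is selected with two votes. A weakness in one representation, such as the failing or disagreeing candidate here, is outvoted by the others.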

Paper Structure

This paper contains 33 sections, 16 figures, 6 tables.

Figures (16)

  • Figure 1: Comparison of Execution Accuracy (EX) and average cost (cents) per query for different models on the BIRD benchmark dev set. Shape sizes reflect the logarithmic scale of the number of LLM calls.
  • Figure 2: Overview of the N-rep approach for Text-to-SQL generation.
  • Figure 3: Comparison of N-rep, Self Consistency with CoT and Self Consistency without CoT. $t=1$ means sampling temperature is 1.
  • Figure 4: Upper and lower bounds of N-rep, Self Consistency with CoT and Self Consistency without CoT.
  • Figure 5: EX by vote count for the selected candidate.
  • ...and 11 more figures