Effectiveness of Prompt Optimization in NL2SQL Systems

Sairam Gurajada, Eser Kandogan, Sajjadur Rahman

TL;DR

This paper argues that production-grade NL2SQL requires careful prompt design and exemplar selection beyond raw SQL accuracy. It introduces Iterative Prompt Optimization (IPO), a two-agent framework (Proposer and SQL Generator) that jointly optimizes instructions and exemplars while implicitly pruning the schema to reduce prompt size. The authors extend the framework to a multi-objective setting by incorporating query latency, and they present the BIRD-MULTI benchmark, which includes execution-time data, to study efficient SQL generation. Preliminary results show that IPO outperforms baseline exemplar strategies in accuracy and prompt efficiency, and that joint optimization can balance accuracy with execution latency, offering practical benefits for real-world NL2SQL deployments.

Abstract

NL2SQL approaches have greatly benefited from the impressive capabilities of large language models (LLMs). In particular, bootstrapping an NL2SQL system for a specific domain can be as simple as instructing an LLM with sufficient contextual information, such as schema details and translation demonstrations. However, building an accurate system still requires the rigorous task of selecting the right context for each query, including identifying relevant schema elements, cell values, and suitable exemplars that help the LLM understand domain-specific nuances. Retrieval-based methods have become the go-to approach for identifying such context. While effective, these methods introduce additional inference-time costs due to the retrieval process. In this paper, we argue that production scenarios demand high-precision, high-performance NL2SQL systems, rather than simply high-quality SQL generation, which is the focus of most current NL2SQL approaches. In such scenarios, the careful selection of a static set of exemplars (capturing the intricacies of the query log, target database, SQL constructs, and execution latencies) plays a more crucial role than exemplar selection based solely on similarity. The key challenge, however, lies in identifying a representative set of exemplars for a given production setting. To this end, we propose a prompt optimization framework that not only addresses the high-precision requirement but also optimizes the performance of the generated SQL through multi-objective optimization. Preliminary empirical analysis demonstrates the effectiveness of the proposed framework.

Paper Structure

This paper contains 12 sections, 3 figures, 3 tables, and 1 algorithm.

Figures (3)

  • Figure 1: NL2SQL Pipeline
  • Figure 2: Iterative Prompt Optimization
  • Figure 3: IPO generated exemplar with automatic schema pruning