SpotIt+: Verification-based Text-to-SQL Evaluation with Database Constraints

Rocky Klopfenstein, Yang He, Andrew Tremante, Yuepeng Wang, Nina Narodytska, Haoze Wu

TL;DR

This work introduces a constraint-mining pipeline that combines rule-based specification mining over example databases with LLM-based validation. The mined constraints let SpotIt+ generate more realistic differentiating databases while still efficiently uncovering numerous discrepancies between generated and gold SQL queries that standard test-based evaluation misses.

Abstract

We present SpotIt+, an open-source tool for evaluating Text-to-SQL systems via bounded equivalence verification. Given a generated SQL query and the ground truth, SpotIt+ actively searches for database instances that differentiate the two queries. To ensure that the generated counterexamples reflect practically relevant discrepancies, we introduce a constraint-mining pipeline that combines rule-based specification mining over example databases with LLM-based validation. Experimental results on the BIRD dataset show that the mined constraints enable SpotIt+ to generate more realistic differentiating databases, while preserving its ability to efficiently uncover numerous discrepancies between generated and gold SQL queries that are missed by standard test-based evaluation.
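
To make the idea of a differentiating database concrete, the following is a minimal sketch, not SpotIt+'s actual verification procedure, which the abstract does not detail. It swaps in brute-force enumeration: every database with at most K rows over a tiny value domain is tried, and the first one on which the two queries return different results is reported. The table `t(team, speed)`, the queries `P` and `Q`, the value domain, and the `speed_not_null` constraint are all hypothetical and chosen only for illustration; the constraint stands in for the kind of rule a mining pipeline might propose.

```python
import itertools
import sqlite3
from collections import Counter


def run(query, rows):
    """Run `query` on a fresh in-memory table t(team, speed) holding `rows`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (team TEXT, speed INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return Counter(result)  # compare results as bags, ignoring row order


def find_differentiating_db(p, q, bound, domain, constraint=lambda rows: True):
    """Return the first database with at most `bound` rows on which p and q
    disagree, or None if they agree on every admissible database."""
    for size in range(bound + 1):
        for rows in itertools.combinations_with_replacement(domain, size):
            if not constraint(rows):
                continue  # skip databases that violate the constraint
            if run(p, rows) != run(q, rows):
                return list(rows)
    return None


# Hypothetical generated query P and gold query Q.
P = "SELECT AVG(speed) FROM t WHERE team = 'A'"
Q = "SELECT SUM(speed) * 1.0 / COUNT(*) FROM t WHERE team = 'A'"

# Hypothetical value domain: every (team, speed) pair a row may take.
DOMAIN = list(itertools.product(["A", "B"], [None, 10, 20]))


def speed_not_null(rows):
    """Hypothetical constraint: the speed column never holds NULL."""
    return all(speed is not None for _, speed in rows)


# Without constraints, a NULL speed separates the queries: AVG skips NULLs,
# while COUNT(*) still counts the row.
print(find_differentiating_db(P, Q, 2, DOMAIN))
# With the constraint, no differentiating database exists within the bound.
print(find_differentiating_db(P, Q, 2, DOMAIN, speed_not_null))
```

In this toy setting the unconstrained search returns a counterexample that relies on a NULL value. Once NULLs are ruled out, the two queries agree on every database within the bound, which illustrates how constraints can filter out discrepancies that would not arise on realistic data.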

Paper Structure (21 sections, 7 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: The generated and gold queries for a question in the BIRD dataset.
  • Figure 2: The workflow of SpotIt+.
  • Figure 3: Generated Query $P$, Gold Query $Q$, and counterexample with bound $K=2$ for NL Question: "How much is the average build up play speed of the Heart of Midlothian team?".
  • Figure 4: Generated Query $P$, Gold Query $Q$, and counterexample with bound $K=2$ for NL Question: "What is the most common bond type?".
  • Figure 5: Generated Query $P$, Gold Query $Q$, , , and counterexamples with bound $K=2$ for NL Question: "List the last name of members with a major in environmental engineering and include its department and college name."