SpotIt+: Verification-based Text-to-SQL Evaluation with Database Constraints

Rocky Klopfenstein; Yang He; Andrew Tremante; Yuepeng Wang; Nina Narodytska; Haoze Wu

TL;DR

SpotIt+ introduces a constraint-mining pipeline that combines rule-based specification mining over example databases with LLM-based validation. The mined constraints let SpotIt+ generate more realistic differentiating databases while preserving its ability to efficiently uncover discrepancies between generated and gold SQL queries that standard test-based evaluation misses.

Abstract

We present SpotIt+, an open-source tool for evaluating Text-to-SQL systems via bounded equivalence verification. Given a generated SQL query and the ground truth, SpotIt+ actively searches for database instances that differentiate the two queries. To ensure that the generated counterexamples reflect practically relevant discrepancies, we introduce a constraint-mining pipeline that combines rule-based specification mining over example databases with LLM-based validation. Experimental results on the BIRD dataset show that the mined constraints enable SpotIt+ to generate more realistic differentiating databases, while preserving its ability to efficiently uncover numerous discrepancies between generated and gold SQL queries that are missed by standard test-based evaluation.
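
To make the idea of a differentiating database concrete, here is a minimal Python sketch. It is not the SpotIt+ implementation: it swaps in brute-force enumeration over a tiny value domain for the tool's verification-based search, and the table `t`, the queries `P` and `Q`, the helper names, and the range constraint are all hypothetical. It only illustrates how a bound on table size limits the search and how a constraint can rule out an unrealistic counterexample.

```python
import itertools
import sqlite3


def run(query, rows):
    """Run `query` on an in-memory table t(team TEXT, speed INTEGER) holding `rows`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (team TEXT, speed INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # Sort so that results are compared as multisets, ignoring row order.
    result = sorted(conn.execute(query).fetchall(), key=repr)
    conn.close()
    return result


def find_counterexample(p, q, k, domain, constraint=lambda row: True):
    """Return the first database with at most k rows on which p and q differ, else None."""
    # Only rows that satisfy the (hypothetical) database constraint are allowed.
    candidates = [row for row in domain if constraint(row)]
    for size in range(k + 1):
        for rows in itertools.combinations_with_replacement(candidates, size):
            if run(p, rows) != run(q, rows):
                return list(rows)
    return None


# Hypothetical generated query P and gold query Q; they differ only at speed = 0.
P = "SELECT team FROM t WHERE speed > 0"
Q = "SELECT team FROM t WHERE speed >= 0"
DOMAIN = list(itertools.product(["A"], [0, 1]))

# Without constraints, a single row with speed = 0 separates the queries.
unconstrained = find_counterexample(P, Q, 2, DOMAIN)

# With an assumed range constraint speed >= 1, no database within the bound does.
constrained = find_counterexample(P, Q, 2, DOMAIN, constraint=lambda row: row[1] >= 1)
```

Here `unconstrained` is `[("A", 0)]` and `constrained` is `None`: once the assumed range constraint excludes `speed = 0`, the two queries agree on every database within the bound, so the discrepancy is not reported as practically relevant.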

Paper Structure (21 sections, 7 equations, 5 figures, 3 tables)

Introduction
Motivating Example
SpotIt+: Bounded Equivalence Verification with Database Constraints
Constraint Types
Constraint Extraction Pipeline
Evaluation
Verification Results
Conclusion
Background
Specification Mining
Range Constraints.
Categorical Constraints.
Null Constraints.
...and 6 more sections

Figures (5)

Figure 1: The generated and gold queries for a question in the BIRD dataset.
Figure 2: The workflow of SpotIt+.
Figure 3: Generated Query $P$, Gold Query $Q$, and counterexample with bound $K=2$ for NL Question: "How much is the average build up play speed of the Heart of Midlothian team?".
Figure 4: Generated Query $P$, Gold Query $Q$, and counterexample with bound $K=2$ for NL Question: "What is the most common bond type?".
Figure 5: Generated Query $P$, Gold Query $Q$, , , and counterexamples with bound $K=2$ for NL Question: "List the last name of members with a major in environmental engineering and include its department and college name."
