Table of Contents
Fetching ...

Verifying SQL Queries using Theories of Tables and Relations

Mudathir Mohamed, Andrew Reynolds, Cesare Tinelli, Clark Barrett

TL;DR

This work develops an SMT-based framework for verifying SQL queries by encoding them into theories of tables (bag semantics) and finite relations (set semantics), augmented with a theory for nullable sorts to model NULL values. It introduces a tableaux-style calculus and a subsolver integrated into the cvc5 SMT solver to handle join, projection, selection, map, and filter operations, including product and inner join. The authors prove correctness properties for the calculi and characterize termination conditions, notably showing decidability in restricted fragments and providing an acyclicity-based termination condition for the relational theory with maps. An empirical evaluation on Calcite-derived benchmarks demonstrates competitive performance and highlights strengths and limitations relative to SQLSolver and SPES, with insights into future work such as aggregation, ordering, and broader SQL feature coverage. Overall, the approach enables automated, extensible SQL analysis beyond equivalence, with practical potential for query containment and emptiness checking within a unified SMT-based framework.

Abstract

We present a number of first- and second-order extensions to SMT theories specifically aimed at representing and analyzing SQL queries with join, projection, and selection operations. We support reasoning about SQL queries with either bag or set semantics for database tables. We provide the former via an extension of a theory of finite bags and the latter via an extension of the theory of finite relations. Furthermore, we add the ability to reason about tables with null values by introducing a theory of nullable sorts based on an extension of the theory of algebraic datatypes. We implemented solvers for these theories in the SMT solver cvc5 and evaluated them on a set of benchmarks derived from public sets of SQL equivalence problems.

Verifying SQL Queries using Theories of Tables and Relations

TL;DR

This work develops an SMT-based framework for verifying SQL queries by encoding them into theories of tables (bag semantics) and finite relations (set semantics), augmented with a theory for nullable sorts to model NULL values. It introduces a tableaux-style calculus and a subsolver integrated into the cvc5 SMT solver to handle join, projection, selection, map, and filter operations, including product and inner join. The authors prove correctness properties for the calculi and characterize termination conditions, notably showing decidability in restricted fragments and providing an acyclicity-based termination condition for the relational theory with maps. An empirical evaluation on Calcite-derived benchmarks demonstrates competitive performance and highlights strengths and limitations relative to SQLSolver and SPES, with insights into future work such as aggregation, ordering, and broader SQL feature coverage. Overall, the approach enables automated, extensible SQL analysis beyond equivalence, with practical potential for query containment and emptiness checking within a unified SMT-based framework.

Abstract

We present a number of first- and second-order extensions to SMT theories specifically aimed at representing and analyzing SQL queries with join, projection, and selection operations. We support reasoning about SQL queries with either bag or set semantics for database tables. We provide the former via an extension of a theory of finite bags and the latter via an extension of the theory of finite relations. Furthermore, we add the ability to reason about tables with null values by introducing a theory of nullable sorts based on an extension of the theory of algebraic datatypes. We implemented solvers for these theories in the SMT solver cvc5 and evaluated them on a set of benchmarks derived from public sets of SQL equivalence problems.
Paper Structure (25 sections, 15 theorems, 12 equations, 15 figures)

This paper contains 25 sections, 15 theorems, 12 equations, 15 figures.

Key Result

Lemma 2.0

For every quantifier-free $\Sigma_\mathsf{Tab}$-formula $\varphi$, there are sets $B_1,\ldots,B_n$ of table constraints, sets $A_1,\ldots,A_n$ of arithmetic constraints, and sets $E_1,\ldots,E_n$ of elements constraints such that $\varphi$ is satisfiable in $\mathbb{T}_\mathsf{Tab}$ iff $A_i \cup B_

Figures (15)

  • Figure 1: Signature $\Sigma_\mathsf{Tab}$ for the theory of tables. Here $\mathsf{Table}\xspace(\bm{\alpha}, \bm{\beta})$ is a shorthand for $\mathsf{Table}\xspace(\alpha_0, \ldots, \alpha_p, \beta_0, \ldots, \beta_q)$ when $\bm{\alpha} = \alpha_0, \ldots, \alpha_p$ and $\bm{\beta} = \beta_0, \ldots, \beta_q$.
  • Figure 2: Simplification rules for $\Sigma_\mathsf{Tab}$-terms. In the last two rules, $\varepsilon\xspace$ is the sort of $e$ and of the elements of $s$, respectively; $n$ is a numeral.
  • Figure 3: Bag rules.
  • Figure 4: Table rules. The syntax $\bar{x}_m$ abbreviates $x_0, \dots, x_m$.
  • Figure 5: Bag filter and map rules. $w,i$ are fresh variables.
  • ...and 10 more figures

Theorems & Definitions (32)

  • Definition 2.1
  • Lemma 2.0
  • Example 2.1
  • Example 2.2
  • Lemma 2.0
  • Proposition 2.0: Refutation Soundness
  • Example 2.3
  • Proposition 2.0: Solution Soundness
  • Proposition 2.0: Termination
  • Proposition 3.0
  • ...and 22 more