Verifying SQL Queries using Theories of Tables and Relations
Mudathir Mohamed, Andrew Reynolds, Cesare Tinelli, Clark Barrett
TL;DR
This work develops an SMT-based framework for verifying SQL queries by encoding them into theories of tables (bag semantics) and finite relations (set semantics), augmented with a theory for nullable sorts to model NULL values. It introduces a tableaux-style calculus and a subsolver integrated into the cvc5 SMT solver to handle join, projection, selection, map, and filter operations, including product and inner join. The authors prove correctness properties for the calculi and characterize termination conditions, notably showing decidability in restricted fragments and providing an acyclicity-based termination condition for the relational theory with maps. An empirical evaluation on Calcite-derived benchmarks demonstrates competitive performance and highlights strengths and limitations relative to SQLSolver and SPES, with insights into future work such as aggregation, ordering, and broader SQL feature coverage. Overall, the approach enables automated, extensible SQL analysis beyond equivalence, with practical potential for query containment and emptiness checking within a unified SMT-based framework.
Abstract
We present a number of first- and second-order extensions to SMT theories specifically aimed at representing and analyzing SQL queries with join, projection, and selection operations. We support reasoning about SQL queries with either bag or set semantics for database tables. We provide the former via an extension of a theory of finite bags and the latter via an extension of the theory of finite relations. Furthermore, we add the ability to reason about tables with null values by introducing a theory of nullable sorts based on an extension of the theory of algebraic datatypes. We implemented solvers for these theories in the SMT solver cvc5 and evaluated them on a set of benchmarks derived from public sets of SQL equivalence problems.
