Table of Contents
Fetching ...

Polygon: Symbolic Reasoning for SQL using Conflict-Driven Under-Approximation Search

Pinhan Zhao, Yuepeng Wang, Xinyu Wang

TL;DR

Polygon introduces a novel symbolic reasoning engine for SQL that avoids full, expensive SMT encodings by using a compositional under-approximation (UA) framework. It defines a lattice of operator-specific UAs, encodes their semantics with SMT, and performs a conflict-driven search to find a satisfying UA map that yields an input $I$ meeting the application condition $C$. The approach is proven sound and complete within its bounded semantics, and empirically outperforms state-of-the-art solvers on SQL equivalence refutation and query disambiguation across large benchmark sets. The method significantly accelerates input generation for complex SQL queries, enabling scalable verification and synthesis tasks with practical impact for education, data management, and database tooling.

Abstract

We present a novel symbolic reasoning engine for SQL which can efficiently generate an input $I$ for $n$ queries $P_1, \cdots, P_n$, such that their outputs on $I$ satisfy a given property (expressed in SMT). This is useful in different contexts, such as disproving equivalence of two SQL queries and disambiguating a set of queries. Our first idea is to reason about an under-approximation of each $P_i$ -- that is, a subset of $P_i$'s input-output behaviors. While it makes our approach both semantics-aware and lightweight, this idea alone is incomplete (as a fixed under-approximation might miss some behaviors of interest). Therefore, our second idea is to perform search over an expressive family of under-approximations (which collectively cover all program behaviors of interest), thereby making our approach complete. We have implemented these ideas in a tool, Polygon, and evaluated it on over 30,000 benchmarks across two tasks (namely, SQL equivalence refutation and query disambiguation). Our evaluation results show that Polygon significantly outperforms all prior techniques.

Polygon: Symbolic Reasoning for SQL using Conflict-Driven Under-Approximation Search

TL;DR

Polygon introduces a novel symbolic reasoning engine for SQL that avoids full, expensive SMT encodings by using a compositional under-approximation (UA) framework. It defines a lattice of operator-specific UAs, encodes their semantics with SMT, and performs a conflict-driven search to find a satisfying UA map that yields an input meeting the application condition . The approach is proven sound and complete within its bounded semantics, and empirically outperforms state-of-the-art solvers on SQL equivalence refutation and query disambiguation across large benchmark sets. The method significantly accelerates input generation for complex SQL queries, enabling scalable verification and synthesis tasks with practical impact for education, data management, and database tooling.

Abstract

We present a novel symbolic reasoning engine for SQL which can efficiently generate an input for queries , such that their outputs on satisfy a given property (expressed in SMT). This is useful in different contexts, such as disproving equivalence of two SQL queries and disambiguating a set of queries. Our first idea is to reason about an under-approximation of each -- that is, a subset of 's input-output behaviors. While it makes our approach both semantics-aware and lightweight, this idea alone is incomplete (as a fixed under-approximation might miss some behaviors of interest). Therefore, our second idea is to perform search over an expressive family of under-approximations (which collectively cover all program behaviors of interest), thereby making our approach complete. We have implemented these ideas in a tool, Polygon, and evaluated it on over 30,000 benchmarks across two tasks (namely, SQL equivalence refutation and query disambiguation). Our evaluation results show that Polygon significantly outperforms all prior techniques.

Paper Structure

This paper contains 21 sections, 11 theorems, 22 equations, 27 figures, 1 table, 3 algorithms.

Key Result

theorem 1

Suppose $\textit{EncUASemantics}(F, u)$ yields an SMT formula $\varphi$, for query operator $F$ and UA $u \in \mathcal{U}_{F}$. For any model $\sigma$ of $\varphi$, the corresponding inputs $\sigma(\vec{x})$ and output $\sigma(y)$ are consistent with the precise semantics of $F$; that is, $\llbracke

Figures (27)

  • Figure 1: SQL query $P$ (left) and its corresponding AST (right). While $P$ is written in standard SQL syntax, we express its AST following our grammar in Figure \ref{['fig:sql-syntax']}. In particular, an AST node is labeled with a query operator, with non-query parameters (such as column lists and predicates) being part of the label. $\text{Project}$ in the AST means SELECT, $\text{LJoin}$ is LEFT JOIN, $L_1$ corresponds to "invoice_id, T.customer_name, cnt" on the left, etc. Each AST node is annotated with a unique id $v_i$. We also annotate some parts of $P$ to illustrate this mapping.
  • Figure 2: SQL query $P'$ (left) and its corresponding AST (right), following the same protocol as in Figure \ref{['fig:example:correct-query']}.
  • Figure 4: An example UA map $M$, which can be used to generate the counterexample input in Figure \ref{['fig:example:cex']} to refute the equivalence of $P$ (in Figure \ref{['fig:example:correct-query']}) and $P'$ (in Figure \ref{['fig:example:wrong-query']}).
  • Figure 5: An example sequence of UA maps $M_1 \rightarrow M_2 \rightarrow M_3 \rightarrow M$ that our algorithm may explore, to refute the equivalence of $P$ and $P'$. Here, $M$ is the satisfying UA map in Figure \ref{['fig:ex:M']}.
  • Figure 6: An example partition of the UA space for AST nodes $v'_1, v'_2, v'_3, v'_4, v_1, v_2$ from $P$ and $P'$.
  • ...and 22 more figures

Theorems & Definitions (12)

  • definition 1
  • theorem 1: Correctness of UA semantics
  • theorem 2: Refinement of UA Semantics
  • theorem 3: Soundness
  • theorem 4: Completeness
  • corollary 1: Correctness of UA semantics for queries
  • corollary 2: Refinement of UA semantics for queries
  • lemma 1: Soundness of Conflict-Driven UA Search
  • lemma 2
  • lemma 3
  • ...and 2 more