Polygon: Symbolic Reasoning for SQL using Conflict-Driven Under-Approximation Search
Pinhan Zhao, Yuepeng Wang, Xinyu Wang
TL;DR
Polygon is a symbolic reasoning engine for SQL that avoids full, expensive SMT encodings of query semantics by using a compositional under-approximation (UA) framework. It defines a lattice of operator-specific UAs, encodes their semantics in SMT, and performs a conflict-driven search for a UA map (an assignment of a UA to each operator) under which some input $I$ makes the queries' outputs satisfy the application-specific condition $C$. The approach is proven sound and complete with respect to its bounded semantics, and it empirically outperforms prior techniques on SQL equivalence refutation and query disambiguation across more than 30,000 benchmarks. By speeding up input generation for complex SQL queries, the method supports scalable verification and synthesis tasks, with practical applications in education, data management, and database tooling.
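The soundness and bounded-completeness claims rest on a simple set-inclusion argument, sketched below. The notation is ours, not necessarily the paper's: $\llbracket P_i \rrbracket$ is the set of input-output pairs of $P_i$, $\llbracket P_i \rrbracket_{u}$ is the subset admitted by a UA $u$, $\mathcal{U}$ is the set of UA maps searched, $B$ is the bounded input space, and each $P_i$ is assumed deterministic. Polygon assigns UAs per operator; here a map $\mu$ is simplified to one UA $\mu_i$ per query.

$$
\llbracket P_i \rrbracket_{u} \subseteq \llbracket P_i \rrbracket \quad \text{for every UA } u
$$

Soundness: because $(I, O_i) \in \llbracket P_i \rrbracket$ and determinism give $O_i = P_i(I)$, any witness found inside an under-approximation is a genuine witness:

$$
\Big( \bigwedge_{i=1}^{n} (I, O_i) \in \llbracket P_i \rrbracket_{\mu_i} \Big) \wedge C(O_1, \ldots, O_n) \;\Longrightarrow\; C\big(P_1(I), \ldots, P_n(I)\big)
$$

Completeness within the bound: if the family covers every bounded input,

$$
\forall I \in B.\ \exists \mu \in \mathcal{U}.\ \forall i.\ \big(I, P_i(I)\big) \in \llbracket P_i \rrbracket_{\mu_i},
$$

then every witness $I \in B$ with $C(P_1(I), \ldots, P_n(I))$ lies inside some $\mu$'s under-approximation, so an exhaustive search over $\mathcal{U}$ cannot miss it.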
Abstract
We present a novel symbolic reasoning engine for SQL that can efficiently generate an input $I$ for $n$ queries $P_1, \ldots, P_n$ such that their outputs on $I$ satisfy a given property (expressed in SMT). This is useful in several contexts, such as disproving the equivalence of two SQL queries and disambiguating a set of queries. Our first idea is to reason about an under-approximation of each $P_i$, that is, a subset of $P_i$'s input-output behaviors. While this makes our approach both semantics-aware and lightweight, the idea alone is incomplete, since a fixed under-approximation may miss some behaviors of interest. Therefore, our second idea is to search over an expressive family of under-approximations that collectively cover all program behaviors of interest, which makes our approach complete. We have implemented these ideas in a tool, Polygon, and evaluated it on over 30,000 benchmarks across two tasks (namely, SQL equivalence refutation and query disambiguation). Our evaluation results show that Polygon significantly outperforms all prior techniques.
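To make the two ideas concrete, here is a toy sketch in Python. It is not Polygon's algorithm: instead of SMT encodings and a conflict-driven search, it brute-forces inputs. The under-approximation family is a hypothetical choice for illustration: $U_k$ admits single-table inputs $T(a, b)$ with exactly $k$ rows over the value domain $\{0, 1\}$. The queries (`select_distinct_a` for `SELECT DISTINCT a FROM T`, `select_a` for `SELECT a FROM T`) and all helper names are assumptions, not part of the paper.

```python
from itertools import product
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

# Toy model: a table T(a, b) is a list of rows; a query maps a table to a bag
# of output values, represented as a sorted list so bags compare by content.
Row = Tuple[int, int]
Table = List[Row]
Query = Callable[[Table], List[int]]
Condition = Callable[[Sequence[List[int]]], bool]


def select_distinct_a(t: Table) -> List[int]:
    """Hypothetical P1: SELECT DISTINCT a FROM T."""
    return sorted({a for a, _ in t})


def select_a(t: Table) -> List[int]:
    """Hypothetical P2: SELECT a FROM T (bag semantics keeps duplicates)."""
    return sorted(a for a, _ in t)


def inputs_of(k: int, domain: Sequence[int]) -> Iterator[Table]:
    """Inputs admitted by the toy under-approximation U_k: every table with
    exactly k rows whose cells come from `domain`."""
    cells = list(product(domain, repeat=2))
    for rows in product(cells, repeat=k):
        yield list(rows)


def search_under_approx(k: int, queries: Sequence[Query], cond: Condition,
                        domain: Sequence[int]) -> Optional[Table]:
    """Reason within one fixed under-approximation: return a witness input
    inside U_k, or None if U_k contains none."""
    for table in inputs_of(k, domain):
        if cond([q(table) for q in queries]):
            return table
    return None


def search_family(max_k: int, queries: Sequence[Query], cond: Condition,
                  domain: Sequence[int] = (0, 1)) -> Optional[Table]:
    """Search over the family U_0, U_1, ..., U_max_k, smallest first.
    Together they cover every input with at most max_k rows, so the search
    is complete up to that bound; any witness it returns is genuine."""
    for k in range(max_k + 1):
        witness = search_under_approx(k, queries, cond, domain)
        if witness is not None:
            return witness
    return None


def outputs_differ(outs: Sequence[List[int]]) -> bool:
    """Condition C for equivalence refutation: the two outputs differ."""
    return outs[0] != outs[1]


if __name__ == "__main__":
    qs = [select_distinct_a, select_a]
    # A fixed under-approximation (single-row tables) misses the difference.
    print(search_under_approx(1, qs, outputs_differ, (0, 1)))  # None
    # Searching over the family finds a counterexample with a duplicate 'a'.
    print(search_family(3, qs, outputs_differ))  # [(0, 0), (0, 0)]
```

The search restricted to $U_1$ shows why a single fixed under-approximation is incomplete: with one row, `DISTINCT` never changes the result. Iterating over the family recovers the two-row counterexample. According to the TL;DR, Polygon replaces this brute-force enumeration with SMT encodings of operator-specific under-approximations and a conflict-driven search over UA maps.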
