Efficient Evaluation of Arbitrary Relational Calculus Queries
Martin Raszyk, David Basin, Srđan Krstić, Dmitriy Traytel
TL;DR
This work tackles the inefficiency of evaluating arbitrary relational calculus (RC) queries. It introduces a translation that maps any RC query to two safe-range queries under an infinite-domain assumption. The first, a closed query, characterizes whether the original query is relatively safe, i.e., whether it evaluates to a finite relation on the given database. The second is equivalent to the original query whenever the original query is relatively safe. Both can be evaluated with standard relational algebra and SQL operations. The authors compose this translation with translations to safe-range normal form (SRNF), relational algebra normal form (RANF), and relational algebra (RA), and finally generate SQL. The result is RC2SQL, a tool that improves on the time complexity of prior approaches and outperforms them in practice. Experiments on synthetic Data Golf benchmarks and real Amazon data confirm both the asymptotic and the observed performance gains, supporting the feasibility of RC as a practical query language for DBMSs. The approach also introduces optimizations such as count aggregations and uses training data to guide nondeterministic choices, contributing to scalable evaluation of domain-dependent RC queries.
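The last steps of that pipeline (RA to SQL) can be illustrated on a small safe-range query. The sketch below is hand-written for illustration and is not output of RC2SQL: the query phi(x) = ∃y. P(x, y) ∧ ¬Q(y), the table names p and q, and the functions evaluate and evaluate_reference are hypothetical, and Python's built-in sqlite3 module stands in for a DBMS. In RA the query is the projection onto x of the antijoin of P with Q, which SQL can express with NOT EXISTS.

```python
import sqlite3

# Hypothetical schema for illustration: p(x, y) and q(y).
# Safe-range RC query:  phi(x) = EXISTS y. P(x, y) AND NOT Q(y)
# Hand-derived RA:       pi_x(P antijoin Q), the antijoin matching on attribute y
# SQL rendering of that RA expression (NOT EXISTS implements the antijoin):
PHI_SQL = """
SELECT DISTINCT p.x
FROM p
WHERE NOT EXISTS (SELECT 1 FROM q WHERE q.y = p.y)
ORDER BY p.x
"""


def evaluate(p_rows, q_rows):
    """Load both relations into an in-memory SQLite database and run PHI_SQL."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE p (x INTEGER, y INTEGER)")
        con.execute("CREATE TABLE q (y INTEGER)")
        con.executemany("INSERT INTO p VALUES (?, ?)", p_rows)
        con.executemany("INSERT INTO q VALUES (?)", q_rows)
        return [row[0] for row in con.execute(PHI_SQL)]
    finally:
        con.close()


def evaluate_reference(p_rows, q_rows):
    """Direct set-based evaluation of phi, used only to cross-check the SQL."""
    q = {y for (y,) in q_rows}
    return sorted({x for x, y in p_rows if y not in q})
```

Comparing evaluate with evaluate_reference on a few small relations is a cheap sanity check that the SQL rendering preserves the RC semantics. NOT EXISTS is one common way to write an antijoin; a LEFT JOIN with an IS NULL filter or an EXCEPT-based formulation would be alternatives.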
Abstract
The relational calculus (RC) is a concise, declarative query language. However, existing RC query evaluation approaches are inefficient and often deviate from established algorithms based on finite tables used in database management systems. We devise a new translation of an arbitrary RC query into two safe-range queries, for which the finiteness of the query's evaluation result is guaranteed. Assuming an infinite domain, the two queries have the following meaning. The first is closed and characterizes the original query's relative safety, i.e., whether, given a fixed database, the original query evaluates to a finite relation. The second safe-range query is equivalent to the original query if the latter is relatively safe. We compose our translation with other, more standard ones to ultimately obtain two SQL queries. This allows us to use standard database management systems to evaluate arbitrary RC queries. We show that our translation improves on the time complexity of existing approaches, which we also confirm empirically in both realistic and synthetic experiments.
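To make relative safety concrete, the following sketch works through one hand-picked query that is not safe-range, phi(x, y) = P(x) ∨ Q(x, y): in the P(x) disjunct, y is unconstrained. Under an infinite domain its result is infinite whenever P is nonempty, so the closed query ¬∃x. P(x) characterizes relative safety, and on databases where that holds, the safe-range query Q(x, y) is equivalent to phi. Both queries were derived by hand for this one example and are not output of the paper's translation; the relation names, the functions, and the fresh-constant heuristic are hypothetical. Since an infinite domain cannot be enumerated, naive_eval works over an explicit finite domain, and result_grows adds fresh constants to expose the growth.

```python
from itertools import product

# Hypothetical example: finite relations P (unary) and Q (binary).
# Query (hand-picked, not safe-range): phi(x, y) = P(x) OR Q(x, y).


def naive_eval(db, domain):
    """Brute-force evaluation of phi over an explicitly given finite domain."""
    return {(a, b) for a, b in product(domain, repeat=2)
            if a in db["P"] or (a, b) in db["Q"]}


def active_domain(db):
    """All constants occurring in the relations P and Q."""
    adom = set(db["P"])
    for a, b in db["Q"]:
        adom.update((a, b))
    return adom


def relatively_safe(db):
    """Closed query NOT EXISTS x. P(x), derived by hand for this phi."""
    return len(db["P"]) == 0


def safe_range_answer(db):
    """Safe-range query Q(x, y); equals phi whenever phi is relatively safe."""
    return set(db["Q"])


def result_grows(db, extra=3):
    """Heuristic evidence of infiniteness: enlarge the domain with fresh
    constants and check whether the answer gets bigger."""
    adom = active_domain(db)
    small = naive_eval(db, adom | {"fresh0"})
    large = naive_eval(db, adom | {f"fresh{i}" for i in range(extra)})
    return len(large) > len(small)
```

On a database with P empty, naive_eval over any finite domain containing the active domain returns exactly Q. As soon as P contains a constant, every additional domain element adds one new answer tuple per element of P, which is why no finite relation can represent the result.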
