Efficient Evaluation of Arbitrary Relational Calculus Queries

Martin Raszyk, David Basin, Srđan Krstić, Dmitriy Traytel

TL;DR

This work tackles the inefficiency of evaluating arbitrary relational calculus (RC) queries by introducing a translation that maps any RC query to two safe-range queries, interpreted under an infinite-domain assumption. The first query is closed and characterizes whether the original query is relatively safe, i.e., whether it evaluates to a finite relation on a given database. The second is equivalent to the original query whenever the latter is relatively safe, so the result can be computed with standard relational algebra and SQL operations on finite tables. By chaining translations through safe-range normal form (SRNF), relational algebra normal form (RANF), and relational algebra (RA) before generating SQL, the authors build RC2SQL, a tool that improves the time complexity over prior approaches. Experiments on synthetic Data Golf benchmarks and real Amazon data confirm this improvement empirically, supporting RC as a practical query language for DBMSs. The approach also introduces optimizations such as count aggregations and uses training data to guide nondeterministic choices in the translation, which helps the evaluation of domain-dependent RC queries scale.

Abstract

The relational calculus (RC) is a concise, declarative query language. However, existing RC query evaluation approaches are inefficient and often deviate from established algorithms based on finite tables used in database management systems. We devise a new translation of an arbitrary RC query into two safe-range queries, for which the finiteness of the query's evaluation result is guaranteed. Assuming an infinite domain, the two queries have the following meaning: The first is closed and characterizes the original query's relative safety, i.e., whether given a fixed database, the original query evaluates to a finite relation. The second safe-range query is equivalent to the original query, if the latter is relatively safe. We compose our translation with other, more standard ones to ultimately obtain two SQL queries. This allows us to use standard database management systems to evaluate arbitrary RC queries. We show that our translation improves the time complexity over existing approaches, which we also empirically confirm in both realistic and synthetic experiments.
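To make the roles of the two queries concrete, here is a minimal Python sketch for one hypothetical query, Q(b, u) := B(b) ∧ ∀p. (P(b, p) → S(p, u)). The relation names `B`, `P`, `S`, the function names, and the sample data are all assumptions made for illustration. Both checks were derived by hand for this single query and are not the output of RC2SQL. Over an infinite domain, Q is infinite exactly when some brand in `B` has no product in `P`, because every value of u then satisfies the implication vacuously.

```python
from typing import Optional, Set, Tuple

Pair = Tuple[str, str]


def is_relatively_safe(B: Set[str], P: Set[Pair]) -> bool:
    """Closed check (hand-derived for this query only):
    Q is finite iff every brand in B has at least one product in P."""
    brands_with_product = {b for (b, _) in P}
    return B <= brands_with_product


def evaluate_when_safe(B: Set[str], P: Set[Pair], S: Set[Pair]) -> Set[Pair]:
    """Finite evaluation that agrees with Q whenever Q is relatively safe.
    Candidates for u are drawn from S, so only finite sets are enumerated."""
    result: Set[Pair] = set()
    for b in B:
        products = {p for (b2, p) in P if b2 == b}
        if not products:
            continue  # cannot happen when the safety check succeeded
        candidates = {u for (p, u) in S if p in products}
        for u in candidates:
            if all((p, u) in S for p in products):
                result.add((b, u))
    return result


def evaluate(B: Set[str], P: Set[Pair], S: Set[Pair]) -> Optional[Set[Pair]]:
    """Return the finite result, or None if Q is infinite on this database."""
    if not is_relatively_safe(B, P):
        return None
    return evaluate_when_safe(B, P, S)


if __name__ == "__main__":
    P = {("acme", "p1"), ("acme", "p2"), ("zen", "p3")}
    S = {("p1", "ann"), ("p2", "ann"), ("p1", "bob"), ("p3", "bob")}
    print(evaluate({"acme", "zen"}, P, S))
    print(evaluate({"acme", "ghost"}, P, S))  # "ghost" has no product
```

In this sketch, the first function plays the role of the closed safe-range query and the second plays the role of the safe-range query that is equivalent to the original one under relative safety. A database system would run both as SQL queries rather than as Python loops.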

Paper Structure (30 sections, 19 theorems, 42 equations, 15 figures, 1 table, 6 algorithms)

Key Result

Lemma 3.1

Let $Q$ be a query, $x \in \mathsf{fv}(Q)$, and $\mathcal{G}$ be a set of quantified predicates such that $\mathsf{gen}({x},{Q},{\mathcal{G}})$. Then (i) for every $Q_{\mathit{qp}}\in \mathcal{G}$, we have $x\in\mathsf{fv}(Q_{\mathit{qp}})$ and $\mathsf{fv}(Q_{\mathit{qp}})\subseteq\mathsf{fv}(Q)$,

Figures (15)

  • Figure 1: Overview of our translation.
  • Figure 2: The semantics of RC.
  • Figure 3: Constant propagation rules.
  • Figure 4: The generated relation.
  • Figure 5: The covered relation.
  • ...and 10 more figures

Theorems & Definitions (46)

  • Definition 1
  • Definition 2
  • Lemma 3.1
  • Definition 3
  • Example 1
  • Lemma 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Example 2
  • Lemma 4.4
  • ...and 36 more