Table of Contents
Fetching ...

Language-Integrated Recursive Queries

Anna Herlihy, Amir Shaikhha, Anastasia Ailamaki, Martin Odersky

TL;DR

The paper tackles the safety and portability challenges of SQL recursive queries by introducing TyQL, a Scala-based language-integrated DSL that derives compile-time properties to ensure safe fixed-point computations. It defines six independent properties (range-restriction, monotonicity, mutual-recursion, linearity, set-semantics, and constructor-freedom) and enforces them via a type-directed calculus, generating a single, backend-tailored SQL query for safe execution. TyQL’s implementation leverages Named-Tuples, type-level ASTs, and host-language embedding to deliver safe recursion with no runtime penalties compared to raw SQL and substantial speedups over non-recursive approaches. Evaluation on a Recursive Query Benchmark across multiple RDBMS demonstrates increased safety, predictable termination, and competitive performance, underscoring TyQL’s practical impact for industrial fixed-point analyses.

Abstract

Performance-critical industrial applications, including large-scale program, network, and distributed system analyses, rely on fixed-point computations. The introduction of recursive common table expressions (CTEs) using the WITH RECURSIVE keyword in SQL:1999 extended the ability of relational database systems to handle fixed-point computations, unlocking significant performance advantages by allowing computation to move closer to the data. Yet with recursion, SQL becomes a Turing-complete programming language and, with that, unrecoverable safety and correctness risks. SQL itself lacks a fixed semantics, as the SQL specification is written in natural language, full of ambiguities that database vendors resolve in divergent ways. As a result, reasoning about the correctness of recursive SQL programs must rely on isolated mathematical properties of queries rather than wrestling a unified formal model out of a language with notoriously inconsistent semantics. To address these challenges, we propose a calculus that automatically derives mathematical properties from embedded recursive queries and, depending on the database backend, rejects queries that may lead to the three classes of recursive query errors - database errors, incorrect results, and non-termination. We introduce TyQL, a practical implementation in Scala for safe, recursive language-integrated query. Using Named-Tuples and type-level pattern matching, TyQL ensures query portability and safety, showing no performance penalty compared to raw SQL strings while unlocking a three-orders-of-magnitude speedup over non-recursive SQL queries.

Language-Integrated Recursive Queries

TL;DR

The paper tackles the safety and portability challenges of SQL recursive queries by introducing TyQL, a Scala-based language-integrated DSL that derives compile-time properties to ensure safe fixed-point computations. It defines six independent properties (range-restriction, monotonicity, mutual-recursion, linearity, set-semantics, and constructor-freedom) and enforces them via a type-directed calculus, generating a single, backend-tailored SQL query for safe execution. TyQL’s implementation leverages Named-Tuples, type-level ASTs, and host-language embedding to deliver safe recursion with no runtime penalties compared to raw SQL and substantial speedups over non-recursive approaches. Evaluation on a Recursive Query Benchmark across multiple RDBMS demonstrates increased safety, predictable termination, and competitive performance, underscoring TyQL’s practical impact for industrial fixed-point analyses.

Abstract

Performance-critical industrial applications, including large-scale program, network, and distributed system analyses, rely on fixed-point computations. The introduction of recursive common table expressions (CTEs) using the WITH RECURSIVE keyword in SQL:1999 extended the ability of relational database systems to handle fixed-point computations, unlocking significant performance advantages by allowing computation to move closer to the data. Yet with recursion, SQL becomes a Turing-complete programming language and, with that, unrecoverable safety and correctness risks. SQL itself lacks a fixed semantics, as the SQL specification is written in natural language, full of ambiguities that database vendors resolve in divergent ways. As a result, reasoning about the correctness of recursive SQL programs must rely on isolated mathematical properties of queries rather than wrestling a unified formal model out of a language with notoriously inconsistent semantics. To address these challenges, we propose a calculus that automatically derives mathematical properties from embedded recursive queries and, depending on the database backend, rejects queries that may lead to the three classes of recursive query errors - database errors, incorrect results, and non-termination. We introduce TyQL, a practical implementation in Scala for safe, recursive language-integrated query. Using Named-Tuples and type-level pattern matching, TyQL ensures query portability and safety, showing no performance penalty compared to raw SQL strings while unlocking a three-orders-of-magnitude speedup over non-recursive SQL queries.

Paper Structure

This paper contains 48 sections, 7 theorems, 41 equations, 39 figures, 5 tables.

Key Result

Theorem 1

Well-typed fully-restricted programs will always find the unique and minimal fixed-point under the iterated fixed-point semantics.

Figures (39)

  • Figure 1: Overview of compile-time safety enforcement for recursive queries. Queries Q1-Q3 that violate properties P1-P6 may lead to behaviors B1-B3. TyQL rejects unsafe queries at compile-time.
  • Figure 2: Recursive query for the transitive closure in TyQL and SQL
  • Figure 3: Non-monotonic query: Bill-of-materials (Q1)
  • Figure 4: Non-linear query: Transitive closure (Q2)
  • Figure 5: Bag-semantic query : Same-Generation (Q3)
  • ...and 34 more figures

Theorems & Definitions (23)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Theorem 1
  • Definition 7: Translation from normalized to , i.e., the toDL function.
  • Lemma 1: $\textbf{to-NRLSD}{}_\Psi$ produces well-formed
  • Theorem 2: $\llbracket\cdot\rrbracket_\Psi$ produces well-formed
  • ...and 13 more