Output-sensitive Conjunctive Query Evaluation

Shaleen Deep, Hangdong Zhao, Austen Z. Fan, Paraschos Koutris

TL;DR

This paper presents a novel, output-sensitive algorithm for the evaluation of acyclic Conjunctive Queries (CQs) that contain arbitrary free variables, and shows that it is possible to improve the running time guarantee of the Yannakakis algorithm by a polynomial factor.

Abstract

Join evaluation is one of the most fundamental operations performed by database systems and arguably the most well-studied problem in the Database community. A staggering number of join algorithms have been developed, and commercial database engines use finely tuned join heuristics that take into account many factors including the selectivity of predicates, memory, IO, etc. However, most of the results have catered to either full join queries or non-full join queries but with degree constraints (such as PK-FK relationships) that make joins easier to evaluate. Further, most of the algorithms are also not output-sensitive. In this paper, we present a novel, output-sensitive algorithm for the evaluation of acyclic Conjunctive Queries (CQs) that contain arbitrary free variables. Our result is based on a novel generalization of the Yannakakis algorithm and shows that it is possible to improve the running time guarantee of the Yannakakis algorithm by a polynomial factor. Importantly, our algorithmic improvement does not depend on the use of fast matrix multiplication, as a recently proposed algorithm does. The upper bound is complemented with matching lower bounds conditioned on two variants of the $k$-clique conjecture. The application of our algorithm recovers known prior results and improves on known state-of-the-art results for common queries such as paths and stars.
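
For context, the classic Yannakakis algorithm that the abstract takes as its baseline can be sketched on a three-path query. This is a minimal illustration of the textbook procedure (two semijoin passes over a join tree, then joins with early projection), not the paper's new algorithm. The choice of $x_1, x_4$ as the free variables, the function names, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def semijoin(left, right, left_col, right_col):
    """Keep the tuples of `left` whose join value also occurs in `right`."""
    keys = {t[right_col] for t in right}
    return [t for t in left if t[left_col] in keys]


def join_project(left, right, left_col, right_col, keep_left, keep_right):
    """Hash-join `left` and `right`, keeping only the listed columns."""
    index = defaultdict(list)
    for t in right:
        index[t[right_col]].append(t)
    out = set()  # a set removes duplicates created by the projection
    for s in left:
        for t in index.get(s[left_col], []):
            out.add(tuple(s[i] for i in keep_left) + tuple(t[i] for i in keep_right))
    return sorted(out)


def yannakakis_path3(r12, r23, r34):
    """Evaluate Q(x1, x4) <- R12(x1, x2), R23(x2, x3), R34(x3, x4).

    Assumed join tree: R12 - R23 - R34, rooted at R12.
    """
    # Bottom-up semijoin pass (leaf to root).
    r23 = semijoin(r23, r34, 1, 0)
    r12 = semijoin(r12, r23, 1, 0)
    # Top-down semijoin pass (root to leaf); no dangling tuples remain.
    r23 = semijoin(r23, r12, 0, 1)
    r34 = semijoin(r34, r23, 0, 1)
    # Join bottom-up, projecting away variables that are no longer needed.
    r24 = join_project(r23, r34, 1, 0, keep_left=[0], keep_right=[1])   # (x2, x4)
    return join_project(r12, r24, 1, 0, keep_left=[0], keep_right=[1])  # (x1, x4)


if __name__ == "__main__":
    r12 = [(1, 2), (5, 6)]
    r23 = [(2, 3), (2, 7)]
    r34 = [(3, 4), (8, 9)]
    print(yannakakis_path3(r12, r23, r34))
```

In this sketch the intermediate relation over $(x_2, x_4)$ is materialized before the final projection, and it can be considerably larger than the final output. That gap between intermediate and output size is what an output-sensitive analysis aims to account for.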

Paper Structure (15 sections, 9 theorems, 8 equations, 4 figures, 4 algorithms)

This paper contains 15 sections, 9 theorems, 8 equations, 4 figures, 4 algorithms.

Key Result

proposition 1

Let $Q$ be a reduced acyclic CQ, and $\mathcal{T}$ be a join tree of $Q$. Then, no variable in any node of $\mathcal{T}$ can be isolated and non-free.

Figures (4)

  • Figure 1: Database instance $\mathcal{D}$ showing relational instances for relations $R_{12}(x_1, x_2)$, $R_{23}(x_2, x_3)$, and $R_{34}(x_3, x_4)$ for the three-path query.
  • Figure 2: Depiction of the graph $G^\exists_Q$ and the decomposition of the running example query $Q(\mathbf{x}_{14567}) \leftarrow R_{12}(\mathbf{x}_{12}) \wedge R_{23}(\mathbf{x}_{23}) \wedge R_{34}(\mathbf{x}_{34}) \wedge R_{25}(\mathbf{x}_{25}) \wedge R_{46}(\mathbf{x}_{46}) \wedge R_{57}(\mathbf{x}_{57})$
  • Figure 4: Evaluating the running example query using our algorithm. Each figure shows a rooted join tree.
  • Figure 5: Relations formed by variables arranged as a complete binary tree. Every root-to-leaf path forms a relation (labeled).

Theorems & Definitions (13)

  • proposition 1
  • definition 1: Projection Width
  • lemma 1
  • theorem 1
  • theorem 2
  • theorem 3
  • theorem 4
  • theorem 5
  • definition 2: Boolean $k$-Clique Conjecture
  • definition 3: Min-Weight $k$-Clique Conjecture
  • ...and 3 more