Ranked Enumeration for Database Queries

Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

TL;DR

Ranked enumeration returns query answers in order of importance without materializing the entire output. The paper develops two main instantiations for acyclic join queries, lexicographic orders and SUM-based ranking, and achieves $\tilde{{\mathcal{O}}}(n + k)$ time-to-$k$ guarantees through semijoin preprocessing, join indexes, and careful top-down enumeration. It characterizes the feasible lexicographic orders via disruptive trios and reverse alpha elimination orders, and handles SUM with a bottom-up dynamic program that computes optimal subtree weights, coupled with priority-queue-based enumeration. Beyond these core cases, the work discusses general (subset-monotone) ranking functions, CQs with projection, and cyclic queries via decompositions, and reports experiments showing practical speedups over the traditional join-then-sort approach.

Abstract

Ranked enumeration is a query-answering paradigm where the query answers are returned incrementally in order of importance (instead of returning all answers at once). Importance is defined by a ranking function that can be specific to the application, but typically involves either a lexicographic order (e.g., "ORDER BY R.A, S.B" in SQL) or a weighted sum of attributes (e.g., "ORDER BY 3*R.A + 2*S.B"). Recent work has introduced any-k algorithms for (multi-way) join queries, which push ranking into joins and avoid materializing intermediate results until necessary. The top-ranked answers are returned asymptotically faster than the common join-then-rank approach of database systems, resulting in orders-of-magnitude speedup in practice. In addition to their practical usefulness, these techniques complement a long line of theoretical research on unranked enumeration, where answers are also returned incrementally, but with no explicit ordering requirement. For a broad class of ranking functions with certain monotonicity properties, including lexicographic orders and sum-based rankings, the ordering requirement surprisingly does not increase the asymptotic time or space complexity, apart from logarithmic factors. A key insight is the connection between ranked enumeration for database queries and the fundamental task of computing the kth-shortest path in a graph. Although this connection is important for grounding the problem in the literature, it can obfuscate the simplicity of the algorithm. In this article, we adopt a pragmatic approach and present a slightly simplified version of the algorithm without the shortest-path interpretation. We believe that this will benefit practitioners looking to implement and optimize any-k approaches.
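
To make the idea of pushing ranking into the join concrete, the following is a minimal sketch for the simplest possible case: a two-relation join R(a, b) ⋈ S(b, c) ranked by the sum of tuple weights. It is not the paper's algorithm, only an illustration under stated assumptions. The function name `ranked_join` and the tuple layouts are hypothetical. The sketch builds a join index on S (groups sorted by weight), seeds a priority queue with the best partner of every R tuple, and produces each further answer by popping the queue and pushing one successor.

```python
import heapq
from collections import defaultdict
from itertools import islice


def ranked_join(R, S):
    """Lazily yield (total, r, s) for R(a, b) JOIN S(b, c) by ascending weight sum.

    Assumed (hypothetical) layout: R holds (a, b, weight), S holds (b, c, weight).
    """
    # Join index: group S by the join key and sort every group by weight.
    groups = defaultdict(list)
    for b, c, w in S:
        groups[b].append((w, c))
    for group in groups.values():
        group.sort()

    # Seed the queue with the best partner of each R tuple.
    # R tuples without any partner are skipped, as a semijoin would remove them.
    heap = []
    for i, (a, b, w) in enumerate(R):
        if b in groups:
            heap.append((w + groups[b][0][0], i, 0))
    heapq.heapify(heap)

    while heap:
        total, i, j = heapq.heappop(heap)
        a, b, w = R[i]
        ws, c = groups[b][j]
        yield total, (a, b), (b, c)
        # The only candidate this answer unlocks: same R tuple, next-best S tuple.
        if j + 1 < len(groups[b]):
            heapq.heappush(heap, (w + groups[b][j + 1][0], i, j + 1))


# Example data (made up for illustration).
R = [(1, "x", 5), (2, "x", 1), (3, "y", 2), (4, "z", 0)]
S = [("x", 10, 3), ("x", 11, 1), ("y", 12, 10)]

# Only the two best answers are computed; the rest of the join is never built.
top2 = list(islice(ranked_join(R, S), 2))
# top2 == [(2, (2, "x"), ("x", 11)), (4, (2, "x"), ("x", 10))]
```

In this sketch, preprocessing costs one sort of S plus one heap construction, and each answer costs one pop and at most one push, so the first $k$ answers arrive after $\mathcal{O}(n \log n + k \log n)$ work instead of after the full join has been materialized and sorted.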

Paper Structure (16 sections, 4 theorems, 2 equations, 8 figures, 4 algorithms)

Key Result

Theorem 1

Let $Q$ be an acyclic join query over database $D$ and $L$ a lexicographic order of the variables in $Q$. If $L$ does not contain a disruptive trio, then ranked enumeration of $Q(D)$ by $L$ can be achieved with $\mathrm{TT}(k) = \tilde{{\mathcal{O}}}(n + k)$.
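
The precondition of Theorem 1 can be checked mechanically. Definition 1 is only listed by name on this page, so the sketch below assumes the usual definition from the literature: variables $x_1, x_2, x_3$ form a disruptive trio if $x_1$ and $x_2$ never appear together in an atom, $x_3$ shares an atom with each of them, and $x_3$ comes after both in $L$. The function name `find_disruptive_trio` and the input encoding are hypothetical.

```python
from itertools import combinations


def find_disruptive_trio(atoms, order):
    """Return some disruptive trio (x1, x2, x3) of `order`, or None if there is none.

    atoms: iterable of variable tuples, one per relation in the query.
    order: list of variables giving the lexicographic order L.
    Assumes the definition stated in the lead-in above.
    """
    pos = {v: i for i, v in enumerate(order)}

    # Two variables are neighbors if they occur together in some atom.
    nbrs = {v: set() for v in order}
    for atom in atoms:
        in_order = [v for v in set(atom) if v in pos]
        for u, v in combinations(in_order, 2):
            nbrs[u].add(v)
            nbrs[v].add(u)

    for x3 in order:
        # Neighbors of x3 that precede it in L, in the order given by L.
        earlier = sorted((v for v in nbrs[x3] if pos[v] < pos[x3]), key=pos.get)
        for x1, x2 in combinations(earlier, 2):
            if x2 not in nbrs[x1]:
                return (x1, x2, x3)
    return None


# The join query from Figure 4: R(x1, x2), S(x1, x3), T(x2, x4), U(x4, x5).
atoms = [("x1", "x2"), ("x1", "x3"), ("x2", "x4"), ("x4", "x5")]

ok = find_disruptive_trio(atoms, ["x1", "x2", "x3", "x4", "x5"])
# ok is None: this order has no disruptive trio.

bad = find_disruptive_trio(atoms, ["x3", "x2", "x1", "x4", "x5"])
# bad == ("x3", "x2", "x1"): x3 and x2 share no atom, yet x1 follows both
# and is a neighbor of each.
```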

Figures (8)

  • Figure 1: Enumerating the query answers in ranked order without first materializing the unordered query result. Sorting is pushed into the join operation so that joining and ranking are interleaved.
  • Figure 2: SQL query for ranking chains of highly influential citations.
  • Figure 3: Ranked enumeration guarantees for the query of Figure 2: The first answer (TTF for Time-To-First) is returned in $\tilde{{\mathcal{O}}}(n)$ and the last answer (TTL for Time-To-Last) in $\tilde{{\mathcal{O}}}(n^2)$.
  • Figure 4: An example database for the join query $R(x_1, x_2), S(x_1, x_3), T(x_2, x_4), U(x_4, x_5)$. The relations are organized in a join tree. Red marks indicate tuples removed by the semijoin reduction. Also shown are shared variables between child-parent pairs and the relation ordering $\texttt{rel}$ used by the lexicographic enumeration algorithm.
  • Figure 5: Enumeration steps for the first 4 answers by the lexicographic enumeration algorithm. The stack, shown on top, pops a partial answer, which is extended with the first matching tuples (in orange color) and moved to the output in each iteration. Starting from the last relation for which a tuple is in the partial answer (in red color), we check if a "next" tuple in the same join group exists (in blue color) and push a new answer to the stack. Dashed arrows indicate that there is no next.
  • ...and 3 more figures
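
The semijoin reduction mentioned in the caption of Figure 4 can be sketched as two passes over the join tree, in the style of the classic Yannakakis algorithm. The sketch below uses the query of Figure 4, but the tree shape (R as root, S and T as its children, U as child of T) and all tuples are assumptions made for illustration, since the figure's data is not reproduced here. The names `semijoin` and `full_reduce` are hypothetical.

```python
def semijoin(left_rows, left_vars, right_rows, right_vars):
    """Keep the rows of `left` that agree with some row of `right` on the shared variables."""
    shared = [v for v in left_vars if v in right_vars]
    li = [left_vars.index(v) for v in shared]
    ri = [right_vars.index(v) for v in shared]
    keys = {tuple(row[i] for i in ri) for row in right_rows}
    return [row for row in left_rows if tuple(row[i] for i in li) in keys]


def full_reduce(schema, data, parent, bottom_up):
    """Remove every tuple that cannot take part in any answer of the join.

    schema: relation name -> tuple of variables.
    data: relation name -> list of tuples.
    parent: child name -> parent name in the join tree (the root has no entry).
    bottom_up: relation names listed so that each child precedes its parent.
    """
    data = {name: list(rows) for name, rows in data.items()}
    # Bottom-up pass: a parent keeps only tuples that have a match in each child.
    for child in bottom_up:
        p = parent.get(child)
        if p is not None:
            data[p] = semijoin(data[p], schema[p], data[child], schema[child])
    # Top-down pass: a child keeps only tuples that have a match in its parent.
    for child in reversed(bottom_up):
        p = parent.get(child)
        if p is not None:
            data[child] = semijoin(data[child], schema[child], data[p], schema[p])
    return data


# Query of Figure 4 with an assumed join tree and made-up tuples.
schema = {"R": ("x1", "x2"), "S": ("x1", "x3"), "T": ("x2", "x4"), "U": ("x4", "x5")}
parent = {"S": "R", "T": "R", "U": "T"}
bottom_up = ["U", "T", "S", "R"]
data = {
    "R": [(1, 10), (2, 20), (3, 30)],
    "S": [(1, 100), (2, 200)],
    "T": [(10, 7), (20, 8), (30, 9)],
    "U": [(7, 0), (9, 0)],
}

reduced = full_reduce(schema, data, parent, bottom_up)
# reduced == {"R": [(1, 10)], "S": [(1, 100)], "T": [(10, 7)], "U": [(7, 0)]}
```

After both passes every remaining tuple contributes to at least one answer, which is what lets an enumeration algorithm extend a partial answer without ever hitting a dead end.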

Theorems & Definitions (7)

  • Definition 1: Disruptive Trio
  • Theorem 1: LEX
  • Remark 1
  • Theorem 2: SUM
  • Definition 2: Subset-Monotonicity
  • Theorem 3: Dichotomy [tziavelis24thesis, tziavelis23ranked]
  • Theorem 4: Non-free-connex [bagan07constenum, deep22ranked, KimelfeldS2006]