Ranked Enumeration for Database Queries
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
TL;DR
Ranked enumeration returns query answers in order of importance without materializing the entire output. The paper develops two main instantiations for acyclic join queries, lexicographic orders and SUM-based ranking, achieving a $\tilde{\mathcal{O}}(n + k)$ time-to-$k$ guarantee (the time until the first $k$ ranked answers have been returned) through semijoin preprocessing, join indexes, and careful top-down enumeration. It characterizes the feasible lexicographic orders via disruptive trios and reverse alpha elimination orders. For SUM, it uses a bottom-up dynamic program (DP) that computes optimal subtree weights, coupled with a priority-queue-based enumeration. Beyond these core cases, the work discusses general (subset-monotone) ranking functions, conjunctive queries (CQs) with projection, and non-acyclic queries via decompositions, supported by experiments showing practical speedups over the traditional join-then-sort approach.
Abstract
Ranked enumeration is a query-answering paradigm where the query answers are returned incrementally in order of importance (instead of returning all answers at once). Importance is defined by a ranking function that can be specific to the application, but typically involves either a lexicographic order (e.g., "ORDER BY R.A, S.B" in SQL) or a weighted sum of attributes (e.g., "ORDER BY 3*R.A + 2*S.B"). Recent work has introduced any-k algorithms for (multi-way) join queries, which push ranking into joins and avoid materializing intermediate results until necessary. The top-ranked answers are returned asymptotically faster than the common join-then-rank approach of database systems, resulting in orders-of-magnitude speedup in practice. In addition to their practical usefulness, these techniques complement a long line of theoretical research on unranked enumeration, where answers are also returned incrementally, but with no explicit ordering requirement. For a broad class of ranking functions with certain monotonicity properties, including lexicographic orders and sum-based rankings, the ordering requirement surprisingly does not increase the asymptotic time or space complexity, apart from logarithmic factors. A key insight is the connection between ranked enumeration for database queries and the fundamental task of computing the kth-shortest path in a graph. Although this connection is important for grounding the problem in the literature, it can obfuscate the simplicity of the algorithm. In this article, we adopt a pragmatic approach and present a slightly simplified version of the algorithm without the shortest-path interpretation. We believe that this will benefit practitioners looking to implement and optimize any-k approaches.
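To make the any-k idea concrete, here is a minimal Python sketch of ranked enumeration under SUM for the simplest acyclic case, a path-shaped join such as R(A,B) ⋈ S(B,C) ⋈ T(C,D). It is an illustration under stated assumptions and not necessarily the exact algorithm presented in the article: the function name `ranked_path_join`, the tuple layout `(left_key, right_key, weight)`, and the restriction to path joins are all hypothetical choices made for this example. A bottom-up pass computes, for every tuple, the weight of its best completion and drops tuples that cannot be completed; a priority queue then enumerates answers by repeatedly deviating to the next-best sibling (a Lawler-style partitioning of the remaining answers).

```python
import heapq
from collections import defaultdict
from itertools import islice


def ranked_path_join(relations):
    """Yield (total_weight, [tuples]) in nondecreasing total weight.

    Assumed (hypothetical) layout: relations[i] is a list of
    (left_key, right_key, weight); stage i joins stage i+1 on
    right_key == left_key. The left_key of stage 0 and the right_key
    of the last stage are ignored.
    """
    L = len(relations)
    # groups[i][key] = [(val, tuple), ...] sorted by val, where val is the
    # tuple's weight plus the weight of its best completion downstream.
    groups = [None] * L
    for i in range(L - 1, -1, -1):          # bottom-up DP
        g = defaultdict(list)
        for t in relations[i]:
            if i == L - 1:
                val = t[2]
            else:
                nxt = groups[i + 1].get(t[1])
                if not nxt:                 # no join partner: drop the tuple
                    continue
                val = t[2] + nxt[0][0]
            g[t[0] if i > 0 else None].append((val, t))
        for lst in g.values():
            lst.sort(key=lambda e: e[0])
        groups[i] = g

    first = groups[0].get(None)
    if not first:
        return                              # empty join result
    # Heap entry: (priority, tie_breaker, prefix_weight, prefix, group, rank).
    # It stands for: keep `prefix`, pick group[rank] next, complete optimally.
    heap = [(first[0][0], 0, 0, (), first, 0)]
    counter = 0
    while heap:
        prio, _, pw, prefix, group, rank = heapq.heappop(heap)
        if rank + 1 < len(group):           # next-best choice at this position
            counter += 1
            heapq.heappush(
                heap, (pw + group[rank + 1][0], counter, pw, prefix, group, rank + 1))
        t = group[rank][1]
        answer = list(prefix)
        while True:                         # top-down: follow best completions
            answer.append(t)
            pw += t[2]
            if len(answer) == L:
                break
            group = groups[len(answer)][t[1]]
            if len(group) > 1:              # second-best choice at a later position
                counter += 1
                heapq.heappush(
                    heap, (pw + group[1][0], counter, pw, tuple(answer), group, 1))
            t = group[0][1]
        yield prio, answer


if __name__ == "__main__":
    R = [(1, 10, 3), (2, 10, 1), (3, 20, 2)]
    S = [(10, 100, 4), (10, 200, 1), (20, 100, 5), (30, 100, 0)]
    T = [(100, 7, 2), (200, 8, 6), (100, 9, 1)]
    # Only the top 3 answers are ever assembled; the full join is not materialized.
    for weight, answer in islice(ranked_path_join([R, S, T]), 3):
        print(weight, answer)
```

Because the function is a generator, a caller can stop after any number of answers, which is the contrast with join-then-rank: the work done after the linear-time preprocessing grows with the number of answers requested rather than with the size of the full join output. This sketch pushes up to one heap entry per join stage for each answer, a simplification chosen for readability.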
