Evaluating Datalog over Semirings: A Grounding-based Approach
Hangdong Zhao, Shaleen Deep, Paraschos Koutris, Sudeepa Roy, Val Tannen
TL;DR
This work seeks tight runtime bounds, measured in data complexity, for evaluating Datalog programs over naturally-ordered semirings. It introduces a two-phase, grounding-based framework: (i) ground the program into a σ-equivalent system of polynomial equations, and (ii) compute the least fixpoint of this grounding over the semiring σ. It develops structure-aware grounding methods and semiring-aware fixpoint algorithms, obtaining state-of-the-art bounds for practical fragments (e.g., CFL reachability, APSP, linear Datalog) and proving matching lower bounds in certain regimes. The framework unifies and extends previous results, covers a broad class of semirings (finite-rank semirings and absorptive semirings with a total order), and provides a general grounding algorithm based on tree decompositions and the PANDA technique. These results bear on optimizing recursive, semiring-annotated queries in data analytics and program analysis, and they clarify the trade-off between grounding size and the number of semiring operations.
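To make phase (i) concrete, here is a minimal sketch on the simplest linear Datalog program, transitive closure. The helper `ground_tc` and the atom encoding are hypothetical illustrations, not code from the paper. It builds a naive grounding by instantiating both rules over the active domain and keeping only instantiations whose EDB atom is present, since a missing EDB fact contributes the semiring zero. It is deliberately not structure-aware (no tree decompositions, no PANDA). For n nodes and edge set E it produces n² equations holding |E| + n·|E| monomials, and it only shows the shape of a grounding: each IDB ground atom becomes a variable defined by a sum of products.

```python
# Hypothetical helper: naive grounding of the linear transitive-closure program
#   T(x, y) :- E(x, y).
#   T(x, y) :- T(x, z), E(z, y).
# Each ground IDB atom ("T", x, y) maps to a list of monomials (a sum of
# products); each monomial is a list of ground atoms, EDB atoms as ("E", u, v).
# An empty list denotes the empty sum, i.e. the semiring zero.

def ground_tc(edges):
    edge_set = set(edges)
    nodes = sorted({v for e in edge_set for v in e})
    # One equation per candidate IDB atom, so every variable is defined.
    system = {("T", x, y): [] for x in nodes for y in nodes}
    # Rule 1: T(x, y) receives the monomial E(x, y) for every edge.
    for (x, y) in sorted(edge_set):
        system[("T", x, y)].append([("E", x, y)])
    # Rule 2: T(x, y) receives T(x, z) * E(z, y) for every node x and edge (z, y).
    for x in nodes:
        for (z, y) in sorted(edge_set):
            system[("T", x, y)].append([("T", x, z), ("E", z, y)])
    return system


if __name__ == "__main__":
    system = ground_tc([(1, 2), (2, 3)])
    for head, monomials in system.items():
        print(head, "=", monomials)
```

On the path 1 → 2 → 3, for instance, the equation for `("T", 1, 3)` has the single monomial `T(1,2) ⊗ E(2,3)`. Computing the least fixpoint of such a system over σ is phase (ii).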
Abstract
Datalog is a powerful yet elegant language for expressing recursive computation. Although Datalog evaluation has been studied extensively, only loose upper bounds are known so far on how fast a Datalog program can be evaluated. In this work, we ask the following question: given a Datalog program over a naturally-ordered semiring $σ$, what is the tightest possible runtime? Our main contribution is a general two-phase framework for analyzing the data complexity of Datalog over $σ$: first ground the program into an equivalent system of polynomial equations (a grounding), and then find the least fixpoint of the grounding over $σ$. We present algorithms that use structure-aware query evaluation techniques to obtain the smallest possible groundings. Next, we introduce efficient fixpoint-evaluation algorithms for two classes of semirings: (1) finite-rank semirings and (2) absorptive semirings with a total order. Combining both phases, we obtain state-of-the-art and new algorithmic results. Finally, we complement our results with a matching fine-grained lower bound.
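Phase (ii) can be illustrated with the most basic fixpoint procedure, naive (Kleene) iteration: start every variable at the semiring zero and re-evaluate all equations until nothing changes. This is a sketch under stated assumptions, not the paper's fixpoint algorithms for finite-rank or absorptive semirings. The names `Semiring`, `BOOLEAN`, `TROPICAL`, `naive_least_fixpoint` and `SYSTEM` are hypothetical. `SYSTEM` is a hand-written grounding of transitive closure on the 2-cycle 1 → 2 → 1, and the same equations are solved over two semirings: the Boolean semiring (reachability) and the tropical semiring (min, +) with nonnegative weights, which is absorptive and totally ordered.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


# Boolean semiring: plain reachability.
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
# Tropical semiring (min, +) over nonnegative weights: absorptive, totally ordered.
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)


def naive_least_fixpoint(system, edb, sr, max_rounds=10_000):
    """Naive (Kleene) iteration: start every IDB variable at zero and
    re-evaluate all equations until no value changes."""
    val = {head: sr.zero for head in system}

    def lookup(atom):
        # EDB atoms take their annotation; IDB atoms take the current estimate.
        return edb[atom] if atom in edb else val.get(atom, sr.zero)

    for _ in range(max_rounds):
        new = {}
        for head, monomials in system.items():
            acc = sr.zero
            for mono in monomials:
                prod = sr.one
                for atom in mono:
                    prod = sr.times(prod, lookup(atom))
                acc = sr.plus(acc, prod)
            new[head] = acc
        if new == val:
            return val
        val = new
    raise RuntimeError("no fixpoint reached within max_rounds")


# Hand-written naive grounding of T(x,y) :- E(x,y).  T(x,y) :- T(x,z), E(z,y).
# on the 2-cycle 1 -> 2 -> 1 (monomials with absent EDB facts already dropped).
SYSTEM = {
    ("T", 1, 1): [[("T", 1, 2), ("E", 2, 1)]],
    ("T", 1, 2): [[("E", 1, 2)], [("T", 1, 1), ("E", 1, 2)]],
    ("T", 2, 1): [[("E", 2, 1)], [("T", 2, 2), ("E", 2, 1)]],
    ("T", 2, 2): [[("T", 2, 1), ("E", 1, 2)]],
}

weights = {("E", 1, 2): 3.0, ("E", 2, 1): 1.0}
dist = naive_least_fixpoint(SYSTEM, weights, TROPICAL)
reach = naive_least_fixpoint(SYSTEM, {k: True for k in weights}, BOOLEAN)
# dist[("T", 1, 1)] == 4.0: the cycle 1 -> 2 -> 1 costs 3 + 1.
print(dist, reach)
```

Only the semiring changes between the two calls; the grounding is reused as-is, which mirrors the separation between the two phases. Naive iteration terminates on this example, but the number of rounds and the cost per round are exactly what semiring-aware fixpoint algorithms aim to improve.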
