Evaluating Datalog over Semirings: A Grounding-based Approach

Hangdong Zhao, Shaleen Deep, Paraschos Koutris, Sudeepa Roy, Val Tannen

TL;DR

This work tackles the problem of obtaining tight data complexity bounds for evaluating Datalog programs over naturally-ordered semirings by introducing a two-phase grounding-based framework: (i) grounding the program into a σ-equivalent polynomial system, and (ii) computing the least fixpoint of this grounding over the semiring. It develops structure-aware grounding methods and semiring-aware fixpoint algorithms, achieving state-of-the-art bounds for practical fragments (e.g., CFL reachability, APSP, linear Datalog) and proving matching lower bounds in certain regimes. The framework unifies and extends previous results, enabling efficient evaluation across a broad class of semirings, including finite-rank semirings and absorptive semirings of total order, and provides a general grounding algorithm based on tree decompositions and the PANDA technique. These results are relevant to optimizing recursive, semiring-annotated queries in data analytics and program analysis, and they clarify the trade-offs between grounding size and semiring operations.

Abstract

Datalog is a powerful yet elegant language that allows expressing recursive computation. Although Datalog evaluation has been extensively studied in the literature, so far, only loose upper bounds are known on how fast a Datalog program can be evaluated. In this work, we ask the following question: given a Datalog program over a naturally-ordered semiring $\sigma$, what is the tightest possible runtime? To this end, our main contribution is a general two-phase framework for analyzing the data complexity of Datalog over $\sigma$: first ground the program into an equivalent system of polynomial equations (i.e., a grounding) and then find the least fixpoint of the grounding over $\sigma$. We present algorithms that use structure-aware query evaluation techniques to obtain the smallest possible groundings. Next, efficient algorithms for fixpoint evaluation are introduced over two classes of semirings: (1) finite-rank semirings and (2) absorptive semirings of total order. Combining both phases, we obtain state-of-the-art and new algorithmic results. Finally, we complement our results with a matching fine-grained lower bound.
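
To make the two phases concrete, here is a minimal sketch rather than the paper's algorithms. It assumes the linear program T(x,y) :- E(x,y) and T(x,y) :- T(x,z), E(z,y), evaluated over the tropical (min, +) semiring with non-negative edge weights. The helper names `ground` and `least_fixpoint` are hypothetical, the grounding is built naively rather than with structure-aware techniques, and the fixpoint is found by plain iteration.

```python
import math
from itertools import product

# Tropical (min, +) semiring: "plus" is min, "times" is +,
# the semiring zero is infinity and the semiring one is 0.
ZERO = math.inf

def s_add(a, b):
    return min(a, b)

def s_mul(a, b):
    return a + b

def ground(edges, domain):
    """Phase 1 (naive): one polynomial per ground fact T(a, b).

    Each polynomial is a list of monomials (coefficient, idb_vars),
    where idb_vars is a tuple of ground T-facts multiplied together.
    """
    system = {(a, b): [] for a, b in product(domain, repeat=2)}
    for (a, b), w in edges.items():
        system[(a, b)].append((w, ()))             # T(a,b) :- E(a,b)
    for a in domain:
        for (z, b), w in edges.items():
            system[(a, b)].append((w, ((a, z),)))  # T(a,b) :- T(a,z), E(z,b)
    return system

def least_fixpoint(system):
    """Phase 2 (naive): iterate from the semiring zero until stable.

    Assumes non-negative weights, so the iteration terminates.
    """
    val = {v: ZERO for v in system}
    changed = True
    while changed:
        changed = False
        new = {}
        for v, monomials in system.items():
            total = ZERO
            for coeff, idb_vars in monomials:
                term = coeff
                for u in idb_vars:
                    term = s_mul(term, val[u])
                total = s_add(total, term)
            new[v] = total
            if total != val[v]:
                changed = True
        val = new
    return val

edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 5.0}
dist = least_fixpoint(ground(edges, ["a", "b", "c"]))
print(dist[("a", "c")])  # shortest a-to-c weight under (min, +)
```

In this sketch the grounding has one monomial per pair of a domain element and an edge, plus one per edge. The semiring only enters in the second phase, so replacing `s_add`, `s_mul`, and `ZERO` changes the semantics without touching the grounding step.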

Paper Structure

This paper contains 35 sections, 23 theorems, 37 equations, 2 figures, 2 tables, 4 algorithms.

Key Result

theorem 1

Let $P$ be a rulewise-acyclic Datalog program over some semiring $\sigma$, with input size $m$, active domain size $n$, and $\mathsf{arity}(P) \leq k$. Then a $\sigma$-equivalent grounding of size $O(n^{k-1} \cdot (m + n^{k}))$ can be constructed in time $O(n^{k-1} \cdot (m + n^{k}))$.
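
As a quick sanity check on the bound, the following worked instance substitutes $k = 2$. The simplification in the last step relies on an extra assumption that is not part of the theorem statement, namely that the input itself has size $m = O(n^{2})$, as happens when all input relations are at most binary.

```latex
\begin{align*}
O\bigl(n^{k-1} \cdot (m + n^{k})\bigr)\Big|_{k=2}
  &= O\bigl(n \cdot (m + n^{2})\bigr) \\
  &= O\bigl(n m + n^{3}\bigr) \\
  &= O\bigl(n^{3}\bigr) \qquad \text{assuming } m = O(n^{2}).
\end{align*}
```

Under that assumption, the grounding of a rulewise-acyclic program of arity at most 2 is cubic in the active domain size.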

Figures (2)

  • Figure 1: A join tree with its corresponding rewriting.
  • Figure 2: A join tree with its corresponding rewriting for Example \ref{ex:nonbinary} when using Algorithm \ref{grounded-acyclicprogram}.

Theorems & Definitions (24)

  • theorem 1
  • theorem 2
  • theorem 3
  • theorem 4
  • theorem 5
  • corollary 1
  • lemma 1
  • proposition 1
  • proposition 2
  • proposition 3
  • ...and 14 more