Circuits and Formulas for Datalog over Semirings
Austen Z. Fan, Paraschos Koutris, Sudeepa Roy
TL;DR
The paper studies how to efficiently represent and compute provenance polynomials of Datalog programs interpreted over absorptive semirings, using circuits and formulas. It develops a depth-versus-size framework and proves a dichotomy for several Datalog fragments: the optimal circuit depth is either $Θ(\log m)$ or $Θ(\log^2 m)$, where $m$ is the input size, and this depth regime determines whether the program admits a polynomial-size formula. It also characterizes when Datalog programs are bounded over more general semirings. Notably, over absorptive semirings whose ⊗ is idempotent, boundedness is linked to equivalence with a union of conjunctive queries (UCQ) and to finiteness of a context-free grammar (CFG) language. Finally, it extends the depth upper bounds to programs with the polynomial fringe property, gives tight bounds for regular path queries (RPQs) and transitive-closure-like problems, and places these results within semiring theory and query evaluation, with implications for parallel provenance computation.
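For transitive-closure-like programs, a classical route to depth $O(\log^2 n)$ on an $n$-node graph is repeated squaring of the matrix $I ⊕ A$: each squaring adds one ⊗ layer plus a balanced ⊕-tree of depth $\lceil \log_2 n \rceil$, and $\lceil \log_2 (n-1) \rceil$ squarings cover every path. The sketch below evaluates such a circuit over the (min, +) tropical semiring on non-negative weights, chosen only for concreteness, while recording the depth of every gate. The names `Gate`, `oplus`, `otimes`, `balanced_oplus`, and `closure_by_squaring` are hypothetical, and this is the textbook repeated-squaring construction, not necessarily the one used in the paper. If every node touches an edge, then $n \le 2m$, so the depth is also $O(\log^2 m)$.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Gate:
    value: float  # value of the gate under the (min, +) semiring
    depth: int    # length of the longest input-to-gate path

def oplus(a, b):
    # Semiring ⊕ (min) as a binary gate: one layer deeper than its inputs.
    return Gate(min(a.value, b.value), max(a.depth, b.depth) + 1)

def otimes(a, b):
    # Semiring ⊗ (+) as a binary gate: one layer deeper than its inputs.
    return Gate(a.value + b.value, max(a.depth, b.depth) + 1)

def balanced_oplus(gates):
    """Combine gates with a balanced binary ⊕-tree (adds ceil(log2 n) depth)."""
    while len(gates) > 1:
        paired = [oplus(gates[i], gates[i + 1]) for i in range(0, len(gates) - 1, 2)]
        if len(gates) % 2:
            paired.append(gates[-1])  # odd gate passes through to the next level
        gates = paired
    return gates[0]

def closure_by_squaring(n, weights):
    """Square (I ⊕ A) ceil(log2(n - 1)) times; `weights` maps (i, j) to a
    non-negative edge weight. By absorption, 1 ⊕ A[i][i] = 1, so the diagonal
    entries are constant gates and every entry starts at depth 0."""
    M = [[Gate(0.0 if i == j else weights.get((i, j), math.inf), 0)
          for j in range(n)] for i in range(n)]
    for _ in range(math.ceil(math.log2(max(n - 1, 1)))):
        M = [[balanced_oplus([otimes(M[i][k], M[k][j]) for k in range(n)])
              for j in range(n)] for i in range(n)]
    return M

M = closure_by_squaring(8, {(i, i + 1): 1.0 for i in range(7)})
print(M[0][7].value, M[0][7].depth)  # 7.0 12
```

On the 8-node path, each of the three squarings adds $1 + 3$ layers, so every entry ends at depth 12 while $M[0][7]$ holds the shortest distance 7.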
Abstract
In this paper, we study circuits and formulas for provenance polynomials of Datalog programs. We ask the following question: given an absorptive semiring and a fact of a Datalog program, what is the optimal depth and size of a circuit/formula that computes its provenance polynomial? We focus on absorptive semirings as these guarantee the existence of a polynomial-size circuit. Our main result is a dichotomy for several classes of Datalog programs on whether they admit a formula of polynomial size or not. We achieve this result by showing that for these Datalog programs the optimal circuit depth is either $Θ(\log m)$ or $Θ(\log^2 m)$, where $m$ is the input size. We also show that for Datalog programs with the polynomial fringe property, we can always construct low-depth circuits, namely of depth $O(\log^2 m)$. Finally, we give characterizations of when Datalog programs are bounded over more general semirings.
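To see why absorption helps, consider evaluating a hypothetical transitive-closure program, `T(x, y) :- E(x, y).` and `T(x, y) :- T(x, z), E(z, y).`, by naive fixpoint iteration. The sketch below assumes the (min, +) tropical semiring on non-negative weights, which is absorptive because $\min(0, a) = 0$ for every $a \ge 0$; it computes annotations in that one semiring rather than building the symbolic provenance polynomial, and the function names are illustrative. Since no shortest path or cycle needs more than $n$ edges, the iteration stabilizes after at most $n + 1$ rounds, and unrolling those rounds gives a circuit of $O(n)$ rounds with $O(n^3)$ gates each, i.e. of polynomial size.

```python
import math
from itertools import product

# Tropical (min, +) semiring on non-negative weights. It is absorptive:
# the multiplicative identity 0 satisfies min(0, a) == 0 for every a >= 0.
ZERO = math.inf  # additive identity (neutral element for min)

def plus(a, b):   # semiring ⊕
    return min(a, b)

def times(a, b):  # semiring ⊗
    return a + b

def transitive_closure(nodes, edges):
    """Naive fixpoint evaluation of the (hypothetical) program
         T(x, y) :- E(x, y).
         T(x, y) :- T(x, z), E(z, y).
    `edges` maps (x, y) to the annotation of fact E(x, y).
    Returns the annotations of all T facts and the number of rounds run."""
    T = {pair: ZERO for pair in product(nodes, repeat=2)}
    rounds = 0
    while True:
        rounds += 1
        new = {}
        for x, y in product(nodes, repeat=2):
            val = edges.get((x, y), ZERO)  # first rule
            for z in nodes:                 # second rule, summed over z
                val = plus(val, times(T[(x, z)], edges.get((z, y), ZERO)))
            new[(x, y)] = val
        if new == T:  # no annotation changed: fixpoint reached
            return T, rounds
        T = new

T, rounds = transitive_closure(["a", "b", "c"], {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5})
print(T[("a", "c")], rounds)  # 3 3
```

Here $T(a, c)$ receives $\min(5, 1 + 2) = 3$, and round 3 is the first round that changes nothing.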
