Approximating Queries on Probabilistic Graphs
Antoine Amarilli, Timothy van Bremen, Octave Gaspard, Kuldeep S. Meel
TL;DR
The paper investigates when probabilistic query evaluation (PQE) on probabilistic graphs, i.e., tuple-independent probabilistic databases over binary signatures, can be efficiently approximated in combined complexity. It studies the probabilistic graph homomorphism problem (PHom) in both the labelled and unlabelled settings, classifying tractability according to the graph classes of the query and the instance: one-way paths (1WP), two-way paths (2WP), downward trees (DWT), polytrees (PT), directed acyclic graphs (DAG), and arbitrary graphs (All). A key technique is to represent query provenance as nOBDDs and DNNFs and to obtain FPRAS results via weighted model counting; this yields a tractable case for 1WP queries on DAG instances, while many other settings are shown to be conditionally hard. The work also derives unconditional DNNF size lower bounds, connects approximate PQE to network reliability and to regular path queries (RPQs), and discusses implications for prior results (e.g., [vBM23]) and for the data complexity of RPQs. It closes with open questions and future directions. Overall, it clarifies the boundary between tractable and intractable approximate PQE and highlights the role of provenance representations in designing efficient algorithms.
Abstract
Query evaluation over probabilistic databases is notoriously intractable, not only in combined complexity but often in data complexity as well. This motivates the study of approximation algorithms, and particularly of combined FPRASes, with runtime polynomial in both the query and instance size. In this paper, we focus on tuple-independent probabilistic databases over binary signatures, i.e., probabilistic graphs, and study when we can devise combined FPRASes for probabilistic query evaluation. We settle the complexity of this problem for a variety of query and instance classes, by proving both approximability results and (conditional) inapproximability results together with (unconditional) DNNF provenance circuit size lower bounds. This allows us to deduce many corollaries of possible independent interest. For example, we show how the results of Arenas et al. [ACJR21a] on counting fixed-length strings accepted by an NFA imply the existence of an FPRAS for the two-terminal network reliability problem on directed acyclic graphs, a question asked by Zenklusen and Laumanns [ZL11]. We also show that one cannot extend a recent result of van Bremen and Meel [vBM23] giving a combined FPRAS for self-join-free conjunctive queries of bounded hypertree width on probabilistic databases: neither the bounded-hypertree-width condition nor the self-join-freeness hypothesis can be relaxed. Finally, we show how our methods can give insights into the evaluation and approximability of regular path queries (RPQs) on probabilistic graphs from the data complexity perspective, showing in particular that some of them are (conditionally) inapproximable.
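To make the two-terminal network reliability problem mentioned above concrete, the following minimal sketch computes the exact reliability of a small probabilistic graph by enumerating all possible worlds. It is only an illustration of the problem definition: it runs in time exponential in the number of edges and is not the FPRAS discussed in the paper. The function names and the example graph are hypothetical and chosen for illustration.

```python
from itertools import product


def reaches(edges, source, target):
    """Return True if target is reachable from source via the directed edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def two_terminal_reliability(prob_edges, source, target):
    """Exact reliability: total probability of the possible worlds in which
    target is reachable from source. Each edge is kept independently with
    its own probability (tuple-independent semantics). Brute force over
    all 2^m worlds, so only usable on tiny graphs."""
    edges = list(prob_edges)
    total = 0.0
    for keep in product([False, True], repeat=len(edges)):
        weight = 1.0
        world = []
        for edge, kept in zip(edges, keep):
            p = prob_edges[edge]
            weight *= p if kept else 1.0 - p
            if kept:
                world.append(edge)
        if reaches(world, source, target):
            total += weight
    return total


# Hypothetical DAG with two edge-disjoint paths from "s" to "t".
example = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "b"): 0.5, ("b", "t"): 0.5}
reliability = two_terminal_reliability(example, "s", "t")
print(reliability)  # 0.4375, i.e. 1 - (1 - 0.25)^2
```

In this example each path survives with probability 0.25, so the reliability is 1 - 0.75^2 = 0.4375. The enumeration visits 2^m worlds for m edges, which is why approximation schemes with polynomial runtime are of interest.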
