
Engineering an Efficient Approximate DNF-Counter

Mate Soos, Uddalok Sarkar, Divesh Aggarwal, Sourav Chakraborty, Kuldeep S. Meel, Maciej Obremski

TL;DR

This work addresses the problem of approximately counting the solutions of DNF formulas (#DNF), which is #P-complete to solve exactly. The authors introduce pepin, a practically efficient fully polynomial randomized approximation scheme (FPRAS) that replaces the Binomial-based sampling of prior streaming approaches with a lazy, Poisson-based sampling scheme, augmented by engineering optimizations. They prove correctness with Chernoff-type guarantees and derive a tight time bound. In extensive experiments, pepin runs up to 40x faster than previous state-of-the-art methods, with observed error substantially below the theoretical bound. The results have practical implications for probabilistic databases and network reliability analysis, where they enable scalable, reliable counting on large DNF formulas.
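To make the Poisson-based sampling idea concrete, here is a minimal Python sketch loosely modeled on the streaming union-of-sets estimator that the summary refers to. It is not pepin's actual implementation: the function names, the dictionary encoding of cubes, the multiset bucket, and the `thresh` parameter are all assumptions made for illustration, and none of pepin's lazy-sampling or engineering optimizations are included.

```python
import random

def poisson(lam, rng):
    """Draw from Poisson(lam) by counting unit-rate arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def cube_contains(cube, point):
    """cube maps variable index -> required value; point is a tuple of bools."""
    return all(point[v] == val for v, val in cube.items())

def sample_from_cube(cube, n, rng):
    """Uniform solution of a cube: fixed literals kept, free variables random."""
    return tuple(cube[v] if v in cube else rng.random() < 0.5 for v in range(n))

def estimate_dnf_count(cubes, n, thresh, rng):
    """Hypothetical sketch of a streaming estimator with Poisson sample counts."""
    p = 1.0
    bucket = []  # multiset of sampled assignments (an assumption of this sketch)
    for cube in cubes:
        # Discard earlier samples that the current cube also covers.
        bucket = [s for s in bucket if not cube_contains(cube, s)]
        # Number of fresh samples from this cube: Poisson(p * |cube|).
        k = poisson(p * 2 ** (n - len(cube)), rng)
        # If the bucket would overflow, halve the sampling rate and thin
        # both the stored samples and the pending sample count.
        while len(bucket) + k > thresh:
            bucket = [s for s in bucket if rng.random() < 0.5]
            k = sum(1 for _ in range(k) if rng.random() < 0.5)
            p /= 2
        bucket.extend(sample_from_cube(cube, n, rng) for _ in range(k))
    return len(bucket) / p
```

Calling `estimate_dnf_count(cubes, n, thresh, random.Random(0))` returns a randomized estimate of the number of satisfying assignments. In this sketch a larger `thresh` uses more memory and gives lower variance.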

Abstract

Model counting is a fundamental problem in many practical applications, including query evaluation in probabilistic databases and failure-probability estimation of networks. In this work, we focus on a variant of this problem where the underlying formula is expressed in the Disjunctive Normal Form (DNF), also known as #DNF. This problem has been shown to be #P-complete, making it often intractable to solve exactly. Much research has therefore focused on obtaining approximate solutions, particularly in the form of $(\varepsilon, \delta)$ approximations. The primary contribution of this paper is a new approach, called pepin, an approximate #DNF counter that significantly outperforms prior state-of-the-art approaches. Our work is based on the recent breakthrough in the context of the union of sets in the streaming model. We demonstrate the effectiveness of our approach through extensive experiments and show that it provides an affirmative answer to the challenge of efficiently computing #DNF.
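As a small illustration of what #DNF asks for, the following Python sketch counts the satisfying assignments of a DNF formula by brute force and checks whether an estimate falls inside a multiplicative error band. The cube encoding (a dictionary from variable index to required value), both function names, and the $(1 \pm \varepsilon)$ form of the tolerance check are assumptions made for this example; the exhaustive loop takes time exponential in the number of variables and is only usable on tiny formulas.

```python
from itertools import product

def count_dnf_exact(cubes, n):
    """Count assignments over n variables that satisfy at least one cube."""
    return sum(
        any(all(point[v] == val for v, val in cube.items()) for cube in cubes)
        for point in product((False, True), repeat=n)
    )

def within_tolerance(estimate, exact, eps):
    """Assumed check: (1 - eps) * exact <= estimate <= (1 + eps) * exact."""
    return (1 - eps) * exact <= estimate <= (1 + eps) * exact

# (x0 AND NOT x1) OR x2 over three variables.
formula = [{0: True, 1: False}, {2: True}]
exact = count_dnf_exact(formula, 3)
```

An $(\varepsilon, \delta)$ approximation would then require a check such as `within_tolerance(estimate, exact, eps)` to succeed with probability at least $1 - \delta$ over the counter's randomness.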

Paper Structure

This paper contains 17 sections, 5 theorems, 19 equations, 3 figures, 2 tables, 6 algorithms.

Key Result

Lemma 1

Let $X \gets \mathsf{Poisson}(\lambda)$ for some $\lambda > 0$. Then, for any $x > 0$, the following two inequalities hold.
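The two inequalities are not reproduced in this summary. As a rough numerical illustration of what a Chernoff-type Poisson tail bound looks like, the Python sketch below compares exact Poisson tail probabilities with the widely used bound $\exp(-x^2 / (2(\lambda + x)))$. That specific bound is an assumption of this example and may differ from the form stated in Lemma 1; the function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Pr[X = k] for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def upper_tail(lam, x):
    """Pr[X >= lam + x]."""
    k_min = math.ceil(lam + x)
    return max(0.0, 1.0 - sum(poisson_pmf(k, lam) for k in range(k_min)))

def lower_tail(lam, x):
    """Pr[X <= lam - x]; zero when lam - x is negative."""
    k_max = math.floor(lam - x)
    return sum(poisson_pmf(k, lam) for k in range(k_max + 1))

def assumed_chernoff_bound(lam, x):
    """Assumed bound exp(-x^2 / (2 (lam + x))), applied to both tails."""
    return math.exp(-x * x / (2 * (lam + x)))

for lam in (1.0, 10.0, 50.0):
    for x in (0.5, 2.0, 5.0, 20.0):
        bound = assumed_chernoff_bound(lam, x)
        print(lam, x, upper_tail(lam, x), lower_tail(lam, x), bound)
```

Each printed row lists $\lambda$, $x$, the two exact tail probabilities, and the assumed bound, so the slack in the bound can be read off directly.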

Figures (3)

  • Figure 1: Performance comparison of $\mathsf{pepin}$ against the other counters at different cube widths. As the plots show, the cube width matters greatly for most counters other than $\mathsf{pepin}$, whose insensitivity to width is due to its sparse sampling strategy.
  • Figure 2: The counts returned by $\mathsf{pepin}$ compared with those of the exact counter GANAK. All counts returned by $\mathsf{pepin}$ were well within the 80% error tolerance permitted by $\varepsilon=0.8$.
  • Figure 3: Performance comparison of $\mathsf{pepin}$ against its earlier version $\mathsf{pepinBinomial}$.

Theorems & Definitions (11)

  • Claim 1
  • Claim 2
  • proof
  • Lemma 1: Chernoff Bound
  • Theorem 1
  • proof
  • Lemma 2
  • Lemma 2: Chernoff Bound
  • proof
  • Lemma 3: canonne
  • ...and 1 more