#CFG and #DNNF admit FPRAS

Kuldeep S. Meel, Alexis de Colnet

TL;DR

This work resolves longstanding open questions by giving the first fully polynomial-time randomized approximation schemes (FPRAS) for counting words of length $n$ in a context-free language and for counting satisfying assignments of a DNNF circuit. The authors reduce both problems to multilinear, homogeneous $(+,\times)$ programs and design an estimator that tracks $p(q) \approx \frac{16n}{|\mathit{supp}(q)|}$ along with carefully controlled dependent samples, using a median-of-means framework to achieve high-probability accuracy. The key innovations include allowing dependence among samples, leveraging derivation-tree structure via $\mathit{lcsn}$ to bound joint probabilities, and applying depth-reduction to ensure polynomial-time performance. As a corollary, they obtain almost-uniform samplers for CFG and DNNF, enabling efficient random generation of objects in these classes. Overall, the results significantly advance approximate counting and sampling for fundamental models in formal languages and Boolean circuits, with broad implications for probabilistic reasoning and verification tasks.
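The median-of-means step mentioned above can be illustrated with a generic sketch. This is not the paper's estimator: it draws independent samples, whereas the paper allows dependence among samples, and the names `median_of_means` and `noisy_estimator` as well as the toy distribution are assumptions made for illustration only.

```python
import random
import statistics


def median_of_means(sample, num_groups, group_size):
    """Return the median of `num_groups` means, each over `group_size` draws of sample()."""
    means = []
    for _ in range(num_groups):
        draws = [sample() for _ in range(group_size)]
        means.append(sum(draws) / group_size)
    return statistics.median(means)


def noisy_estimator(rng, true_value=100.0):
    # Hypothetical unbiased but high-variance estimator:
    # returns 10 * true_value with probability 1/10, and 0 otherwise.
    return true_value * 10 if rng.random() < 0.1 else 0.0


rng = random.Random(0)
estimate = median_of_means(lambda: noisy_estimator(rng), num_groups=15, group_size=2000)
print(estimate)
```

Averaging within a group controls the variance, and taking the median across groups turns a constant success probability per group into a high-probability guarantee, which is the usual reason this framework appears in approximation schemes.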

Abstract

We provide the first fully polynomial-time randomized approximation scheme for the following two counting problems: 1. Given a Context Free Grammar $G$ over alphabet $Σ$, count the number of words of length exactly $n$ generated by $G$. 2. Given a circuit $\varphi$ in Decomposable Negation Normal Form (DNNF) over the set of Boolean variables $X$, compute the number of assignments to $X$ such that $\varphi$ evaluates to 1. Finding polynomial time algorithms for the aforementioned problems has been a longstanding open problem. Prior work could either only obtain a quasi-polynomial runtime (SODA 1995) or a polynomial-time randomized approximation scheme for restricted fragments, such as non-deterministic finite automata (JACM 2021) or non-deterministic tree automata (STOC 2021).
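To make the first counting problem concrete, here is a minimal brute-force sketch for a toy grammar. It is exponential and is not the paper's algorithm; the grammar ($S \to SS \mid a \mid b$, given in Chomsky normal form) and the names `BINARY`, `TERMINAL`, `words`, and `derivations` are assumptions chosen for illustration. It shows why ambiguity matters: counting derivation trees is easy by dynamic programming but overcounts the words.

```python
from functools import lru_cache

# Hypothetical toy grammar in Chomsky normal form: S -> S S | a | b
BINARY = {"S": [("S", "S")]}
TERMINAL = {"S": ["a", "b"]}


@lru_cache(maxsize=None)
def words(symbol, n):
    """Set of distinct length-n words derivable from `symbol` (tiny inputs only)."""
    if n == 1:
        return frozenset(TERMINAL.get(symbol, []))
    result = set()
    for left, right in BINARY.get(symbol, []):
        for k in range(1, n):
            for u in words(left, k):
                for v in words(right, n - k):
                    result.add(u + v)
    return frozenset(result)


@lru_cache(maxsize=None)
def derivations(symbol, n):
    """Number of derivation trees from `symbol` yielding words of length n."""
    if n == 1:
        return len(TERMINAL.get(symbol, []))
    return sum(
        derivations(left, k) * derivations(right, n - k)
        for left, right in BINARY.get(symbol, [])
        for k in range(1, n)
    )


print(len(words("S", 3)), derivations("S", 3))
```

For this grammar every string over $\{a, b\}$ of length 3 is generated, so there are 8 words, while the number of derivation trees is 16 because each word has two parse trees.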

Paper Structure

This paper contains 14 sections, 18 theorems, 41 equations, 2 figures, 6 algorithms.

Key Result

Theorem 1

There is an algorithm $\mathcal{A}$ that takes a CFG $G$ and $n$ (in unary) as input and returns an estimate $\mathsf{est}$ such that $\Pr\left[ \mathsf{est} \in (1 \pm \varepsilon)|L_n(G)|\right] \geq 1 - \delta.$ Furthermore, $\mathcal{A}$ runs in time $\mathrm{poly}(\varepsilon^{-1}, \log \delta^

Figures (2)

  • Figure 1: On the left: a homogeneous multilinear $(+,\times)$ program. On the right: the derivation tree for the monomial $x_2x_4x_7x_8$ at the root node when its children are ordered from left to right.
  • Figure 2: On the left: the derivation tree $T$ for $x_1x_4x_7x_8 \in \mathit{supp}(q)$, with $q$ the root node. On the right: the derivation tree $T'$ for $x_1x_3x_7x_8 \in \mathit{supp}(q)$. The circled nodes form $\mathit{lcsn}(T,T')$.

Theorems & Definitions (38)

  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Corollary 2
  • Lemma 3: GoreJKSM97
  • Proposition 4
  • Theorem 5
  • Remark 1
  • Lemma 6
  • proof
  • ...and 28 more