DNF Learning via Locally Mixing Random Walks

Josh Alman, Shivam Nadimpalli, Shyamal Patel, Rocco A. Servedio

TL;DR

This work advances distribution-free PAC learning of DNFs with membership queries through a new result on locally mixing random walks and two results that it enables. First, it gives a quasi-polynomial-time list-decoding algorithm for a single term of an unknown $s$-term DNF under an arbitrary distribution: the algorithm outputs a list of candidate terms that, with high probability, contains at least one term of the target DNF. Second, it yields a quasi-polynomial-time distribution-free PAC+MQ learner for size-$s$ DNFs in which all terms have the same size (exact-$k$ DNFs), using a pruning-and-expansion strategy that iteratively builds an accurate DNF hypothesis. The key technical engine is a local-mixing analysis of random walks on graphs covered by a small number of expanders; it enables near-uniform sampling from promising sets of terms and supports the far-point-based component of the learning algorithm. Together, these results make progress toward efficient learning of rich DNF classes in the distribution-free setting and illustrate the power of random-walk and expander techniques in algorithmic learning.

Abstract

We give two results on PAC learning DNF formulas using membership queries in the challenging "distribution-free" learning framework, where learning algorithms must succeed for an arbitrary and unknown distribution over $\{0,1\}^n$. (1) We first give a quasi-polynomial time "list-decoding" algorithm for learning a single term of an unknown DNF formula. More precisely, for any target $s$-term DNF formula $f = T_1 \vee \cdots \vee T_s$ over $\{0,1\}^n$ and any unknown distribution $D$ over $\{0,1\}^n$, our algorithm, which uses membership queries and random examples from $D$, runs in $\textsf{quasipoly}(n,s)$ time and outputs a list $L$ of candidate terms such that with high probability some term $T_i$ of $f$ belongs to $L$. (2) We then use result (1) to give a $\textsf{quasipoly}(n,s)$-time algorithm, in the distribution-free PAC learning model with membership queries, for learning the class of size-$s$ DNFs in which all terms have the same size. Our algorithm learns using a DNF hypothesis. The key tool used to establish result (1) is a new result on "locally mixing random walks," which, roughly speaking, shows that a random walk on a graph that is covered by a small number of expanders has a non-negligible probability of mixing quickly in a subset of these expanders.
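To make the access model concrete, here is a minimal Python sketch, assuming a term is stored as a dictionary mapping a variable index to its required bit and that the unknown distribution $D$ is available only as a zero-argument sampler. The membership-query oracle answers $f(x)$ at any point the learner chooses, while the example oracle only returns labelled examples $(x, f(x))$ with $x \sim D$; list-decoding in the sense of result (1) succeeds when the output list contains some true term $T_i$. All helper names (make_oracles, estimate_p, list_contains_term, and so on) are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a term is a conjunction {variable index: required bit}.
Term = Dict[int, int]
Point = List[int]


def eval_term(term: Term, x: Point) -> bool:
    """A term is satisfied iff every one of its literals agrees with x."""
    return all(x[i] == b for i, b in term.items())


def eval_dnf(terms: List[Term], x: Point) -> int:
    """f(x) = T_1(x) OR ... OR T_s(x), returned as 0/1."""
    return int(any(eval_term(t, x) for t in terms))


def is_exact_k(terms: List[Term]) -> bool:
    """True if all terms have the same number of literals (an exact-k DNF)."""
    return len({len(t) for t in terms}) <= 1


def make_oracles(
    terms: List[Term], sample_d: Callable[[], Point]
) -> Tuple[Callable[[Point], int], Callable[[], Tuple[Point, int]]]:
    """Return (MQ, EX): MQ(x) answers f(x) for any x the learner picks;
    EX() draws x ~ D via sample_d and returns the labelled example (x, f(x))."""
    def mq(x: Point) -> int:
        return eval_dnf(terms, x)

    def ex() -> Tuple[Point, int]:
        x = sample_d()
        return x, eval_dnf(terms, x)

    return mq, ex


def estimate_p(ex: Callable[[], Tuple[Point, int]], m: int) -> float:
    """Empirical estimate of p = Pr_{x ~ D}[f(x) = 1] from m random examples."""
    labels = [ex()[1] for _ in range(m)]
    return sum(labels) / m


def list_contains_term(candidates: List[Term], terms: List[Term]) -> bool:
    """List-decoding succeeds if some true term T_i appears in the candidate list."""
    return any(c == t for c in candidates for t in terms)
```

For instance, terms = [{0: 1, 1: 1}, {2: 0, 3: 1}] encodes the 4-variable exact-2 DNF $x_0 x_1 \vee \overline{x_2}\, x_3$, and MQ([0, 0, 0, 1]) returns 1 because the second term is satisfied. Note that the learner never sees sample_d directly; it only observes what EX returns.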

Paper Structure

This paper contains 38 sections, 53 theorems, 163 equations, 2 figures, and 9 algorithms.

Key Result

Theorem 1

Let $f=T_1 \vee \cdots \vee T_s$ be any unknown $s$-term DNF over $\{0,1\}^n$ and let $\mathcal{D}$ be any unknown distribution over $\{0,1\}^n$. Let $p := \Pr_{\boldsymbol{x} \sim \mathcal{D}}[f(\boldsymbol{x}) = 1]$. There is an algorithm List-Decode-DNF-Term in the PAC + MQ model […] and outputs a list $\mathcal{L}$ of at most $(ns)^{O(\log(ns))}$ terms, such that with probability […]
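The list-size bound in Theorem 1 is quasi-polynomial rather than polynomial. Writing it in base 2 (with $\log = \log_2$) makes the distinction explicit; this is a standard rewriting of the stated bound, not an additional claim from the paper:

$$(ns)^{O(\log(ns))} = 2^{O(\log(ns)) \cdot \log(ns)} = 2^{O(\log^2(ns))}.$$

For a concrete feel, take $ns = 2^{10}$ and set the hidden constant to 1: the bound is $(2^{10})^{10} = 2^{100}$, whereas a fixed polynomial such as $(ns)^3$ is only $2^{30}$. Because the exponent $\log(ns)$ grows without bound, a quantity of the form $(ns)^{c\log(ns)}$ with $c > 0$ eventually exceeds $(ns)^d$ for every fixed $d$, which is why such bounds are called quasi-polynomial.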

Figures (2)

  • Figure 1: An illustration of the recursion tree of Find-Far-Point. For simplicity, the node labels only indicate the satisfying assignment $y$ and omit the set $\mathcal{W}$.
  • Figure 2: The DNF-Learn algorithm for learning DNFs, given access to a weak term learner.

Theorems & Definitions (133)

  • Theorem 1: List-decoding a single term.
  • Remark 2: An easy decision tree analogue, and an easy uniform-distribution analogue
  • Theorem 3: Learning exact-$k$ DNF
  • Example 4
  • Theorem 5: Local Mixing, informal statement
  • Example 6
  • Remark 7
  • Definition 8: TV distance
  • Lemma 9
  • Proof
  • ...and 123 more
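Definition 8 (TV distance) and the local-mixing result (Theorem 5) concern how close a random walk's distribution gets to a target distribution. The Python sketch below is only a toy illustration of "mixing quickly in a subset" and is not the paper's construction or proof. It assumes the standard definition $d_{\mathrm{TV}}(P,Q) = \tfrac{1}{2}\sum_v |P(v)-Q(v)|$, uses two 10-vertex cliques joined by a single edge as stand-ins for expanders, and runs a lazy simple random walk. All function names are hypothetical.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]
Dist = Dict[int, float]


def tv_distance(p: Dist, q: Dist) -> float:
    """d_TV(P, Q) = 1/2 * sum_v |P(v) - Q(v)| over the union of supports."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def walk_distribution(graph: Graph, start: int, steps: int) -> Dist:
    """Exact law of a lazy simple random walk after `steps` steps:
    stay put w.p. 1/2, otherwise move to a uniformly random neighbour."""
    dist: Dist = {start: 1.0}
    for _ in range(steps):
        nxt: Dist = {}
        for v, mass in dist.items():
            nxt[v] = nxt.get(v, 0.0) + mass / 2
            share = mass / (2 * len(graph[v]))
            for u in graph[v]:
                nxt[u] = nxt.get(u, 0.0) + share
        dist = nxt
    return dist


def stationary(graph: Graph) -> Dist:
    """On a connected graph the lazy walk's stationary law is proportional to degree."""
    total = sum(len(nbrs) for nbrs in graph.values())
    return {v: len(nbrs) / total for v, nbrs in graph.items()}


def two_cliques_with_bridge(k: int) -> Graph:
    """Toy graph (hypothetical): cliques on {0..k-1} and {k..2k-1} plus the edge (k-1, k)."""
    graph: Graph = {}
    for block in (range(k), range(k, 2 * k)):
        for a in block:
            graph[a] = [b for b in block if b != a]
    graph[k - 1].append(k)
    graph[k].append(k - 1)
    return graph


def conditioned(dist: Dist, piece: List[int]) -> Dist:
    """Restrict a distribution to `piece` and renormalise."""
    mass = sum(dist.get(v, 0.0) for v in piece)
    return {v: dist.get(v, 0.0) / mass for v in piece}


def local_tv(graph: Graph, start: int, steps: int, piece: List[int]) -> float:
    """Distance from mixing *within* `piece`: compare the walk conditioned on
    `piece` with the stationary law conditioned on `piece`."""
    walk = conditioned(walk_distribution(graph, start, steps), piece)
    return tv_distance(walk, conditioned(stationary(graph), piece))


def global_tv(graph: Graph, start: int, steps: int) -> float:
    """Distance from mixing over the whole graph."""
    return tv_distance(walk_distribution(graph, start, steps), stationary(graph))


g = two_cliques_with_bridge(10)
first_clique = list(range(10))
print(round(local_tv(g, 0, 20, first_clique), 3), round(global_tv(g, 0, 20), 3))
```

Started at vertex 0, after 20 steps the walk conditioned on the first clique is already close to that clique's share of the stationary distribution, while the global distance stays large because little probability mass has crossed the single bridge edge; global_tv only becomes small after many more steps.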