Learning CNF formulas from uniform random solutions in the local lemma regime

Weiming Feng, Xiongxin Yang, Yixiao Yu, Yiyao Zhang

TL;DR

This work analyzes the problem of learning a $k$-CNF from i.i.d. uniform satisfying assignments, linking CNFs to hard-constraint MRFs. By revisiting Valiant's algorithm under Lovász local lemma-type conditions, the authors identify two regimes where sample complexity drops dramatically: (i) CNFs with bounded intersection size permit exact learning with $T=O(\log(n/\delta))$ samples, and (ii) random $k$-CNFs near the satisfiability threshold can be learned with $T=\widetilde{O}\big(n^{\exp(-\sqrt{k})}\big)$ samples. For CNFs without intersection bounds, the paper proves information-theoretic lower bounds showing exponential sample requirements for exact learning and polynomial lower bounds for approximate learning. These bounds contrast sharply with the bounded-intersection results and clarify the fundamental limits of learning CNFs from uniform solutions in both settings. The analysis introduces a resilience property derived from the local lemma and a specialized revealing process, enabling strong guarantees for learning well-behaved CNFs and, in the random-CNF setting, for typical instances near the threshold. The results significantly improve prior PAC-learning bounds for CNFs in these regimes and illuminate the role of clause-intersection structure in sample complexity.
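
The link between CNFs and hard-constraint MRFs can be written out as a short sketch. The symbols $\mu_\Phi$, $Z_\Phi$, and $\mathcal{C}$ are notation assumed here for illustration and may differ from the paper's. For a satisfiable formula $\Phi$ on $n$ variables with clause set $\mathcal{C}$, the uniform distribution over its solutions factorizes into one 0/1-valued factor per clause:

$$
\mu_\Phi(\sigma) \;=\; \frac{1}{Z_\Phi} \prod_{c \in \mathcal{C}} \mathbf{1}\left[\sigma \text{ satisfies } c\right],
\qquad \sigma \in \{\mathrm{True}, \mathrm{False}\}^n,
$$

where $Z_\Phi$ is the number of satisfying assignments of $\Phi$. Each factor depends on at most $k$ variables and takes the value $0$ on exactly one local configuration, which is what makes the constraints hard.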

Abstract

We study the problem of learning an $n$-variable $k$-CNF formula $\Phi$ from its i.i.d. uniform random solutions, which is equivalent to learning a Boolean Markov random field (MRF) with $k$-wise hard constraints. Revisiting Valiant's algorithm (Commun. ACM'84), we show that it can exactly learn (1) $k$-CNFs with bounded clause intersection size under Lovász local lemma type conditions, from $O(\log n)$ samples; and (2) random $k$-CNFs near the satisfiability threshold, from $\widetilde{O}(n^{\exp(-\sqrt{k})})$ samples. These results significantly improve the previous $O(n^k)$ sample complexity. We further establish new information-theoretic lower bounds on sample complexity for both exact and approximate learning from i.i.d. uniform random solutions.
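
The elimination procedure commonly attributed to Valiant can be sketched as follows: start from every candidate clause and discard any clause that some observed solution violates. This is a minimal illustration, not the paper's implementation. The names `all_k_clauses`, `violates`, and `valiant_learn` are hypothetical, and the candidate set is assumed to consist of clauses on exactly $k$ distinct variables.

```python
from itertools import combinations, product


def all_k_clauses(n, k):
    """Yield every clause on exactly k distinct variables (an assumption of this sketch).

    A clause is a tuple of (variable, sign) literals; the literal (v, True)
    stands for x_v and (v, False) for NOT x_v.
    """
    for variables in combinations(range(n), k):
        for signs in product([False, True], repeat=k):
            yield tuple(zip(variables, signs))


def violates(assignment, clause):
    """Return True when every literal of the clause is false under the assignment."""
    return all(assignment[v] != sign for v, sign in clause)


def valiant_learn(samples, n, k):
    """Keep exactly the candidate clauses that no sample violates."""
    candidates = set(all_k_clauses(n, k))
    for sigma in samples:
        candidates = {c for c in candidates if not violates(sigma, c)}
    return candidates


# Toy instance: (x0 OR x1 OR NOT x2) AND (NOT x1 OR x2 OR x3) on n = 4 variables.
n, k = 4, 3
phi = [
    ((0, True), (1, True), (2, False)),
    ((1, False), (2, True), (3, True)),
]

# Enumerate all solutions by brute force; a learner would instead receive
# i.i.d. uniform draws from this set.
solutions = [
    s for s in product([False, True], repeat=n)
    if not any(violates(s, c) for c in phi)
]

learned = valiant_learn(solutions, n, k)
```

Because samples are solutions, a true clause is never discarded, so the output always contains the clauses of the target formula. With too few samples it may also retain spurious clauses, and the sample complexity bounds control how many samples are needed to eliminate them.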

Paper Structure

This paper contains 47 sections, 47 theorems, 58 equations, 2 figures, 2 algorithms.

Key Result

Theorem 1.1

Let $k \geq 2$ be a constant integer. For any $\varepsilon > 0$ and $\delta > 0$, Valiant's algorithm approximately (within total variation distance error at most $\varepsilon$) learns any satisfiable $k$-CNF formula from i.i.d. uniform solutions with probability at least $1 - \delta$ in sample comp

Figures (2)

  • Figure 1: An illustration of the depth-$4$ gadgets for $k = 3$, where black-bordered shapes denote variables $v_{i,j}$ and colored shapes denote clauses. Clauses with solid borders forbid all-True assignments and clauses with dashed borders forbid all-False assignments. The leftmost clause with a purple boundary is the restricted clause $c$. For clarity, variables $v_{2,\cdot}$ and $v_{3,\cdot}$ in the second and third layers are intentionally widened to better display the hyperedges.
  • Figure 2: An illustration of hard CNF formulas $\Phi_i$ for $k = 3$, $\ell = 4$, $m=4$ and $i=(1010)_2$.

Theorems & Definitions (61)

  • Theorem 1.1: Valiant (Commun. ACM'84), Theorem A
  • Definition 1.2: $(k,d,s)$-CNF formula
  • Theorem 1.4
  • Corollary 1.5: Linear CNF formulas
  • Theorem 1.6
  • Theorem 1.7
  • Theorem 1.8
  • Theorem 1.9
  • Theorem 1.11
  • Definition 2.1: $\theta$-resilience
  • ...and 51 more