Learning CNF formulas from uniform random solutions in the local lemma regime
Weiming Feng, Xiongxin Yang, Yixiao Yu, Yiyao Zhang
TL;DR
This work analyzes the problem of learning a $k$-CNF from i.i.d. uniform satisfying assignments, linking CNFs to hard-constraint MRFs. By revisiting Valiant's algorithm under Lovász local lemma-type conditions, the authors identify two regimes where the sample complexity drops dramatically: (i) CNFs with bounded intersection size can be learned exactly from $T=O(\log(n/\delta))$ samples, and (ii) random $k$-CNFs near the satisfiability threshold can be learned from $T=\widetilde{O}\big(n^{\exp(-\sqrt{k})}\big)$ samples. For CNFs without intersection bounds, the paper proves information-theoretic lower bounds showing exponential sample requirements for exact learning and polynomial lower bounds for approximate learning, in sharp contrast with the bounded-intersection results. The analysis introduces a resilience property derived from the local lemma and a specialized revealing process, enabling strong guarantees for well-behaved CNFs and, in the random-CNF setting, for typical instances near the threshold. Together, these lower bounds clarify the fundamental limits of learning CNFs from uniform solutions in both the exact and approximate settings. The results significantly improve prior PAC-learning bounds for CNFs in these regimes and illuminate the role of clause-intersection structure in sample complexity.
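The structural quantities behind regime (i) can be made concrete. The Python sketch below computes the largest number of variables shared by two distinct clauses and tests the classical symmetric local lemma condition $e \cdot p \cdot (D+1) \le 1$, where $p = 2^{-k}$ is the probability that a uniformly random assignment violates a clause with $k$ distinct variables and $D$ is the maximum number of other clauses sharing a variable with a given clause. This textbook condition is only an illustrative stand-in: the paper's LLL-type condition and its intersection bound may take a different form, and all function names here are hypothetical.

```python
import math
from itertools import combinations

# Clause = tuple of nonzero ints in DIMACS style: +i is x_i, -i is NOT x_i.
Clause = tuple[int, ...]


def variables(clause: Clause) -> frozenset[int]:
    """Set of variable indices appearing in a clause."""
    return frozenset(abs(lit) for lit in clause)


def max_intersection(cnf: list[Clause]) -> int:
    """Largest number of variables shared by two distinct clauses."""
    return max(
        (len(variables(c1) & variables(c2)) for c1, c2 in combinations(cnf, 2)),
        default=0,
    )


def max_dependency_degree(cnf: list[Clause]) -> int:
    """Max number of *other* clauses sharing at least one variable with a clause."""
    vs = [variables(c) for c in cnf]
    return max(
        (sum(1 for j, vj in enumerate(vs) if j != i and vi & vj)
         for i, vi in enumerate(vs)),
        default=0,
    )


def symmetric_lll_holds(cnf: list[Clause], k: int) -> bool:
    """Classical symmetric LLL test e * p * (D + 1) <= 1 with p = 2^-k.

    Assumption (hypothetical helper): every clause has exactly k distinct
    variables, so a uniform assignment violates each clause w.p. 2^-k.
    This is the textbook condition, not necessarily the paper's.
    """
    p = 2.0 ** (-k)
    d = max_dependency_degree(cnf)
    return math.e * p * (d + 1) <= 1
```

For example, in `[(1, 2, 3), (-3, 4, 5), (6, 7, 8)]` with $k = 3$, clauses share at most one variable and $D = 1$, so the test passes because $e \cdot 2/8 \approx 0.68 \le 1$.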
Abstract
We study the problem of learning an $n$-variable $k$-CNF formula $Φ$ from its i.i.d. uniform random solutions, which is equivalent to learning a Boolean Markov random field (MRF) with $k$-wise hard constraints. Revisiting Valiant's algorithm (Commun. ACM'84), we show that it can exactly learn (1) $k$-CNFs with bounded clause intersection size under Lovász local lemma type conditions, from $O(\log n)$ samples; and (2) random $k$-CNFs near the satisfiability threshold, from $\widetilde{O}(n^{\exp(-\sqrt{k})})$ samples. These results significantly improve the previous $O(n^k)$ sample complexity. We further establish new information-theoretic lower bounds on the sample complexity of both exact and approximate learning from i.i.d. uniform random solutions.
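As a rough illustration of the elimination idea behind Valiant's algorithm (our reading of the classical procedure, not necessarily the exact variant analyzed in the paper), the Python sketch below starts from all clauses over exactly $k$ distinct variables and discards every clause falsified by some sample. Uniform solutions, i.e. samples from the hard-constraint MRF, are drawn by brute-force enumeration, which is only feasible for tiny $n$. It assumes every clause of $Φ$ has exactly $k$ distinct variables; the helper names `uniform_solutions` and `valiant_elimination` are hypothetical.

```python
import random
from itertools import combinations, product

Clause = tuple[int, ...]       # DIMACS-style literals: +i is x_i, -i is NOT x_i
Assignment = tuple[bool, ...]  # sigma[i - 1] is the value of x_i


def satisfies(sigma: Assignment, clause: Clause) -> bool:
    """True if at least one literal of the clause is made true by sigma."""
    return any(sigma[abs(lit) - 1] == (lit > 0) for lit in clause)


def uniform_solutions(cnf: list[Clause], n: int, t: int,
                      rng: random.Random) -> list[Assignment]:
    """Hypothetical oracle: t i.i.d. uniform satisfying assignments.

    Enumerates all 2^n assignments, so it is only usable for tiny n.
    Assumes cnf is satisfiable (rng.choice fails on an empty list).
    """
    sols = [s for s in product((False, True), repeat=n)
            if all(satisfies(s, c) for c in cnf)]
    return [rng.choice(sols) for _ in range(t)]


def valiant_elimination(samples: list[Assignment], n: int, k: int) -> list[Clause]:
    """Keep every clause on k distinct variables that no sample falsifies."""
    hypothesis = []
    for vs in combinations(range(1, n + 1), k):
        for signs in product((1, -1), repeat=k):
            clause = tuple(s * v for s, v in zip(signs, vs))
            if all(satisfies(sigma, clause) for sigma in samples):
                hypothesis.append(clause)
    return hypothesis


if __name__ == "__main__":
    rng = random.Random(0)
    phi = [(1, -2, 3)]  # toy target formula with n = 3, k = 3
    samples = uniform_solutions(phi, n=3, t=500, rng=rng)
    print(valiant_elimination(samples, n=3, k=3))
```

Because every sample satisfies $Φ$, each clause of $Φ$ always survives, so the hypothesis never accepts an assignment that $Φ$ rejects; errors can only come from extra clauses that survive because no sample happened to falsify them. The loop examines all $\binom{n}{k}2^k$ candidate clauses.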
