Distribution Learning Meets Graph Structure Sampling

Arnab Bhattacharyya; Sutanu Gayen; Philips George John; Sayantan Sen; N. V. Vinodchandran

TL;DR

The paper presents a novel online-learning reduction that connects PAC-learning of high-dimensional Bayes nets to efficient counting and sampling of graph structures. Running the Exponentially Weighted Average (EWA) or Randomized Weighted Majority (RWM) forecaster on a sequence of samples under the log loss yields regret bounds, and these translate into KL-divergence guarantees for the learned distribution. The authors achieve near-optimal sample complexities for agnostic learning of indegree-bounded Bayes nets. They also give polynomial-time algorithms for learning Bayes nets over a known chordal skeleton and for learning tree-structured distributions, the latter through a new approach based on the matrix-tree theorem. They extend the results to polytree and bounded-vertex-cover settings and discuss maximum likelihood adaptations, showing that the framework applies broadly to structured distribution learning and to structure learning whenever the relevant graph structures can be sampled efficiently. The work provides principled guarantees and scalable algorithms, with potential impact on structure learning, causal inference, and high-dimensional graphical modeling.
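The tree-learning result relies on the fact that weighted spanning trees can be counted in polynomial time. Below is a minimal sketch of Kirchhoff's matrix-tree theorem, assuming symmetric nonnegative edge weights: the sum over all spanning trees of the product of their edge weights equals any principal minor (one row and the matching column deleted) of the weighted Laplacian. The function names are hypothetical. This is only the counting primitive the theorem provides, not the paper's learning or sampling algorithm.

```python
from fractions import Fraction
from itertools import combinations


def determinant(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as the pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def weighted_spanning_tree_sum(n, weights):
    """Sum over spanning trees of the product of edge weights (matrix-tree theorem).

    n: number of vertices, labelled 0..n-1.
    weights: dict mapping frozenset({u, v}) with u != v to a nonnegative weight.
    """
    lap = [[Fraction(0)] * n for _ in range(n)]
    for edge, w in weights.items():
        u, v = tuple(edge)
        lap[u][u] += w
        lap[v][v] += w
        lap[u][v] -= w
        lap[v][u] -= w
    # Any principal minor of the weighted Laplacian works; drop vertex 0.
    minor = [row[1:] for row in lap[1:]]
    return determinant(minor)


def brute_force_tree_sum(n, weights):
    """Reference check: enumerate all (n-1)-edge subsets that contain no cycle."""
    total = Fraction(0)
    for subset in combinations(weights.items(), n - 1):
        root = list(range(n))  # union-find parent pointers

        def find(x):
            while root[x] != x:
                x = root[x]
            return x

        product_w, is_tree = Fraction(1), True
        for edge, w in subset:
            u, v = tuple(edge)
            ru, rv = find(u), find(v)
            if ru == rv:  # this edge would close a cycle
                is_tree = False
                break
            root[ru] = rv
            product_w *= w
        if is_tree:  # n-1 acyclic edges on n vertices form a spanning tree
            total += product_w
    return total


# Usage: K4 has 4^(4-2) = 16 spanning trees by Cayley's formula.
k4 = {frozenset(e): 1 for e in combinations(range(4), 2)}
print(weighted_spanning_tree_sum(4, k4))  # 16
```

Exact rational arithmetic avoids floating-point error in the determinant; for large graphs a floating-point or log-determinant routine would be the practical choice.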

Abstract

This work establishes a novel link between the problem of PAC-learning high-dimensional graphical models and the task of (efficient) counting and sampling of graph structures, using an online learning framework. We observe that if we apply the exponentially weighted average (EWA) or randomized weighted majority (RWM) forecasters on a sequence of samples from a distribution P using the log loss function, the average regret incurred by the forecaster's predictions can be used to bound the expected KL divergence between P and the predictions. Known regret bounds for EWA and RWM then yield new sample complexity bounds for learning Bayes nets. Moreover, these algorithms can be made computationally efficient for several interesting classes of Bayes nets. Specifically, we give a new sample-optimal and polynomial time learning algorithm with respect to trees of unknown structure and the first polynomial sample and time algorithm for learning with respect to Bayes nets over a given chordal skeleton.
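The abstract's key observation can be illustrated on a toy finite class. Below is a minimal sketch, assuming a finite list of candidate distributions ("experts") over a small discrete domain, each giving positive probability to every outcome. With learning rate 1, EWA under the log loss predicts the posterior-weighted mixture of the experts, and on any sequence its cumulative log loss exceeds that of the best expert by at most ln N, where N is the number of experts. The function name and the toy experts are hypothetical. The paper's conversion of this regret bound into an expected KL bound, and its efficient implementations for Bayes nets, are not reproduced here.

```python
import math
import random


def ewa_log_loss(experts, sequence):
    """EWA forecaster with learning rate 1 under the log loss.

    experts: list of dicts mapping each outcome to a probability; every expert
             is assumed to give positive probability to every observed outcome.
    Returns (forecaster's cumulative log loss, list of each expert's cumulative log loss).
    """
    n = len(experts)
    log_w = [0.0] * n  # log-weights; uniform prior over experts
    forecaster_loss = 0.0
    expert_losses = [0.0] * n
    for x in sequence:
        # Normalize stably, then predict the mixture probability of outcome x.
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        q_x = sum(wi * e[x] for wi, e in zip(w, experts)) / sum(w)
        forecaster_loss -= math.log(q_x)
        # Multiplicative update: weight_i <- weight_i * exp(-loss_i) = weight_i * p_i(x).
        for i, e in enumerate(experts):
            loss_i = -math.log(e[x])
            expert_losses[i] += loss_i
            log_w[i] -= loss_i
    return forecaster_loss, expert_losses


# Usage: five Bernoulli experts, data drawn from Bernoulli(0.7).
random.seed(0)
experts = [{1: p, 0: 1 - p} for p in (0.1, 0.3, 0.5, 0.7, 0.9)]
data = [1 if random.random() < 0.7 else 0 for _ in range(1000)]
total, per_expert = ewa_log_loss(experts, data)
regret = total - min(per_expert)
print(regret <= math.log(len(experts)) + 1e-6)  # True: regret is at most ln N
```

Because the bound holds for every sequence, the check does not depend on the random seed. The small tolerance absorbs floating-point rounding, since the regret here sits extremely close to ln N once the best expert dominates the posterior.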

Paper Structure (30 sections, 33 theorems, 66 equations, 2 figures, 2 tables, 10 algorithms)

Introduction
Our Results
Learning with Known Chordal Skeleton
Learning Tree-structured Distributions
Why KL divergence?
Our Techniques
Online Learning Framework to Learning in reverse KL
Related Works
Online Learning of Structured Distributions
Robust Learning
Open Problems
Preliminaries
Probability Distributions
PAC Distribution Learning
Online Learning
...and 15 more sections

Key Result

Theorem 2.1

Let $G$ be an undirected chordal graph on $n$ nodes, and suppose $d$ is a fixed constant. Consider the problem of agnostically learning a distribution with respect to the class of Bayes nets that have skeleton $G$ and indegree at most $d$. There exist (i) an agnostic improper PAC-learner for this problem using …

Figures (2)

Figure 1: Given a rooted polytree skeleton, for each node $v$ and for each fixed orientation of the edges incident to $v$, we maintain the total weight of all consistent orientations of the subtree rooted at $v$. Above, the orientations of the edges incident to $B$ and $C$ are fixed. This is needed when computing the weight for the subtree rooted at $A$, since in the first two panels the in-degree of $C$ changes from 1 to 2, while in the last two panels $C$'s in-degree does not change.
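Figure 1 describes a dynamic program over a rooted skeleton. Below is a minimal sketch of that idea under assumptions not taken from the paper: the skeleton is a tree, so every orientation is automatically a polytree; each node has a hypothetical nonnegative weight `node_weight(v, parents)` that depends only on its parent set; an orientation's weight is the product of its node weights; and only orientations in which every in-degree is at most `d` are counted. For each node $v$ and each direction of the edge to its parent, the sketch stores the total weight of consistent orientations of the subtree rooted at $v$, mirroring the caption. The names `orientation_weight` and `brute_force_orientation_weight` are illustrative.

```python
from itertools import combinations, product


def orientation_weight(tree, root, d, node_weight):
    """Total weight of orientations of a tree skeleton with every in-degree <= d.

    tree: dict mapping each node to the list of its neighbours (an undirected tree).
    node_weight(v, parents): hypothetical nonnegative weight for v having parent set `parents`.
    An orientation's weight is the product of node_weight over all nodes.
    """
    def solve(v, parent):
        # Returns {direction of edge to parent: total weight of the subtree rooted at v}.
        # "in" means parent -> v, "out" means v -> parent, "none" is used for the root.
        children = [c for c in tree[v] if c != parent]
        child_tables = [solve(c, v) for c in children]
        table = {}
        for direction in (("in", "out") if parent is not None else ("none",)):
            base = {parent} if direction == "in" else set()
            total = 0
            # Choose which children point into v, keeping in-degree(v) <= d.
            for k in range(min(d - len(base), len(children)) + 1):
                for into_v in combinations(range(len(children)), k):
                    parents = frozenset(base | {children[i] for i in into_v})
                    term = node_weight(v, parents)
                    for i, sub in enumerate(child_tables):
                        # Child i -> v means child i's edge to its parent points "out".
                        term *= sub["out"] if i in into_v else sub["in"]
                    total += term
            table[direction] = total
        return table

    return solve(root, None)["none"]


def brute_force_orientation_weight(tree, d, node_weight):
    """Reference check: enumerate all 2^(n-1) orientations of the tree's edges."""
    edges = sorted({tuple(sorted((u, v))) for u in tree for v in tree[u]})
    total = 0
    for flips in product((False, True), repeat=len(edges)):
        parents = {v: set() for v in tree}
        for (u, v), flip in zip(edges, flips):
            if flip:
                parents[u].add(v)  # orient v -> u
            else:
                parents[v].add(u)  # orient u -> v
        if all(len(ps) <= d for ps in parents.values()):
            term = 1
            for v in tree:
                term *= node_weight(v, frozenset(parents[v]))
            total += term
    return total


# Usage: the path 0 - 1 - 2 has 4 orientations; only 0 -> 1 <- 2 violates d = 1.
path = {0: [1], 1: [0, 2], 2: [1]}
print(orientation_weight(path, 0, 1, lambda v, ps: 1))  # 3
```

For fixed d, each node enumerates at most the sum over k <= d of C(deg, k) parent choices, so the sketch runs in polynomial time. The paper's polytree algorithm may need to track more state than this simplified version does.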
Figure 2: A chordal graph and a clique tree decomposition with reference clique $C = \textrm{DEG}$. In the left panel, the edges of $\mathsf{Link}(C)$ are in red and the vertices $V[\mathcal{T}_C]$ are in bold; in the right panel, the nodes of $\mathcal{T}_C$ are in green, and the separator vertices in each node of the clique tree are colored in blue. A second version shows the same chordal graph with a different clique tree decomposition and reference clique $C = \textrm{ACDE}$; the colors have the same meaning as in the first.

Theorems & Definitions (92)

Theorem 2.1
Corollary 2.2
Remark 2.3
Theorem 2.4
Lemma 6.1: Pinsker's inequality
Definition 6.2: Efficiently samplable distribution
Definition 6.3: Bayesian networks
Definition 6.4: Tree-structured distribution
Definition 6.5: Polytree-structured distribution
Definition 6.6: Chordal-structured distribution
...and 82 more
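Lemma 6.1 appears above by name only. For reference, here is the standard textbook form of Pinsker's inequality (the paper's exact statement and notation may differ). It bounds total variation distance by KL divergence measured in nats, which is one reason a KL guarantee implies a total variation guarantee:

```latex
d_{\mathrm{TV}}(P, Q) \;=\; \sup_{A} \lvert P(A) - Q(A) \rvert \;\le\; \sqrt{\tfrac{1}{2}\, \mathrm{KL}(P \,\|\, Q)},
\qquad\text{so}\qquad
\mathrm{KL}(P \,\|\, Q) \le 2\varepsilon^{2} \;\Longrightarrow\; d_{\mathrm{TV}}(P, Q) \le \varepsilon .
```

Since total variation distance is symmetric, the same bound holds with the arguments of the KL divergence in either order.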
