Learning bounded-degree polytrees with known skeleton

Davin Choo, Joy Qiping Yang, Arnab Bhattacharyya, Clément L. Canonne

TL;DR

This work proves finite-sample, computationally efficient learnability for bounded-degree polytrees with a known skeleton, extending tree-based results to $d$-polytrees. It introduces a three-phase orientation algorithm guided by mutual information and conditional mutual information testers, achieving KL-error $\le\varepsilon$ with sample complexity $m = \tilde{\Omega}\left( \frac{n \cdot |\Sigma|^{d+1}}{\varepsilon} \log \frac{1}{\delta} \right)$ and running in polynomial time in $m$, $|\Sigma|^d$, and $n^d$. A matching information-theoretic lower bound shows that dependencies on $n$ and $\varepsilon$ are near-optimal, even when the skeleton is known; a Chow-Liu-based skeleton-recovery condition further supports practical applicability. The results rely on an efficient MI/CMI tester with carefully calibrated thresholds, and Meek rules to propagate orientations while preserving ground-truth structure. This work advances PAC-learning of structured high-dimensional distributions, enabling scalable learning of bounded-degree polytrees under realistic assumptions.
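The MI/CMI testers mentioned above can be pictured with a plug-in estimate computed from counts. The following is a minimal sketch, not the paper's tester: the function names, the sample layout (a list of tuples indexed by column), and the default `threshold` value are all assumptions made for illustration, whereas the paper calibrates its thresholds in terms of $\varepsilon$.

```python
from collections import Counter
from math import log


def empirical_cmi(samples, x, y, z=()):
    """Plug-in estimate of I(X; Y | Z) in nats.

    samples: list of equal-length tuples (one tuple per sample).
    x, y: column indices of the two variables.
    z: tuple of column indices to condition on (empty for plain MI).
    """
    m = len(samples)
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for row in samples:
        zv = tuple(row[i] for i in z)
        n_xyz[(row[x], row[y], zv)] += 1
        n_xz[(row[x], zv)] += 1
        n_yz[(row[y], zv)] += 1
        n_z[zv] += 1
    total = 0.0
    for (xv, yv, zv), c in n_xyz.items():
        # p(x,y,z) * log( p(x,y,z) p(z) / (p(x,z) p(y,z)) ), written with counts
        total += (c / m) * log(c * n_z[zv] / (n_xz[(xv, zv)] * n_yz[(yv, zv)]))
    return total


def looks_dependent(samples, x, y, z=(), threshold=0.01):
    """Hypothetical threshold test: report dependence if the estimate is large."""
    return empirical_cmi(samples, x, y, z) > threshold
```

For instance, on the four equally weighted rows `(a, f, a XOR f)`, the estimate of $I(a; f)$ is zero while the estimate of $I(a; f \mid d)$ is $\ln 2$, which is the kind of gap a v-structure test relies on.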

Abstract

We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees, a rich class of high-dimensional probability distributions and a subclass of Bayesian networks, a widely studied type of graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees. We extend their results by providing an efficient algorithm which learns $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph (skeleton) is known. We complement our algorithm with an information-theoretic sample complexity lower bound, showing that the dependence on the dimension and target accuracy parameters is nearly tight.

Paper Structure (27 sections, 33 theorems, 104 equations, 4 figures, 1 table, 5 algorithms)

This paper contains 27 sections, 33 theorems, 104 equations, 4 figures, 1 table, 5 algorithms.

Key Result

Theorem 1

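Consider a discrete distribution $P$ on $n$ variables, each with alphabet $\Sigma$, defined on a polytree $G^*$ with an unknown maximum in-degree $d^*$. Given $m$ samples from $P$, accuracy parameter $\varepsilon>0$, failure probability $\delta$, the skeleton of $G^*$, and an in-degree upper bound $d \geq d^*$, there is an algorithm that outputs a $d$-polytree $\hat{P}$ with $\mathrm{KL}(P \,\|\, \hat{P}) \leq \varepsilon$ with probability at least $1-\delta$, provided $m = \tilde{\Omega}\left( \frac{n \cdot |\Sigma|^{d+1}}{\varepsilon} \log \frac{1}{\delta} \right)$. Moreover, the algorithm runs in time polynomial in $m$, $|\Sigma|^d$, and $n^d$.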
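To get a feel for how the sample requirement scales, the leading term $\frac{n \cdot |\Sigma|^{d+1}}{\varepsilon} \log \frac{1}{\delta}$ can be evaluated directly. The helper below is an illustrative sketch only: the function name is made up here, and it drops the constants and polylogarithmic factors hidden by the $\tilde{\Omega}$, so its output shows growth rates rather than an actual sufficient sample size.

```python
from math import ceil, log


def sample_size_scale(n, alphabet_size, d, eps, delta):
    """Leading term n * |Sigma|^(d+1) / eps * log(1/delta).

    Constants and polylog factors are omitted, so this is a scale, not a bound.
    """
    return ceil(n * alphabet_size ** (d + 1) / eps * log(1 / delta))


# Raising the in-degree bound d by one multiplies the scale by about |Sigma|.
scale_d2 = sample_size_scale(n=10, alphabet_size=2, d=2, eps=0.1, delta=0.05)
scale_d3 = sample_size_scale(n=10, alphabet_size=2, d=3, eps=0.1, delta=0.05)
```

The exponential dependence on $d$ is why the in-degree bound, rather than the number of variables, dominates the cost for larger alphabets.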

Figures (4)

  • Figure 1: 3-polytree example where $I(a; b,c) = I(b; a,c) = I(c; a,b) = 0$ due to the deg-3 v-structure centered at $d$. By the conditional MI tester (Corollary 3), $I(a;f \mid d) = 0$ implies $\hat{I}(a; f \mid d) \leq C \cdot \varepsilon$, and so we will not erroneously detect $a \to d \to f$ as a strong deg-2 v-structure $a \to d \gets f$.
  • Figure 2: An example run to illustrate notation. In $G^*$, vertex $d$ has parents $\pi(d) = \{a,b,c\}$. While the algorithm executes, we track a tentative parent set $N^{\mathrm{in}}(d)$ of $d$ and fix it to $\pi^{\mathrm{in}}(d)$ right before the final phase. Since $d = 3$, observe that $g \to i$ must have been oriented due to a local search step and not due to Meek $R1(3)$ in Phase 2. At the end, in $\hat{G}$, the proposed parent set of $d$ is $\hat{\pi}(d) = \{a,b,f\}$. Note that $\hat{G}$ only shows one possible orientation of the red unoriented subgraph $H$ before the final phase; see Figure 3 for others.
  • Figure 3: The five different possible orientations of $H$. Observe that the ground truth orientation of these edges is inconsistent with all five orientations shown here.
  • Figure 4: Illustration of notation used in the proof of the lemma labeled lem:final-output-is-good. Suppose (a) is the partial orientation of Figure 2 after Phase 2, with $H$ as the edge-induced subgraph on the unoriented edges in red. Before the final phase, we have $\pi^{\mathrm{in}}(d) = \{a,b\}$, $\pi^{\mathrm{in}}(g) = \{f,j\}$, $\pi^{\mathrm{in}}(i) = \{g\}$, $\pi^{\mathrm{un}}(c) = \{d\}$, $\pi^{\mathrm{un}}(d) = \{c,f\}$, $\pi^{\mathrm{un}}(f) = \{d,e\}$, $\pi^{\mathrm{un}}(e) = \{h,f\}$, and $\pi^{\mathrm{un}}(h) = \{e\}$. With respect to $H$'s orientation in (b), we have $A = \{c,d,f,e,h\}$, $a_c = d$, $a_d = f$, $a_f = e$, and $a_e = h$. Observe that the $\pi^{\mathrm{un}}$s and $a_{\square}$s are two different ways to refer to the red edges, and (b) only shows one possible orientation of $H$ (see Figure 3 for others).
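The distinction drawn in the Figure 1 caption, between a chain $a \to d \to f$ and a v-structure $a \to d \gets f$, can be checked exactly on small hand-built distributions. The sketch below is an assumption-laden toy, not taken from the paper: the binary variables, the noise level 0.1, and the helper names `cmi_from_joint` and `flip` are all chosen here for illustration.

```python
from itertools import product
from math import log


def cmi_from_joint(joint):
    """Exact I(A; F | D) in nats from a dict {(a, f, d): probability}."""
    p_d, p_ad, p_fd = {}, {}, {}
    for (a, f, d), p in joint.items():
        p_d[d] = p_d.get(d, 0.0) + p
        p_ad[(a, d)] = p_ad.get((a, d), 0.0) + p
        p_fd[(f, d)] = p_fd.get((f, d), 0.0) + p
    return sum(
        p * log(p * p_d[d] / (p_ad[(a, d)] * p_fd[(f, d)]))
        for (a, f, d), p in joint.items()
        if p > 0
    )


def flip(bit, out, noise):
    """P(out | bit) for a binary channel that flips its input with prob. noise."""
    return 1 - noise if out == bit else noise


# Chain a -> d -> f: d is a noisy copy of a, and f is a noisy copy of d.
chain = {
    (a, f, d): 0.5 * flip(a, d, 0.1) * flip(d, f, 0.1)
    for a, f, d in product((0, 1), repeat=3)
}

# V-structure a -> d <- f: a and f are independent fair bits, d is a noisy XOR.
collider = {
    (a, f, d): 0.25 * flip(a ^ f, d, 0.1)
    for a, f, d in product((0, 1), repeat=3)
}

chain_cmi = cmi_from_joint(chain)
collider_cmi = cmi_from_joint(collider)
```

In this toy setup `chain_cmi` vanishes up to floating-point error, while `collider_cmi` is bounded away from zero, so conditioning on $d$ separates the two orientations.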

Theorems & Definitions (38)

  • Theorem 1
  • Definition 2: KL divergence and squared Hellinger distance
  • Definition 3: (Conditional) Mutual Information
  • Corollary 3: Conditional MI tester
  • Definition 4: $\varepsilon$-strong deg-$\ell$ v-structure
  • Lemma 5
  • Lemma 5
  • Lemma 5
  • Lemma 5
  • Lemma 5
  • ...and 28 more