On Neural Networks as Infinite Tree-Structured Probabilistic Graphical Models

Boyao Li, Alexander J. Thomson, Houssam Nassif, Matthew M. Engelhard, David Page

TL;DR

The paper addresses the lack of precise probabilistic semantics for deep neural networks by constructing an infinite tree-structured probabilistic graphical model (PGM) that corresponds exactly to any given DNN architecture. It proves that, for sigmoid activations, forward propagation in a DNN matches exact inference in this PGM (Theorem 3.1), and that the corresponding gradients match those computed by backpropagation (Theorem 3.2). The authors extend the framework to nonnegative activations and outline a practical Hamiltonian Monte Carlo (HMC)-based fine-tuning algorithm (with CD-like updates) that uses the PGM perspective to improve calibration. Empirical results on synthetic data and the Covertype dataset show calibration gains from HMC-based fine-tuning, suggesting a viable path to integrating PGMs and DNNs for uncertainty quantification and interpretability in hybrid models.

Abstract

Deep neural networks (DNNs) lack the precise semantics and definitive probabilistic interpretation of probabilistic graphical models (PGMs). In this paper, we propose an innovative solution by constructing infinite tree-structured PGMs that correspond exactly to neural networks. Our research reveals that DNNs, during forward propagation, indeed perform approximations of PGM inference that are precise in this alternative PGM structure. Not only does our research complement existing studies that describe neural networks as kernel machines or infinite-sized Gaussian processes, it also elucidates a more direct approximation that DNNs make to exact inference in PGMs. Potential benefits include improved pedagogy and interpretation of DNNs, and algorithms that can merge the strengths of PGMs and DNNs.

Paper Structure (18 sections, 3 theorems, 40 equations, 1 figure, 3 tables, 3 algorithms)

Key Result

Theorem 3.1

In the PGM construction, as $L \to \infty$, $P(H=1 \mid \vec{x}) \to \sigma\left(\sum_{j=1}^M w_j g_j + \sum_{i=1}^N \theta_i \sigma(p_i)\right)$, for an arbitrary latent node $H$ in the DNN that has observed parents $g_1, \ldots, g_M$ and latent parents $h_1, \ldots, h_N$ that are true with probabilities $\sigma(p_1), ...
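
The limit in Theorem 3.1 can be illustrated numerically. The sketch below is not the authors' code. It assumes, as a simplification not spelled out in this summary, that each latent parent $h_i$ is replaced by $L$ independent binary copies, each true with probability $\sigma(p_i)$ and each connected to $H$ with weight $\theta_i / L$. The function names `pgm_marginal` and `dnn_forward` and all numeric values are hypothetical. Under that assumption, a Monte Carlo estimate of $P(H=1 \mid \vec{x})$ should approach the DNN forward-pass value as $L$ grows.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pgm_marginal(w, g, theta, p, L, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(H=1 | x) with L copies per latent parent.

    Assumption (illustrative): each copy is Bernoulli(sigmoid(p_i)) and
    carries weight theta_i / L into H.
    """
    rng = np.random.default_rng(seed)
    # counts[s, i] = how many of the L copies of parent i are true in sample s
    counts = rng.binomial(L, sigmoid(p), size=(n_samples, len(p)))
    logits = w @ g + (counts / L) @ theta
    return sigmoid(logits).mean()


def dnn_forward(w, g, theta, p):
    """Deterministic forward pass: the limit stated in Theorem 3.1."""
    return sigmoid(w @ g + theta @ sigmoid(p))


# Hypothetical weights and inputs, chosen only for illustration.
w = np.array([1.0, -0.5])      # weights on observed parents
g = np.array([1.0, 0.0])       # observed parent values
theta = np.array([3.0, -2.0])  # weights on latent parents
p = np.array([0.5, -1.0])      # pre-activations of latent parents

target = dnn_forward(w, g, theta, p)
for L in (1, 10, 100, 1000):
    est = pgm_marginal(w, g, theta, p, L)
    print(f"L={L:5d}  PGM estimate={est:.4f}  |gap to DNN|={abs(est - target):.4f}")
```

With $L = 1$ the estimate differs noticeably from the forward-pass value because the sigmoid is nonlinear; as $L$ increases, the fraction of true copies concentrates around $\sigma(p_i)$ and the gap shrinks.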

Figures (1)

  • Figure 1: The first step of the PGM construction where shared latent parents are separated into copies along with the subtree of their ancestors. Copies of nodes H1 and H2 are made in this example.

Theorems & Definitions (3)

  • Theorem 3.1: Matching Probabilities
  • Theorem 3.2: Matching Gradients
  • Theorem A.1