On Neural Networks as Infinite Tree-Structured Probabilistic Graphical Models

Boyao Li; Alexander J. Thomson; Houssam Nassif; Matthew M. Engelhard; David Page

TL;DR

The paper addresses the lack of precise probabilistic semantics for deep neural networks (DNNs) by constructing an infinite tree-structured probabilistic graphical model (PGM) that corresponds exactly to a given DNN architecture. It proves that, for sigmoid activations, forward propagation in the DNN matches exact inference in this PGM (Theorem 3.1) and that the corresponding gradients match those computed by backpropagation (Theorem 3.2). The authors extend the framework to nonnegative activations and outline a fine-tuning algorithm based on Hamiltonian Monte Carlo (HMC) sampling, with contrastive-divergence-like updates, that uses the PGM view to improve calibration. Experiments on synthetic data and the Covertype dataset show calibration gains from this HMC-based fine-tuning, suggesting a path toward combining PGMs and DNNs for uncertainty quantification and interpretability.

Abstract

Deep neural networks (DNNs) lack the precise semantics and definitive probabilistic interpretation of probabilistic graphical models (PGMs). In this paper, we propose an innovative solution by constructing infinite tree-structured PGMs that correspond exactly to neural networks. Our research reveals that DNNs, during forward propagation, indeed perform approximations of PGM inference that are precise in this alternative PGM structure. Not only does our research complement existing studies that describe neural networks as kernel machines or infinite-sized Gaussian processes, it also elucidates a more direct approximation that DNNs make to exact inference in PGMs. Potential benefits include improved pedagogy and interpretation of DNNs, and algorithms that can merge the strengths of PGMs and DNNs.

Paper Structure (18 sections, 3 theorems, 40 equations, 1 figure, 3 tables, 3 algorithms)

Introduction
Background: Comparison to Bayesian Networks and Markov Networks
The Construction of Tree-structured PGMs
Implications and Extensions
Application of the Theory: A New Hamiltonian Monte Carlo Algorithm
Learning via Contrastive Divergence with Hamiltonian Monte Carlo Sampling
Experimental Results
Synthetic experiments
Covertype Experiments
Conclusion, Limitations, and Future Work
Bayesian Belief Net and Markov Net Equivalence
Step 1 and Step 2 Construction Algorithms
A Proof Using Variable Elimination
A Proof of the Gradient
HMC Sampling Trajectories
...and 3 more sections

Key Result

Theorem 3.1

In the PGM construction, as $L \to \infty$, $P(H=1 \mid \vec{x}) \to \sigma\left(\sum_{j=1}^{M} w_j g_j + \sum_{i=1}^{N} \theta_i \sigma(p_i)\right)$, for an arbitrary latent node $H$ in the DNN that has observed parents $g_1, ..., g_M$ and latent parents $h_1, ..., h_N$ that are true with probabilities $\sigma(p_1), ...
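The limit in Theorem 3.1 can be illustrated numerically. The sketch below is not the authors' code: it assumes that each latent parent $h_i$ is replaced by $L$ independent copies, each true with probability $\sigma(p_i)$ and each carrying weight $\theta_i / L$, and it estimates $P(H=1 \mid \vec{x})$ by Monte Carlo sampling over those copies. The function names, the weights, and the sample size are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dnn_activation(w, g, theta, p):
    """Limit value stated in the theorem:
    sigma(sum_j w_j g_j + sum_i theta_i sigma(p_i))."""
    return sigmoid(w @ g + theta @ sigmoid(p))


def pgm_estimate(w, g, theta, p, L, n_samples=2000, seed=0):
    """Monte Carlo estimate of P(H=1 | x).

    Assumption (not taken from the source text): latent parent i is
    replaced by L independent Bernoulli(sigma(p_i)) copies, each with
    weight theta_i / L.
    """
    rng = np.random.default_rng(seed)
    # counts[s, i] = number of copies of parent i that are true in sample s
    counts = rng.binomial(L, sigmoid(p), size=(n_samples, len(p)))
    logits = w @ g + (counts / L) @ theta
    # Average the sigmoid conditional probability over sampled parent states
    return sigmoid(logits).mean()


# Hypothetical weights and inputs, chosen only for illustration
w = np.array([0.5, -1.0])
g = np.array([1.0, 0.0])
theta = np.array([2.0, -1.5, 1.0])
p = np.array([0.3, -0.7, 1.2])

target = dnn_activation(w, g, theta, p)
for L in (1, 10, 100, 10_000):
    print(L, abs(pgm_estimate(w, g, theta, p, L) - target))
```

Under these assumptions, the fraction of true copies of each parent concentrates around $\sigma(p_i)$ as $L$ grows, so the printed gap between the sampled estimate and the DNN activation should shrink toward zero.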

Figures (1)

Figure 1: The first step of the PGM construction where shared latent parents are separated into copies along with the subtree of their ancestors. Copies of nodes H1 and H2 are made in this example.

Theorems & Definitions (4)

Theorem 3.1: Matching Probabilities
Theorem 3.2: Matching Gradients
Theorem A.1
proof
