U-Nets as Belief Propagation: Efficient Classification, Denoising, and Diffusion in Generative Hierarchical Models

Song Mei

TL;DR

This work builds a theoretical bridge between U-Net architectures and belief propagation (BP) denoising in generative hierarchical models (GHMs). It shows that the encoder-decoder structure with long skip connections naturally implements the upward and downward message passing of BP on tree-structured graphs. It establishes polynomial sample complexity bounds for learning the denoising function with U-Nets in GHMs and shows that ConvNets are similarly well suited to classification in the same models, giving a unified view of ConvNets and U-Nets for modeling complex data in language and image domains. The results connect to diffusion modeling by treating denoising in GHMs as a core step of diffusion-like procedures, and the constructive approximation theorems show how BP and message passing (MP) updates can be emulated by neural networks. Overall, the paper helps explain why U-Nets perform well in denoising and diffusion tasks, and it suggests validating these insights on practical pretrained models and extending them to conditional denoising.

Abstract

U-Nets are among the most widely used architectures in computer vision, renowned for their exceptional performance in applications such as image segmentation, denoising, and diffusion modeling. However, a theoretical explanation of the U-Net architecture design has not yet been fully established. This paper introduces a novel interpretation of the U-Net architecture by studying certain generative hierarchical models, which are tree-structured graphical models extensively utilized in both language and image domains. With their encoder-decoder structure, long skip connections, and pooling and up-sampling layers, we demonstrate how U-Nets can naturally implement the belief propagation denoising algorithm in such generative hierarchical models, thereby efficiently approximating the denoising functions. This leads to an efficient sample complexity bound for learning the denoising function using U-Nets within these models. Additionally, we discuss the broader implications of these findings for diffusion models in generative hierarchical models. We also demonstrate that the conventional architecture of convolutional neural networks (ConvNets) is ideally suited for classification tasks within these models. This offers a unified view of the roles of ConvNets and U-Nets, highlighting the versatility of generative hierarchical models in modeling complex data distributions across language and image domains.
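To make the "belief propagation denoising" idea concrete, here is a minimal sketch on a toy tree. It is not the paper's construction: the state count `S`, the branching `BRANCHING`, the root prior `PRIOR`, the shared transition matrix `TRANS`, the noise level `SIGMA`, and the choice to use the state index as the leaf value are all illustrative assumptions. The upward pass plays the role of the encoder, the downward pass the role of the decoder, and the stored upward messages are what the skip connections would carry. The result is checked against brute-force enumeration.

```python
import itertools
import math

S = 2                              # hidden states per node (assumed)
BRANCHING = (2, 2)                 # children per node at each layer (assumed)
PRIOR = [0.6, 0.4]                 # root distribution (assumed)
TRANS = [[0.8, 0.2], [0.3, 0.7]]   # P(child=b | parent=a), shared by all edges (assumed)
SIGMA = 0.7                        # Gaussian noise std on the leaves (assumed)

def lik(z, s):
    """Unnormalized Gaussian likelihood of observing z when the leaf value is s."""
    return math.exp(-(z - s) ** 2 / (2 * SIGMA ** 2))

def upward(z, depth):
    """Upward pass: up[s] is proportional to P(observations in subtree | node = s)."""
    if depth == len(BRANCHING):                      # leaf, z holds one observation
        return {"up": [lik(z[0], s) for s in range(S)], "kids": []}
    m = BRANCHING[depth]
    size = len(z) // m
    kids = [upward(z[i * size:(i + 1) * size], depth + 1) for i in range(m)]
    for k in kids:                                   # message from child to parent
        k["to_parent"] = [sum(TRANS[a][b] * k["up"][b] for b in range(S))
                          for a in range(S)]
    up = [math.prod(k["to_parent"][a] for k in kids) for a in range(S)]
    return {"up": up, "kids": kids}

def downward(node, down, out):
    """Downward pass: down[s] is proportional to P(node = s, observations outside subtree)."""
    if not node["kids"]:
        post = [down[s] * node["up"][s] for s in range(S)]
        out.append(sum(s * p for s, p in enumerate(post)) / sum(post))
        return
    for i, k in enumerate(node["kids"]):
        # parent belief without the contribution of child i
        excl = [down[a] * math.prod(o["to_parent"][a]
                                    for j, o in enumerate(node["kids"]) if j != i)
                for a in range(S)]
        child = [sum(excl[a] * TRANS[a][b] for a in range(S)) for b in range(S)]
        norm = sum(child)
        downward(k, [c / norm for c in child], out)

def bp_denoise(z):
    """Posterior mean of every leaf given all noisy observations z."""
    out = []
    downward(upward(list(z), 0), PRIOR, out)
    return out

def brute_force(z):
    """Same posterior means by enumerating every joint configuration of the tree."""
    parents, level = [-1], [0]
    for m in BRANCHING:
        nxt = []
        for p in level:
            for _ in range(m):
                parents.append(p)
                nxt.append(len(parents) - 1)
        level = nxt
    num, den = [0.0] * len(level), 0.0
    for cfg in itertools.product(range(S), repeat=len(parents)):
        w = PRIOR[cfg[0]]
        for v in range(1, len(parents)):
            w *= TRANS[cfg[parents[v]]][cfg[v]]
        for i, v in enumerate(level):
            w *= lik(z[i], cfg[v])
        den += w
        for i, v in enumerate(level):
            num[i] += w * cfg[v]
    return [n / den for n in num]

z = [0.1, 0.9, 1.2, -0.3]
print(bp_denoise(z))
print(brute_force(z))
```

The two printed lists should agree up to floating-point error. The BP version does work linear in the number of nodes, while the enumeration grows exponentially with it.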

Paper Structure (43 sections, 33 theorems, 179 equations, 2 figures)

Introduction
The generative hierarchical model
The generative hierarchical model.
GHMs as natural models for languages and images.
The warm-up problem: Classification in GHMs
The ConvNet architecture.
The ERM estimator.
Sample complexity bound.
Proof strategy: ConvNets approximate the belief propagation algorithm
The belief propagation and message passing algorithm.
Approximating message passing with ConvNets.
Denoising and diffusion in GHMs
The U-Net architecture.
The ERM estimator.
Sample complexity bound.
...and 28 more sections

Key Result

Theorem 1

Let Assumptions ass:factorization_transition and ass:bounded_transition hold. Let $\mathcal{W}_{d, \underline{m}, L, S, D, B}$ be the set defined in Eq. (eqn:parameter_set_classification), where $D \ge S^2 K^2 d \cdot 3^L$ and $B = {\rm Poly}(d, S, K, 3^L, D)$. ...
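As a quick sense of scale for the width condition $D \ge S^2 K^2 d \cdot 3^L$, the arithmetic can be sketched as below. The helper name `min_width` is hypothetical, and the example values are assumptions chosen only for illustration: the excerpt does not define $S$, $K$, $d$, or $L$, and $d = 18$ is used on the assumption that $d$ counts the leaves of the tree in Figure 1 ($3 \cdot 3 \cdot 2$).

```python
import math

def min_width(S, K, d, L):
    """Smallest integer D satisfying D >= S^2 * K^2 * d * 3^L."""
    return math.ceil(S ** 2 * K ** 2 * d * 3 ** L)

# Illustrative values only (assumed, not from the paper): S=2, K=1, d=18, L=3.
print(min_width(S=2, K=1, d=18, L=3))  # 4 * 1 * 18 * 27 = 1944
```

The bound grows linearly in $d$ and quadratically in $S$ and $K$, but exponentially in the depth $L$ through the $3^L$ factor.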

Figures (2)

Figure 1: Left: The generative hierarchical model with $3$ layers and ${m}^{(1)} = 3$, ${m}^{(2)} = 3$, and ${m}^{(3)} = 2$ children in each layer. Right: A $3$-layer convolutional neural network.
Figure 2: A U-Net with $L=3$. "AP" stands for average-pooling. "US" stands for up-sampling. "SKIP" stands for long skip connections.
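The data flow that Figure 2 describes (average-pooling on the way down, up-sampling on the way up, long skip connections between matching resolutions) can be sketched structurally as follows. This is only a skeleton under stated assumptions: a real U-Net uses learned convolutions and nonlinearities, whereas here the fixed blend `mix` is a placeholder for the learned update. The names `avg_pool`, `up_sample`, `unet_skeleton`, and `resolutions` are hypothetical, and the pooling sizes `(2, 3, 3)` assume the bottom layer of the Figure 1 tree (2 children) is pooled first.

```python
def avg_pool(h, m):
    """Average-pooling ("AP"): replace each block of m entries by its mean."""
    return [sum(h[i:i + m]) / m for i in range(0, len(h), m)]

def up_sample(h, m):
    """Up-sampling ("US"): repeat each entry m times."""
    return [v for v in h for _ in range(m)]

def resolutions(d, pool_sizes=(2, 3, 3)):
    """Sequence lengths seen by the encoder, from the input down to the bottleneck."""
    res = [d]
    for m in pool_sizes:
        d //= m
        res.append(d)
    return res

def unet_skeleton(x, pool_sizes=(2, 3, 3), mix=0.5):
    """Encoder-decoder pass with long skip connections ("SKIP")."""
    skips, h = [], list(x)
    for m in pool_sizes:                       # encoder: store, then pool
        skips.append(h)
        h = avg_pool(h, m)
    for m, skip in zip(reversed(pool_sizes), reversed(skips)):
        up = up_sample(h, m)                   # decoder: up-sample
        # placeholder for the learned update that combines skip and coarse features
        h = [mix * s + (1 - mix) * u for s, u in zip(skip, up)]
    return h

x = [float(i % 3) for i in range(18)]          # toy input with 3 * 3 * 2 = 18 entries
print(resolutions(len(x)))                     # [18, 9, 3, 1]
print(len(unet_skeleton(x)))                   # 18
```

In the belief propagation reading, the stored `skips` correspond to upward messages that each downward step needs at the matching level of the tree.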

Theorems & Definitions (67)

Remark 1: An explanation of the "ConvNet" architecture
Theorem 1: Learning to classify using ConvNets
Remark 2
Lemma 1: BP calculates the Bayes classifier exactly (pearl2022reverend; wainwright2008graphical; mezard2009information)
Proposition 2: BP reduces to MP
Theorem 3: ConvNets approximation of Bayes classifier
Remark 3: An explanation of the "U-Net" architecture
Theorem 4: Learning to denoise using U-Nets
Remark 4
Lemma 2: BP calculates the Bayes denoiser exactly (pearl2022reverend; wainwright2008graphical; mezard2009information)
...and 57 more
