Learning in Deep Factor Graphs with Gaussian Belief Propagation

Seth Nabarro; Mark van der Wilk; Andrew J Davison

TL;DR

This work addresses learning in deep neural-network-like models under continual and distributed settings by modelling all relevant quantities as random variables within Gaussian factor graphs and performing both training and inference with Gaussian Belief Propagation (GBP). By linearising non-linear factors and relying on local, parallel message updates, GBP enables scalable, asynchronous training, and continual learning via Bayesian filtering of parameters. The authors demonstrate learning of parameters in CNN-like architectures, achieving competitive results on video denoising and strong performance on MNIST and CIFAR10 with single-pass training or limited replay data, while outperforming prior GBP-based methods. The approach integrates energy-based modelling with probabilistic inference, offering a flexible, hardware-friendly alternative to backpropagation for distributed, continual learning in deep architectures.
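
The claim that GBP updates are local and can run in parallel can be seen on a toy model. Below is a minimal pure-Python sketch, assuming scalar variables on a chain with one unary observation factor per variable and pairwise smoothness factors between neighbours; the model, the names gbp_chain, exact_chain and solve, and all numbers are illustrative assumptions rather than the paper's code. Each message update reads only the messages around one factor, and all messages are recomputed together from the previous round. Because a chain is a tree, the converged GBP beliefs should match the exact marginals obtained by solving the joint linear system, which the last lines compute for comparison.

```python
# Minimal sketch (illustrative toy model, not the paper's code): Gaussian belief
# propagation on a chain of scalar variables x_0..x_{n-1}, in information form
# (eta = precision * mean, lam = precision).


def gbp_chain(y, p_obs, p_smooth, iters=50):
    """Run synchronous GBP; return per-variable marginal means and variances."""
    n = len(y)
    # msgs[k][v]: message from pairwise factor k (linking x_k and x_{k+1}) to x_v.
    msgs = [{k: (0.0, 0.0), k + 1: (0.0, 0.0)} for k in range(n - 1)]
    for _ in range(iters):
        new_msgs = []  # every message is recomputed from the previous round (parallel schedule)
        for k in range(n - 1):
            new = {}
            for src, dst in ((k, k + 1), (k + 1, k)):
                # Variable-to-factor message: unary observation factor of x_src
                # plus messages from x_src's other pairwise factors (local data only).
                eta_in, lam_in = p_obs * y[src], p_obs
                for j in (src - 1, src):
                    if 0 <= j < n - 1 and j != k:
                        eta_in += msgs[j][src][0]
                        lam_in += msgs[j][src][1]
                # Factor energy p_smooth/2 * (x_k - x_{k+1})^2: precision
                # p_smooth * [[1, -1], [-1, 1]], zero information vector.
                # Absorb the incoming message, then marginalise out x_src.
                lam_ss = p_smooth + lam_in
                new[dst] = (p_smooth * eta_in / lam_ss, p_smooth - p_smooth**2 / lam_ss)
            new_msgs.append(new)
        msgs = new_msgs
    means, variances = [], []
    for i in range(n):
        # Belief = unary factor times all incoming factor-to-variable messages.
        eta, lam = p_obs * y[i], p_obs
        for j in (i - 1, i):
            if 0 <= j < n - 1:
                eta += msgs[j][i][0]
                lam += msgs[j][i][1]
        means.append(eta / lam)
        variances.append(1.0 / lam)
    return means, variances


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def exact_chain(y, p_obs, p_smooth):
    """Exact marginals of the same model from its joint precision matrix."""
    n = len(y)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] += p_obs
    for k in range(n - 1):
        A[k][k] += p_smooth
        A[k + 1][k + 1] += p_smooth
        A[k][k + 1] -= p_smooth
        A[k + 1][k] -= p_smooth
    means = solve(A, [p_obs * yi for yi in y])
    variances = [solve(A, [float(r == i) for r in range(n)])[i] for i in range(n)]
    return means, variances


y = [0.0, 1.0, 0.5, 2.0, 1.5]
bp_means, bp_vars = gbp_chain(y, p_obs=1.0, p_smooth=4.0)
ex_means, ex_vars = exact_chain(y, p_obs=1.0, p_smooth=4.0)
```

On loopy graphs, such as deep factor graphs with parameters shared across observations, GBP is in general approximate and convergence is not guaranteed, so this exactness is specific to tree-structured toy cases.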

Abstract

We propose an approach for learning in Gaussian factor graphs. We treat all relevant quantities (inputs, outputs, parameters, latents) as random variables in a graphical model, and view both training and prediction as inference problems with different observed nodes. Our experiments show that these problems can be efficiently solved with belief propagation (BP), whose updates are inherently local, presenting exciting opportunities for distributed and asynchronous training. Our approach can be scaled to deep networks and provides a natural means for continual learning: use the BP-estimated parameter marginals of the current task as parameter priors for the next. On a video denoising task we demonstrate the benefit of learnable parameters over a classical factor graph approach and we show encouraging performance of deep factor graphs for continual image classification.
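
The continual-learning recipe above (carry the current task's parameter marginals forward as priors for the next) amounts to Bayesian filtering. Below is a minimal pure-Python sketch in the simplest conjugate setting, assuming a single scalar weight w in a linear-Gaussian model y = w·x + noise rather than a deep factor graph; the names update and predict and the data are hypothetical. Training conditions on observed (x, y) pairs, while prediction queries the same model with y unobserved. In this linear-Gaussian case the posterior is exact, so learning task A and then task B from the carried-over belief gives the same result as training on both tasks at once.

```python
# Minimal sketch (hypothetical toy model, not the paper's): one scalar weight w with a
# Gaussian prior and likelihood y = w * x + noise, noise ~ N(0, noise_var).
# Beliefs are kept in information form (eta, lam) = (mean / var, 1 / var).


def update(belief, xs, ys, noise_var):
    """Condition a belief over w on observed (x, y) pairs and return the posterior."""
    eta, lam = belief
    for x, y in zip(xs, ys):
        # Each observation adds a Gaussian factor over w in information form.
        eta += x * y / noise_var
        lam += x * x / noise_var
    return eta, lam


def predict(belief, x, noise_var):
    """Inference with y unobserved: predictive mean and variance of y at input x."""
    eta, lam = belief
    mean_w, var_w = eta / lam, 1.0 / lam
    return mean_w * x, x * x * var_w + noise_var


noise_var = 0.25
prior = (0.0, 1.0)  # N(0, 1) prior over w
xs_a, ys_a = [1.0, 2.0], [2.1, 3.9]  # task A
xs_b, ys_b = [3.0, -1.0], [6.2, -2.0]  # task B

after_a = update(prior, xs_a, ys_a, noise_var)
after_b = update(after_a, xs_b, ys_b, noise_var)  # task A's posterior is task B's prior
joint = update(prior, xs_a + xs_b, ys_a + ys_b, noise_var)  # both tasks at once
pred_mean, pred_var = predict(after_b, 2.0, noise_var)
```

In non-linear, loopy factor graphs the carried-over GBP marginals are approximations, so sequential and joint training need not agree exactly there.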

Paper Structure

This paper contains 36 sections, 19 equations, 8 figures and 8 tables.

Introduction
Background
Factor graphs
Belief Propagation
Gaussian Belief Propagation
Non-linear Factors
GBP Learning
Deep Factor Graphs
Learning and Predicting with GBP Inference
Efficient GBP
Continual Learning and Minibatching
Related Work
Results
Toy Experiments
Video Denoising
...and 21 more sections

Figures (8)

Figure 1: In GBP Learning, we design factor graphs whose structure mirrors common NN architectures, enabling distributed training and prediction with GBP. Learnable parameters are included as random variables (circles), as are inputs, outputs and activations. The parameters are shared across all observations, whereas the other variables are copied once per observation. Factors (black squares) between layers constrain their representations to be locally consistent, while those attached to inputs and outputs encourage compatibility with observations. The inter-layer factors are non-linear to enable soft-switching behaviour. This example architecture for image classification comprises convolutional, max-pooling and dense projection layers. The same architecture could be trained without supervision by removing the output observation factor.
Figure 2: GBP Learning in MLP-like factor graphs can solve nonlinear regression and classification tasks. The XOR classification panel was generated with 8 hidden units and the nonlinear regression panel with 16 hidden units.
Figure 3: Video denoising results. Factor graphs with learnable components outperform a hand-specified pairwise smoother. Continual learning of parameters over the video further improves the PSNR over per-frame learning, and the deep model outperforms the single-layer model. Shading shows ±1 standard error (SE) over 10 seeds.
Figure 4: Single-epoch MNIST results. GBP Learning outperforms other methods in the small-data regime, and on the full training set it performs similarly to a CNN with a replay buffer of 6,000 examples. Error bars cover ±1 SE over 5 seeds.
Figure 1 of the video denoising appendix: A crop from frame 5. The learnt models are able to remove more noise while retaining more high-frequency signal.
...and 3 more figures
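
Figure 1's non-linear inter-layer factors have to be approximated by Gaussian factors before GBP messages can be computed, which the TL;DR describes as linearisation. Below is a minimal pure-Python sketch of the standard first-order approach, assuming a hypothetical single-unit factor with energy (p/2)·g(v)² where g(v) = z − φ(w·a), φ is a leaky ReLU with slope 0.1, and v = (a, w, z) groups an input activation, a weight and an output; this factor form and all names are illustrative, not the paper's exact factor. Expanding g(v) ≈ g(v₀) + J(v − v₀) around the linearisation point v₀ gives a Gaussian factor with precision Λ = p JᵀJ and information vector η = p Jᵀ(J v₀ − g(v₀)).

```python
# Minimal sketch (hypothetical factor, not the paper's exact form): first-order
# linearisation of a non-linear factor into a Gaussian factor in information form.
# Variables v = (a, w, z): an input activation a, a weight w and an output z.
# Factor energy: E(v) = precision / 2 * g(v)**2, with g(v) = z - leaky_relu(w * a).

SLOPE = 0.1  # assumed leaky-ReLU negative slope


def leaky_relu(u):
    return u if u > 0 else SLOPE * u


def leaky_relu_grad(u):
    return 1.0 if u > 0 else SLOPE


def energy(v, precision):
    """True (non-linear) factor energy at v."""
    a, w, z = v
    return 0.5 * precision * (z - leaky_relu(w * a)) ** 2


def linearise_factor(v0, precision):
    """Return (eta, lam) of the Gaussian factor obtained by linearising g at v0."""
    a, w, z = v0
    u = w * a
    g0 = z - leaky_relu(u)
    d = leaky_relu_grad(u)
    jac = [-d * w, -d * a, 1.0]  # Jacobian dg/dv evaluated at v0
    jv0 = sum(j * v for j, v in zip(jac, v0))
    # g(v) ~ g0 + J (v - v0)  =>  lam = p J^T J,  eta = p J^T (J v0 - g0)
    lam = [[precision * ji * jk for jk in jac] for ji in jac]
    eta = [precision * ji * (jv0 - g0) for ji in jac]
    return eta, lam


v0 = [1.5, 0.8, 0.3]
eta, lam = linearise_factor(v0, precision=2.0)
# Gradient of the quadratic approximation 0.5 v^T lam v - eta^T v at v0.
grad_lin = [sum(lam[i][k] * v0[k] for k in range(3)) - eta[i] for i in range(3)]
```

By construction the quadratic approximation has the same value and gradient as the true energy at v₀. Re-linearising at a point where w·a < 0 multiplies the first two Jacobian entries by the slope, so the resulting Gaussian factor changes as beliefs cross the kink.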
