Hidden Markov Neural Networks

Lorenzo Rimella, Nick Whiteley

TL;DR

Hidden Markov Neural Networks (HMNNs) introduce a time-evolving Bayesian framework that treats neural network weights as hidden states in a factorial hidden Markov model, enabling continual adaptation with principled forgetting. Inference is performed sequentially via variational filtering, using a forward prediction and correction scheme, and a sequential reparameterization trick to estimate gradients. The Gaussian mixture variational family with variational DropConnect provides robust regularization and scalable inference, with closed-form updates in the Gaussian case. Empirically, HMNNs demonstrate strong predictive performance and meaningful uncertainty quantification on time-varying tasks, including MNIST-like drift scenarios and one-step-ahead video frame prediction, while outperforming several continual-learning baselines in dynamic settings.
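
The forward prediction step and the closed-form Gaussian case mentioned above can be illustrated for a single weight. This is a minimal sketch, not the authors' implementation. The linear-Gaussian transition kernel `W_t = mu + alpha * (W_{t-1} - mu) + sigma * xi` and all names (`predict_gaussian`, `predict_monte_carlo`, `alpha`, `mu`, `sigma2`) are assumptions chosen for illustration; this page does not state the kernel used in the paper.

```python
import random


def predict_gaussian(m, s2, alpha, mu, sigma2):
    """Closed-form prediction step for one weight.

    Assumes the previous filtering approximation is N(m, s2) and the
    (hypothetical) transition is W_t = mu + alpha * (W_prev - mu) + noise,
    with noise ~ N(0, sigma2). The predictive law is then Gaussian.
    """
    mean = mu + alpha * (m - mu)
    var = alpha ** 2 * s2 + sigma2
    return mean, var


def predict_monte_carlo(m, s2, alpha, mu, sigma2, n=200_000, seed=0):
    """Sampling-based check of the same prediction step."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        w_prev = rng.gauss(m, s2 ** 0.5)          # draw from N(m, s2)
        noise = rng.gauss(0.0, sigma2 ** 0.5)     # transition noise
        samples.append(mu + alpha * (w_prev - mu) + noise)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var


if __name__ == "__main__":
    print(predict_gaussian(1.0, 0.25, 0.9, 0.0, 0.1))
    print(predict_monte_carlo(1.0, 0.25, 0.9, 0.0, 0.1))
```

Under this assumed kernel, `alpha` close to 1 with small `sigma2` keeps most of the previous posterior, while a smaller `alpha` or larger `sigma2` pulls the predictive distribution toward `mu` and widens it, which is one simple way to picture forgetting.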

Abstract

We define an evolving-in-time Bayesian neural network called a Hidden Markov Neural Network, which addresses a central challenge in time-series forecasting and continual learning: striking a balance between adapting to new data and appropriately forgetting outdated information. This is achieved by modelling the weights of a neural network as the hidden states of a Hidden Markov model, with the observed process defined by the available data. A filtering algorithm is employed to learn a variational approximation of the evolving-in-time posterior distribution over the weights. By leveraging a sequential variant of Bayes by Backprop, enriched with a stronger regularization technique called variational DropConnect, Hidden Markov Neural Networks achieve robust regularization and scalable inference. Experiments on MNIST, dynamic classification tasks, and next-frame forecasting in videos demonstrate that Hidden Markov Neural Networks provide strong predictive performance while enabling effective uncertainty quantification.
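
To make the combination of a reparameterized weight sample with a DropConnect-style mask concrete, here is a minimal sketch for a single weight. It assumes a two-component Gaussian mixture `p * N(m, s^2) + (1 - p) * N(0, s^2)`, in which the second component zeroes the mean; that specific form, and the names `sample_dropconnect_weight` and `mixture_moments`, are assumptions for illustration and are not taken from this page.

```python
import random


def sample_dropconnect_weight(m, s, p, rng):
    """Draw one weight from the assumed mixture via reparameterization.

    gamma ~ Bernoulli(p) keeps or drops the mean, eps ~ N(0, 1) is the
    Gaussian noise. Given (gamma, eps), the sample is a deterministic,
    differentiable function of m and s.
    """
    gamma = 1.0 if rng.random() < p else 0.0
    eps = rng.gauss(0.0, 1.0)
    return gamma * m + s * eps


def mixture_moments(m, s, p):
    """Exact mean and variance of p*N(m, s^2) + (1-p)*N(0, s^2)."""
    mean = p * m
    var = s ** 2 + p * (1.0 - p) * m ** 2
    return mean, var


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_dropconnect_weight(2.0, 0.5, 0.8, rng) for _ in range(100_000)]
    empirical_mean = sum(draws) / len(draws)
    print(empirical_mean, mixture_moments(2.0, 0.5, 0.8))
```

Because the randomness sits entirely in `gamma` and `eps`, a Monte Carlo estimate of an expected loss built from such samples can be differentiated with respect to `m` and `s`, which is the idea behind Bayes by Backprop style training.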

Paper Structure

This paper contains 24 sections, 1 theorem, 22 equations, 8 figures, 4 tables, 1 algorithm.

Key Result

Lemma 1

Consider a Gaussian random variable $W \sim \mathcal{N} \left ( \cdot| \mu_1, \sigma_1^2 \right )$ and let: with $\xi \sim \mathcal{N}(\cdot|0,1)$. Then the distribution of $\tilde{W}$ is again Gaussian:

Figures (8)

  • Figure 1: On the left: the conditional independence structure of an HMM. On the right: the conditional independence structure of an FHMM.
  • Figure 2: Performance on a validation set of Bayes by Backprop with and without variational DropConnect. The bottom plot is a zoom-in of the top plot.
  • Figure 3: First row, well separated "two moons". Second row, overlapping "two moons". Different columns are associated with different time steps. Different colours are associated with different labels.
  • Figure 4: First row, well separated "two moons". Second row, overlapping "two moons". Different columns are associated with different time steps. The plot shows the length of the 95% credible interval. The blue and yellow surface is the probability prediction on the second class. Different coloured dots are associated with different labels.
  • Figure 5: First row, well separated "two moons". Second row, overlapping "two moons". Different columns are associated with different time steps. The blue and yellow surface is the probability prediction on the second class. Pink and grey shaded surfaces represent the 95% credible intervals.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Lemma 1
  • proof