Sequential Function-Space Variational Inference via Gaussian Mixture Approximation

Menghao Waiyan William Zhu, Pengcheng Hao, Ercan Engin Kuruoğlu

TL;DR

This paper tackles catastrophic forgetting in continual learning by introducing Gaussian Mixture Sequential Function-Space Variational Inference (GM-SFSVI), which models the neural network outputs at a finite set of inducing points as a Gaussian mixture in order to capture multi-modal posteriors. It extends function-space variational inference with a mixture of Gaussian processes, keeps the KL terms tractable via diagonalization, trains with a Gumbel-softmax reparameterization scheme, and comes in prior-focused and likelihood-focused variants. Empirical results on domain- and class-incremental tasks show that likelihood-focused GM-SFSVI often yields the best final average accuracy, especially when all layers are learned without a fixed pre-trained feature extractor, which points to the method's robustness to forgetting and its applicability to privacy-preserving replay setups. By combining Gaussian mixtures with function-space VI, the work offers a scalable way to approximate complex posteriors in sequential tasks using inducing-point representations.
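The Gumbel-softmax reparameterization mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `sample_relaxed_mixture`, the diagonal covariance parameterization via `log_stds`, and the choice to blend per-component samples with the relaxed weights are all assumptions made for illustration.

```python
import numpy as np


def sample_relaxed_mixture(logits, means, log_stds, temperature, rng):
    """Draw one relaxed sample from a diagonal Gaussian mixture (illustrative).

    logits:   (k,)   unnormalized log mixing probabilities
    means:    (k, n) component means
    log_stds: (k, n) component log standard deviations (diagonal covariance)
    """
    k, n = means.shape

    # Gumbel(0, 1) noise via the inverse CDF: g = -log(-log(U)).
    u = rng.uniform(low=1e-12, high=1.0, size=k)
    gumbel = -np.log(-np.log(u))

    # Relaxed one-hot component indicator: softmax of perturbed logits.
    z = (logits + gumbel) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    weights = np.exp(z) / np.exp(z).sum()

    # Standard Gaussian reparameterization for every component.
    eps = rng.standard_normal((k, n))
    component_samples = means + np.exp(log_stds) * eps

    # Assumption: blend component samples with the relaxed weights.
    sample = weights @ component_samples
    return sample, weights


rng = np.random.default_rng(0)
logits = np.log(np.array([0.2, 0.5, 0.3]))
means = np.array([[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]])
log_stds = np.full((3, 2), -1.0)
sample, weights = sample_relaxed_mixture(logits, means, log_stds, 0.1, rng)
```

At a low temperature the weights approach a one-hot vector, so the sample comes almost entirely from one component. A higher temperature spreads the weights across components, which makes gradients with respect to `logits` smoother but biases the sample.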

Abstract

Continual learning in neural networks aims to learn new tasks without forgetting old tasks. Sequential function-space variational inference (SFSVI) uses a Gaussian variational distribution to approximate the distribution of the outputs of the neural network corresponding to a finite number of selected inducing points. Since the posterior distribution of a neural network is multi-modal, a Gaussian distribution can match only one mode of the posterior distribution, whereas a Gaussian mixture distribution can approximate it better. We propose an SFSVI method based on a Gaussian mixture variational distribution. We also compare different types of variational inference methods with a fixed pre-trained feature extractor (where continual learning is performed on the final layer) and without a fixed pre-trained feature extractor (where continual learning is performed on all layers). We find that in terms of final average accuracy, likelihood-focused Gaussian mixture SFSVI outperforms other sequential variational inference methods, especially in the latter case.
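
To make the multi-modality argument concrete, a Gaussian mixture variational distribution over the function values at the inducing points can be written in the standard form below. This is the textbook mixture density, not a formula quoted from the paper. The symbol $\bm u$ for the function values is assumed here for illustration, and $p_\kappa$, $\mu_\kappa$ and $\Sigma_\kappa$ denote the mixing probability, mean vector and covariance matrix of the $\kappa$-th of $k$ components.

$$q(\bm u) = \sum_{\kappa=1}^{k} p_\kappa \, \mathcal N(\bm u;\, \mu_\kappa, \Sigma_\kappa), \qquad p_\kappa \ge 0, \quad \sum_{\kappa=1}^{k} p_\kappa = 1.$$

With $k = 1$ this reduces to a single Gaussian, which can concentrate its mass around only one mode. With $k > 1$ each component can sit on a different mode.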

Paper Structure

This paper contains 17 sections, 2 theorems, 15 equations, 2 figures, 2 tables.

Key Result

Lemma 1

Let $\bm\theta$ be an $\mathbb R^n$-valued Gaussian mixture random variable with $k$ components, the $\kappa$-th component having mixing probability $p_\kappa$, mean vector $\mu_\kappa$ and covariance matrix $\Sigma_\kappa$. Let $\bm\kappa$ be the mixing categorical random variable. Consider a condi

Figures (2)

  • Figure 1: Bayesian network for continual learning. $\bm\theta$ is the collection of parameters of the neural network. $(\bm x_t)_{t=1}^T$ are the inputs, and $(\bm y_t)_{t=1}^T$ are the outputs. $\tilde{\bm x}$ and $\tilde{\bm y}$ are the input and output for prediction, respectively. Shaded nodes are observed.
  • Figure 2: Visualizations of prediction probabilities on DI Sinusoid and CI Split 2D Iris. All Gaussian mixture methods use 3 Gaussian components. *A coreset with 16 data points per task is used.

Theorems & Definitions (4)

  • Lemma 1
  • proof
  • Proposition 1
  • proof