Sequential Function-Space Variational Inference via Gaussian Mixture Approximation
Menghao Waiyan William Zhu, Pengcheng Hao, Ercan Engin Kuruoğlu
TL;DR
This paper tackles catastrophic forgetting in continual learning by introducing Gaussian mixture sequential function-space variational inference (GM-SFSVI), which models the neural network outputs at a finite set of inducing points as a Gaussian mixture so that the variational distribution can capture a multi-modal posterior. It extends function-space variational inference with a mixture of Gaussian processes, keeps the KL terms tractable through diagonalization, uses a Gumbel-softmax reparameterization for training, and comes in prior-focused and likelihood-focused variants. Empirical results on domain- and class-incremental tasks show that likelihood-focused GM-SFSVI often yields the best final average accuracy, especially when all layers are learned continually rather than only the final layer on top of a fixed pre-trained feature extractor. The results also point to the method’s robustness to forgetting and its applicability to privacy-preserving replay setups. Overall, the work combines Gaussian mixtures with function-space VI to offer a scalable way of approximating complex posteriors in sequential tasks using inducing-point representations.
Abstract
Continual learning in neural networks aims to learn new tasks without forgetting old ones. Sequential function-space variational inference (SFSVI) uses a Gaussian variational distribution to approximate the distribution of the neural network outputs at a finite number of selected inducing points. Because the posterior distribution of a neural network is multi-modal, a single Gaussian can match only one of its modes, whereas a Gaussian mixture can approximate it more closely. We propose an SFSVI method based on a Gaussian mixture variational distribution. We also compare different types of variational inference methods with a fixed pre-trained feature extractor (where continual learning is performed only on the final layer) and without one (where continual learning is performed on all layers). We find that, in terms of final average accuracy, likelihood-focused Gaussian mixture SFSVI outperforms other sequential variational inference methods, especially when no fixed pre-trained feature extractor is used.
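To make the ingredients named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes diagonal covariances at the inducing points and shows (i) the closed-form KL divergence between two diagonal Gaussians and (ii) a Gumbel-softmax relaxation used to draw a differentiable sample of function values from a Gaussian mixture. The function names (`diag_gaussian_kl`, `gumbel_softmax`, `sample_mixture`), the temperature value, and the way the relaxed weights are combined with the component samples are all illustrative assumptions.

```python
import math
import random


def diag_gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) for two Gaussians with diagonal covariance (closed form)."""
    return 0.5 * sum(
        math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample over mixture components (hypothetical helper)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)      # avoid log(0)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / temperature)
    top = max(noisy)                      # subtract the max for stability
    exps = [math.exp(z - top) for z in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def sample_mixture(logits, means, variances, temperature, rng):
    """Relaxed sample of function values at the inducing points.

    Assumption: each component is a diagonal Gaussian, and the sample is the
    relaxed-weight combination of reparameterized component samples.
    """
    weights = gumbel_softmax(logits, temperature, rng)
    n_points = len(means[0])
    sample = [0.0] * n_points
    for w, mu, var in zip(weights, means, variances):
        for i in range(n_points):
            eps = rng.gauss(0.0, 1.0)     # reparameterization noise
            sample[i] += w * (mu[i] + math.sqrt(var[i]) * eps)
    return sample, weights


# Usage: two components with well-separated means at three inducing points.
rng = random.Random(0)
means = [[-2.0, -2.0, -2.0], [2.0, 2.0, 2.0]]
variances = [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]]
f_sample, weights = sample_mixture([0.0, 0.0], means, variances, 0.5, rng)
```

At low temperature the relaxed weights approach a one-hot vector, so a sample lands near a single component mean. A single Gaussian fitted to both modes would instead place its mass between them, which is the mismatch the abstract describes.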
