Frugal Incremental Generative Modeling using Variational Autoencoders

Victor Enescu, Hichem Sahbi

TL;DR

The paper tackles catastrophic forgetting in continual learning by proposing a replay-free incremental approach based on a conditional Variational Autoencoder. It introduces multi-Gaussian latent priors learned via a fixed-point iteration and enforces orthogonality through null-space gradient projections to prevent forgetting while keeping memory usage frugal. The method leverages CLIP-based prompting to adapt a classifier using synthetic data generated by the decoder, achieving competitive or state-of-the-art results on several incremental-learning benchmarks with dramatically reduced memory costs. A dynamic architecture further mitigates a potential dimensionality bottleneck by selectively expanding only a small portion of parameters per task, enabling scalable continual learning. Overall, the approach combines probabilistic generative modeling, explicit task conditioning, and gradient projection to deliver memory-efficient, high-accuracy continual learning.

Abstract

Continual or incremental learning holds tremendous potential in deep learning, but comes with several challenges, including catastrophic forgetting. The advent of powerful foundation and generative models has propelled this paradigm even further, making it one of the most viable solutions for training these models. However, one persistent issue is the increasing volume of data, particularly with replay-based methods. This growth raises scalability problems, since storing and replaying continuously expanding data becomes increasingly demanding as the number of tasks grows. In this paper, we attenuate this issue by devising a novel replay-free incremental learning model based on Variational Autoencoders (VAEs). The main contributions of this work include (i) a novel incremental generative model built upon a well-designed multi-modal latent space, and (ii) an orthogonality criterion that mitigates catastrophic forgetting of the learned VAEs. The proposed method considers two variants of these VAEs, static and dynamic, with no (or at most a controlled) growth in the number of parameters. Extensive experiments show that our method is at least an order of magnitude more "memory-frugal" than closely related works while achieving SOTA accuracy scores.
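The orthogonality idea, which the TL;DR describes as null-space gradient projection, can be illustrated on a single linear layer. The sketch below is not the authors' implementation. It assumes a layer $y = Wx$, represents the previous tasks by a handful of stored input vectors, and uses Gram-Schmidt where a real system would more likely use an SVD of a feature covariance matrix. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Modified Gram-Schmidt: orthonormal basis of span(vectors)."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project_to_null_space(grad_row: Vector, basis: List[Vector]) -> Vector:
    """Remove the components of grad_row that lie in the span of old inputs."""
    g = list(grad_row)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g


# Hypothetical inputs seen by the layer during previous tasks.
old_inputs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(old_inputs)

# Hypothetical gradient of the new-task loss w.r.t. W (2 outputs x 3 inputs).
grad_W = [[0.5, -1.0, 2.0], [3.0, 0.25, -4.0]]
projected_grad_W = [project_to_null_space(row, basis) for row in grad_W]

# Each projected row is orthogonal to every old input, so the update
# W - lr * projected_grad_W leaves W @ x unchanged for those inputs.
residuals = [dot(row, x) for row in projected_grad_W for x in old_inputs]
```

Because every row of the projected gradient is orthogonal to the old inputs, a gradient step cannot change the layer's response to them. That is the sense in which learning a new task does not interfere with previous ones.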

Paper Structure

This paper contains 36 sections, 23 equations, 7 figures, 9 tables, 1 algorithm.

Figures (7)

  • Figure 1: Visualization of the incremental learning of Gaussians using the fixed-point iteration. The variable 't' denotes the index of a task, and 'y' the label of a class. Fig. \ref{fig:task1_init} and \ref{fig:task2_init} respectively show the initial Gaussians for tasks 1 and 2, corresponding to the conditional mean of the data in the latent space, whereas Fig. \ref{fig:task1_final} and \ref{fig:task2_final} respectively show the learned Gaussians after the fixed-point optimization.
  • Figure 2: Proposed architecture for incremental learning based on a conditional VAE. The encoder $q_{\phi}({\bf z}| {\bf x}, {\bf y})$ and the decoder $p_{\theta}({\bf x}| {\bf z}, {\bf y})$ are both made of $L$ fully connected layers and are conditioned on priors $\mathcal{N}(\mu_{{\bf y}}, {\bf I})$ in the latent space. The decoder is trained by projecting the gradients of new tasks onto the null space of previous ones for all fully connected layers except the last one. The last layer can optionally be task-dependent for more expressiveness, in which case the biases are also made task-dependent, as they account for very few parameters. The encoder, on the other hand, is trained on all tasks without any constraint, starting from the weights obtained at the end of the previous task. The variable $\hat{{\bf x}}$ generated by the decoder is the reconstruction of the input ${\bf x}$ given to the encoder.
  • Figure 3: t-SNE projection of class-wise noise sampled from the Gaussians of tasks 1 to 6 on CIFAR100 using VAE-FO. The Gaussians learned incrementally with the fixed-point iteration are clearly separated and non-overlapping for all tasks.
  • Figure 4: Accuracy and standard deviation on CUB200 as tasks evolve for VAE-FO.
  • Figure 5: Accuracy and standard deviation on Cars196 as tasks evolve for VAE-FO.
  • ...and 2 more figures
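
The class-conditional priors $\mathcal{N}(\mu_{{\bf y}}, {\bf I})$ named in the Figure 2 caption can be made concrete with a small sketch. It assumes a diagonal-Gaussian encoder posterior $\mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$, which is the usual VAE choice but is not stated in the captions above. The function names are hypothetical, and the paper's actual loss may contain further terms. Under that assumption, the KL term to a class prior has the closed form $\frac{1}{2}\sum_i \left(\sigma_i^2 + (\mu_i - \mu_{{\bf y},i})^2 - 1 - \log \sigma_i^2\right)$.

```python
import math
import random
from typing import List, Sequence


def kl_to_class_prior(
    mu: Sequence[float], log_var: Sequence[float], mu_y: Sequence[float]
) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(mu_y, I) ), summed over latent dims."""
    return 0.5 * sum(
        math.exp(lv) + (m - c) ** 2 - 1.0 - lv
        for m, lv, c in zip(mu, log_var, mu_y)
    )


def sample_from_class_prior(mu_y: Sequence[float], rng: random.Random) -> List[float]:
    """Draw z ~ N(mu_y, I); a decoder would turn z into a synthetic sample."""
    return [c + rng.gauss(0.0, 1.0) for c in mu_y]


# Hypothetical 2-D class mean in the latent space.
mu_class = [3.0, -1.0]

# A posterior that matches the prior exactly incurs zero KL.
kl_match = kl_to_class_prior(mu_class, [0.0, 0.0], mu_class)

# Shifting the posterior mean by one unit along one axis costs 0.5.
kl_shift = kl_to_class_prior([4.0, -1.0], [0.0, 0.0], mu_class)

# Latent codes for generating data of this class without stored samples.
rng = random.Random(0)
latents = [sample_from_class_prior(mu_class, rng) for _ in range(5)]
```

Sampling from the per-class prior and decoding the result is what allows a generative approach to avoid storing past data: only the class means and the decoder weights need to be kept.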