Continual Learning Using a Kernel-Based Method Over Foundation Models

Saleh Momeni, Sahisnu Mazumder, Bing Liu

TL;DR

The paper tackles class-incremental learning (CIL), where catastrophic forgetting (CF) and inter-task class separation (ICS) hinder performance. It introduces Kernel Linear Discriminant Analysis (KLDA), which freezes foundation-model features and enhances them with a kernel mapping approximated by Random Fourier Features (RFF), enabling incremental updates to class means and a shared covariance $\boldsymbol{\Sigma}$. Classification is performed in the kernelized space via Linear Discriminant Analysis, with an ensemble variant KLDA-E that averages posteriors across multiple random-feature instantiations; the method avoids replay data and does not update the FM. Experimental results on text and image datasets show KLDA matching or exceeding the joint-training upper bound in many settings, demonstrating strong robustness across diverse foundation models and domains. These findings highlight the practicality of kernelized, fixed-feature approaches for scalable, replay-free CIL in real-world applications.
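
For reference, the building blocks named above can be written in their standard textbook forms. This is a sketch rather than the paper's exact notation: the kernel width $\gamma$, the class priors $\pi_c$, the class means $\boldsymbol{\mu}_c$, and the ensemble size $E$ are symbols assumed here, and the paper's own $\sigma$ may parameterize the kernel differently.

$$
k(\mathbf{x}, \mathbf{y}) = \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{y} \rVert^2\right) \approx z(\mathbf{x})^\top z(\mathbf{y}),
\qquad
z(\mathbf{x}) = \sqrt{\tfrac{2}{D}}\, \cos\!\left(\mathbf{W}^\top \mathbf{x} + \mathbf{b}\right),
$$

where the $D$ columns of $\mathbf{W}$ are drawn from $\mathcal{N}(\mathbf{0}, 2\gamma \mathbf{I})$ and the entries of $\mathbf{b}$ from $\mathrm{Uniform}[0, 2\pi]$. Linear Discriminant Analysis in this space scores class $c$ as

$$
\delta_c(\mathbf{x}) = z(\mathbf{x})^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_c - \tfrac{1}{2}\, \boldsymbol{\mu}_c^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_c + \log \pi_c ,
$$

and an ensemble over $E$ independent draws of $(\mathbf{W}, \mathbf{b})$ would average the resulting posteriors:

$$
p(c \mid \mathbf{x}) = \frac{1}{E} \sum_{e=1}^{E} \operatorname{softmax}_c\!\left(\delta^{(e)}(\mathbf{x})\right).
$$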

Abstract

Continual learning (CL) learns a sequence of tasks incrementally. This paper studies the challenging CL setting of class-incremental learning (CIL). CIL has two key challenges: catastrophic forgetting (CF) and inter-task class separation (ICS). Despite numerous proposed methods, these issues remain persistent obstacles. This paper proposes a novel CIL method, called Kernel Linear Discriminant Analysis (KLDA), that can effectively avoid CF and ICS problems. It leverages only the powerful features learned in a foundation model (FM). However, directly using these features proves suboptimal. To address this, KLDA incorporates the Radial Basis Function (RBF) kernel and its Random Fourier Features (RFF) to enhance the feature representations from the FM, leading to improved performance. When a new task arrives, KLDA computes only the mean for each class in the task and updates a shared covariance matrix for all learned classes based on the kernelized features. Classification is performed using Linear Discriminant Analysis. Our empirical evaluation using text and image classification datasets demonstrates that KLDA significantly outperforms baselines. Remarkably, without relying on replay data, KLDA achieves accuracy comparable to joint training of all classes, which is considered the upper bound for CIL performance. The KLDA code is available at https://github.com/salehmomeni/klda.
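
The update rule described in the abstract (per-class means plus one shared covariance over kernelized features, followed by LDA) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' released code: the class name `KLDASketch`, the helper `toy_task`, the kernel parameterization via `gamma`, the `ridge` regularizer, the equal class priors, and the synthetic Gaussian blobs standing in for frozen FM features are all assumptions made here.

```python
import numpy as np


class KLDASketch:
    """Illustrative kernelized LDA over frozen features (hypothetical, not the authors' code)."""

    def __init__(self, in_dim, rff_dim=256, gamma=0.05, ridge=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # RFF for k(x, y) = exp(-gamma * ||x - y||^2): frequencies ~ N(0, 2 * gamma * I).
        self.W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(in_dim, rff_dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=rff_dim)
        self.rff_dim = rff_dim
        self.ridge = ridge                            # assumed regularizer for invertibility
        self.means = {}                               # class label -> mean in RFF space
        self.scatter = np.zeros((rff_dim, rff_dim))   # running within-class scatter
        self.n_seen = 0

    def transform(self, X):
        # Random Fourier Feature map z(x) = sqrt(2/D) * cos(W^T x + b).
        return np.sqrt(2.0 / self.rff_dim) * np.cos(X @ self.W + self.b)

    def learn_task(self, X, y):
        # Only new class means and the shared scatter change; old means are untouched.
        Z = self.transform(np.asarray(X, dtype=float))
        y = np.asarray(y)
        for label in np.unique(y):
            Zc = Z[y == label]
            mu = Zc.mean(axis=0)
            self.means[int(label)] = mu
            centered = Zc - mu
            self.scatter += centered.T @ centered
            self.n_seen += len(Zc)

    def predict(self, X):
        # LDA discriminant with equal priors: z^T S^-1 mu_c - 0.5 * mu_c^T S^-1 mu_c.
        Z = self.transform(np.asarray(X, dtype=float))
        labels = sorted(self.means)
        M = np.stack([self.means[c] for c in labels])              # (C, D)
        cov = self.scatter / max(self.n_seen, 1) + self.ridge * np.eye(self.rff_dim)
        A = np.linalg.solve(cov, M.T)                               # (D, C) = S^-1 mu_c
        scores = Z @ A - 0.5 * np.sum(M.T * A, axis=0)
        return np.array(labels)[np.argmax(scores, axis=1)]


def toy_task(labels, in_dim=8, n=40, seed=0):
    """Synthetic stand-in for FM features: one Gaussian blob per class."""
    rng = np.random.default_rng(seed)
    X = np.concatenate(
        [3.0 * np.eye(in_dim)[c] + 0.5 * rng.normal(size=(n, in_dim)) for c in labels]
    )
    return X, np.repeat(labels, n)


# Two tasks arrive in sequence; no data from task 1 is revisited during task 2.
model = KLDASketch(in_dim=8)
for t, labels in enumerate([[0, 1], [2, 3]]):
    model.learn_task(*toy_task(labels, seed=t))

X_test, y_test = toy_task([0, 1, 2, 3], seed=99)
accuracy = float(np.mean(model.predict(X_test) == y_test))
```

Because each task only adds class means and accumulates into the shared scatter matrix, nothing learned for earlier classes is overwritten, which is the property the abstract relies on to avoid CF without replay data. An ensemble in the spirit of KLDA-E could be approximated by building several such models with different `seed` values and averaging their softmaxed scores.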

Paper Structure

This paper contains 25 sections, 14 equations, 1 figure, 3 tables, and 1 algorithm.

Figures (1)

  • Figure 1: Hyperparameter impact on KLDA: (Left) Effect of $\sigma$ with $D = 5000$. (Right) Effect of $D$ with $\sigma = 10^{-3}$. The FM is BART-base with 768 hidden dimensions.