KAC: Kolmogorov-Arnold Classifier for Continual Learning

Yusong Hu, Zichen Liang, Fei Yang, Qibin Hou, Xialei Liu, Ming-Ming Cheng

TL;DR

This paper tackles catastrophic forgetting in Class Incremental Learning by replacing traditional linear classifiers with a Kolmogorov-Arnold Classifier (KAC) built on Kolmogorov-Arnold Networks (KAN). By substituting B-spline bases with Gaussian Radial Basis Functions (RBF), KAC creates a Gaussian-structured activation space that preserves stability while maintaining plasticity, enabling effective continual learning. The authors demonstrate that KAC is a plug-in improvement across multiple CIL baselines and datasets, including ImageNet-R, CUB200, DomainNet, and CIFAR-100, with notable gains in long-sequence tasks and without exemplar memory. The mechanism relies on locality in the KAN activations, which concentrates updates on relevant channels for new tasks, reducing forgetting of old tasks and enhancing robustness.

Abstract

Continual learning requires models to train continuously across consecutive tasks without forgetting. Most existing methods utilize linear classifiers, which struggle to maintain a stable classification space while learning new tasks. Inspired by the success of Kolmogorov-Arnold Networks (KAN) in preserving learning stability during simple continual regression tasks, we set out to explore their potential in more complex continual learning scenarios. In this paper, we introduce the Kolmogorov-Arnold Classifier (KAC), a novel classifier developed for continual learning based on the KAN structure. We delve into the impact of KAN's spline functions and introduce Radial Basis Functions (RBF) for improved compatibility with continual learning. We replace linear classifiers with KAC in several recent approaches and conduct experiments across various continual learning benchmarks, all of which demonstrate performance improvements, highlighting the effectiveness and robustness of KAC in continual learning. The code is available at https://github.com/Ethanhuhuhu/KAC.

Paper Structure

This paper contains 18 sections, 11 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Brief comparison between a conventional linear classifier and our Kolmogorov-Arnold Classifier. The solid lines represent activated weights, while the dashed ones represent suppressed weights. (a) Conventional linear classifiers activate each weight equally across all tasks, resulting in irrelevant weights being equally updated in the new task. (b) Our Kolmogorov-Arnold Classifier learns class-specific learnable activations for each channel across all categories, minimizing forgetting caused by irrelevant weight changes.
  • Figure 2: Comparison of the accuracy curves of three recent approaches with different classifiers in the ImageNet-R 20-step scenario. The x-axis represents the increasing number of tasks, while the y-axis shows the corresponding test accuracy at each step. The Baseline indicates performance with a conventional linear classifier, while the other curves represent results with ablated KAN classifiers and our Kolmogorov-Arnold Classifier.
  • Figure 3: An overview of the pipeline of the proposed Kolmogorov-Arnold Classifier. For the input feature embeddings, we first normalize them using a layer normalization, then pass them through a set of RBFs that activate them to learnable Gaussian distributions. Finally, we weight all channels with $W_C$ to obtain the decision space for each class. The right side shows the process of Gaussian RBFs, which map univariate variables to different Gaussian distributions centered at various points and weight these distributions with $W_q^c$ to derive the final activation distribution for each channel across all classes. The output logits are sampled based on the channel values within the distribution of each class. As tasks increase, new classes can be accommodated by simply expanding $W_C$.
  • Figure 4: Activation maps for different classes across different channels. The x-axis represents 50 randomly selected channels from feature embeddings, while the y-axis represents classes from different tasks. The colors indicate varying levels of interest.
  • Figure 5: Ablation study on different numbers of basis functions in the 20-step ImageNet-R scenario. The x-axis represents the number of basis functions, while the y-axis indicates the corresponding average incremental accuracy.
  • ...and 1 more figure
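
The pipeline described in the Figure 3 caption (layer normalization, Gaussian RBF activation of each channel, then a per-class weighting that grows as tasks arrive) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `RBFClassifierSketch`, the fixed evenly spaced centers, the shared width `sigma`, the number of basis functions, and the merging of the per-basis weights $W_q^c$ and the channel weights $W_C$ into a single matrix `W` are all simplifying assumptions made here for illustration.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each feature vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gaussian_rbf(x, centers, sigma):
    """Map each scalar channel value to Gaussian responses around each center.

    x: (batch, dim) -> returns (batch, dim, n_basis).
    """
    diff = x[..., None] - centers
    return np.exp(-0.5 * (diff / sigma) ** 2)


class RBFClassifierSketch:
    """Hypothetical RBF-based classifier head (illustrative only)."""

    def __init__(self, dim, n_basis=4, low=-2.0, high=2.0, seed=0):
        self.rng = np.random.default_rng(seed)
        # Assumption: fixed, evenly spaced centers with a shared width.
        self.centers = np.linspace(low, high, n_basis)
        self.sigma = (high - low) / (n_basis - 1)
        # One weight row per class; starts empty and grows with each task.
        self.W = np.zeros((0, dim * n_basis))

    def expand(self, n_new_classes):
        """Add rows for new classes without touching existing rows."""
        new_rows = 0.01 * self.rng.standard_normal(
            (n_new_classes, self.W.shape[1])
        )
        self.W = np.concatenate([self.W, new_rows], axis=0)

    def logits(self, feats):
        """feats: (batch, dim) -> logits: (batch, n_classes_so_far)."""
        phi = gaussian_rbf(layer_norm(feats), self.centers, self.sigma)
        return phi.reshape(feats.shape[0], -1) @ self.W.T


# Usage: two tasks of 5 classes each on 8-dimensional features.
clf = RBFClassifierSketch(dim=8)
clf.expand(5)
x = np.random.default_rng(1).standard_normal((3, 8))
old_logits = clf.logits(x)
clf.expand(5)
new_logits = clf.logits(x)
```

In this sketch, expanding the head only appends rows, so the logits of earlier classes are unchanged by the expansion itself. Any forgetting would come from later gradient updates, which the locality of the Gaussian responses is meant to limit: a channel value far from a center produces a near-zero activation and therefore a near-zero gradient for the corresponding weight.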