Expandable and Differentiable Dual Memories with Orthogonal Regularization for Exemplar-free Continual Learning

Hyung-Jun Moon, Sung-Bae Cho

TL;DR

A fully differentiable, exemplar-free expandable method composed of two complementary memories: one learns common features that can be used across all tasks, and the other combines these shared features to learn discriminative characteristics unique to each sample.

Abstract

Existing continual learning methods force neural networks to process sequential tasks in isolation, which prevents them from leveraging useful inter-task relationships and causes them to repeatedly relearn similar features or to differentiate them excessively. To address this problem, we propose a fully differentiable, exemplar-free expandable method composed of two complementary memories: one learns common features that can be used across all tasks, and the other combines these shared features to learn discriminative characteristics unique to each sample. Both memories are differentiable, so the network can autonomously learn latent representations for each sample. For each task, the memory adjustment module adaptively prunes critical slots and minimally expands capacity to accommodate new concepts, while orthogonal regularization enforces geometric separation between preserved and newly learned memory components to prevent interference. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show that the proposed method outperforms 14 state-of-the-art methods for class-incremental learning, achieving final accuracies of 55.13%, 37.24%, and 30.11%, respectively. Additional analysis confirms that, by effectively integrating and utilizing knowledge, the proposed method increases average performance across sequential tasks and produces feature extraction results closest to the upper bound, thus establishing a new milestone in continual learning.
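
To make two ideas from the abstract concrete, the sketch below shows one plausible form of a differentiable memory read and of an orthogonality penalty between preserved and newly added memory slots. It is an illustration under stated assumptions, not the authors' implementation: the function names `soft_read` and `orthogonal_penalty`, the softmax addressing, and the cosine-based penalty are all hypothetical choices, and the paper's actual formulation may differ.

```python
import numpy as np

def soft_read(query, memory):
    """Soft (differentiable) read: softmax attention over memory slots.

    Assumed form for illustration only. `memory` has shape (slots, dim)
    and `query` has shape (dim,).
    """
    scores = memory @ query
    scores = scores - scores.max()            # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ memory, weights          # weighted mix of slots, addressing weights

def orthogonal_penalty(preserved, new):
    """Sum of squared cosine similarities between preserved and new slots.

    Assumed penalty for illustration only. It is zero exactly when every
    new slot is orthogonal to every preserved slot.
    """
    p = preserved / np.linalg.norm(preserved, axis=1, keepdims=True)
    n = new / np.linalg.norm(new, axis=1, keepdims=True)
    cross = p @ n.T                           # (preserved slots, new slots) cosine matrix
    return float(np.sum(cross ** 2))

# Two preserved slots and one candidate slot added for a new task.
preserved = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
orthogonal_slot = np.array([[0.0, 0.0, 2.0]])
overlapping_slot = np.array([[3.0, 0.0, 0.0]])

print(orthogonal_penalty(preserved, orthogonal_slot))   # no overlap with preserved slots
print(orthogonal_penalty(preserved, overlapping_slot))  # duplicates the first preserved slot
```

In a training loop, a penalty of this kind would typically be added to the task loss with a weighting coefficient, so that new slots are pushed away from directions already occupied by preserved slots while the soft read keeps the whole memory trainable by gradient descent.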

Paper Structure

This paper contains 49 sections, 14 equations, 11 figures, 8 tables, and 3 algorithms.

Figures (11)

  • Figure 1: Comparison of the proposed method with regularization- and architecture-based approaches. (a) Regularization constrains new classes so that they do not interfere with previous tasks. (b) Architecture-based methods freeze the parameters allocated to segregating past classes and isolate them from expanded parameters. (c) Our method encourages maximal mutual utilization of past and new knowledge rather than separation.
  • Figure 2: Overview of the proposed method.
  • Figure 3: Schematic diagram of memory expansion, knowledge pruning and orthogonal regularization.
  • Figure 4: Task-wise accuracy for various CL methods on CIFAR-100 and Tiny-ImageNet under 10 tasks. The x-axis denotes the task index and the y-axis the classification accuracy per task, illustrating the differing forgetting dynamics across methods.
  • Figure 5: Task-wise accuracy for various CL methods on CIFAR-100 and Tiny-ImageNet under 20 tasks.
  • ...and 6 more figures