Lifelong Learning and Selective Forgetting via Contrastive Strategy

Lianlei Shan, Wenzhang Zhou, Wei Li, Xingyu Ding

TL;DR

This work tackles lifelong learning with selective forgetting by proposing a contrastive-learning framework that operates directly on the feature extractor. It introduces global class prototypes and multi-space dispersion to force preserved-class features to cluster tightly while scattering deleted-class features, enabling efficient forgetting that minimizes privacy leakage. The method unifies memory preservation and forgetting through a total loss that combines cross-entropy, distillation, prototype consistency, and in-class/out-of-class contrastive terms, with segmentation-specific background alignment. Empirical results on classification and segmentation benchmarks demonstrate state-of-the-art LSFM performance, validating the efficacy of feature-space forgetting and its potential for privacy-conscious continual learning in real-world applications.
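
The total objective described above is a weighted sum of five terms. The sketch below illustrates that structure only. The cross-entropy and Hinton-style distillation terms are computed from logits. The prototype and contrastive terms are passed in as precomputed scalars. The weights `w_kd`, `w_proto`, `w_in`, `w_out`, the temperature, and all function names are hypothetical and not taken from the paper. It also assumes the student and teacher logits cover the same (old) classes.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one sample with integer label `target`."""
    return -math.log(softmax(logits)[target])


def distillation(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.
    Assumes both logit lists cover the same (old) classes."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2


def total_loss(student_logits, teacher_logits, target,
               proto_loss, in_class_loss, out_class_loss,
               w_kd=1.0, w_proto=1.0, w_in=1.0, w_out=1.0):
    """Weighted sum of the five terms named in the TL;DR.
    All weights are hypothetical defaults, not values from the paper."""
    return (cross_entropy(student_logits, target)
            + w_kd * distillation(student_logits, teacher_logits)
            + w_proto * proto_loss
            + w_in * in_class_loss
            + w_out * out_class_loss)
```

With identical student and teacher logits the distillation term vanishes. The total then reduces to cross-entropy plus the weighted auxiliary terms.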

Abstract

Lifelong learning aims to train a model that performs well on new tasks while retaining its capacity on previous tasks. However, some practical scenarios require the system to forget undesirable knowledge for privacy reasons, which is called selective forgetting. The joint task of the two is dubbed Learning with Selective Forgetting (LSF). In this paper, we propose a new framework for LSF based on a contrastive strategy. Specifically, for the preserved classes (tasks), we make the features extracted from different samples of the same class compact. For the deleted classes, we make the features from different samples of the same class dispersed and irregular, i.e., the network has no regular response to samples from a specific deleted class, as if it had never been trained. By maintaining or disturbing the feature distribution, the forgetting and memory of different classes can be kept independent of each other. Experiments are conducted on four benchmark datasets, and our method achieves a new state of the art.
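
The compact/disperse idea in the abstract can be illustrated with a pairwise cosine-similarity stand-in. This is a minimal sketch, not the paper's exact contrastive loss. Same-class pairs of preserved classes are pushed toward similarity 1. Same-class pairs of deleted classes are penalized for any positive similarity, so they stop forming a cluster. The function names and the choice of `max(0, cos)` as the dispersion penalty are assumptions for illustration.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def compact_loss(feats):
    """Preserved class: push every same-class pair toward cosine similarity 1."""
    pairs = list(combinations(feats, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(a, b) for a, b in pairs) / len(pairs)


def disperse_loss(feats):
    """Deleted class (hypothetical penalty): punish any positive similarity
    between same-class pairs so the features no longer cluster."""
    pairs = list(combinations(feats, 2))
    if not pairs:
        return 0.0
    return sum(max(0.0, cosine(a, b)) for a, b in pairs) / len(pairs)


def selective_loss(feats, labels, deleted):
    """Average per-class loss: compact preserved classes, disperse deleted ones."""
    groups = {}
    for f, y in zip(feats, labels):
        groups.setdefault(y, []).append(f)
    if not groups:
        return 0.0
    per_class = [disperse_loss(fs) if y in deleted else compact_loss(fs)
                 for y, fs in groups.items()]
    return sum(per_class) / len(per_class)
```

Because each class is handled by its own term, changing which labels are in `deleted` affects only those classes' losses. This loosely mirrors the claim that forgetting and memory can be kept independent.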

Paper Structure

This paper contains 18 sections, 11 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Feature distribution and operations on features. Each color represents a class, and blue represents the background class. Each dot is the feature of one sample (an image or a pixel). Features extracted from samples of the same class by an untrained network are scattered (a), whereas those from a trained network are aggregated (b). Inspired by this phenomenon, we compute a global prototype for each class (the bold dots in (c)), then disperse the features of each deleted class (green dots in (c)) and compact the features of each preserved class (blue and red dots in (c)). These operations achieve selective forgetting, but they cause samples of deleted classes to be randomly assigned to any class, which reduces the accuracy of preserved classes (the dots surrounded by black boxes in (d)). We therefore move the features of deleted classes close to the background class (e), so that the deleted classes come to resemble the background class (f) and do not affect the other classes.
  • Figure 2: Accuracy of the deleted classes and the newly learned classes at different epochs under the 15-5 setting.
  • Figure 3: t-SNE plots of the features obtained from the last layer of the backbone before and after forgetting. Red boxes show the features before forgetting and yellow boxes after; both are plotted on the same scale. Each point represents the feature of a pixel (after downsampling), and each plot visualizes the features of one batch (batch size = 20). Bicycle, bird, and boat belong to the deleted classes.
  • Figure 4: Sensitivity analysis. The abscissa represents the ratio of $\lambda_{p}$ to $\lambda_{d}$.
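
The prototype and background-alignment steps in Figure 1 (c) and (e) can be sketched as follows. This covers only those two steps, under several assumptions not stated in the captions:
  • global prototypes are maintained as an exponential moving average (EMA) of per-batch class means;
  • distances are squared Euclidean;
  • λ_p weights the preserved-class term and λ_d the deleted-class term.

The last assumption is suggested by Figure 4 but is not defined there. `PrototypeBank`, `forgetting_loss`, and the momentum value are hypothetical names and settings.

```python
def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class PrototypeBank:
    """Hypothetical store of global class prototypes, updated as an EMA
    of per-batch class means (the update rule is an assumption)."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.protos = {}

    def update(self, feats, labels):
        by_class = {}
        for f, y in zip(feats, labels):
            by_class.setdefault(y, []).append(f)
        for y, fs in by_class.items():
            batch_mean = mean_vec(fs)
            if y in self.protos:
                m = self.momentum
                self.protos[y] = [m * old + (1 - m) * new
                                  for old, new in zip(self.protos[y], batch_mean)]
            else:
                self.protos[y] = batch_mean


def forgetting_loss(feats, labels, bank, deleted, background=0,
                    lam_p=1.0, lam_d=1.0):
    """Compact preserved features around their own global prototype and pull
    deleted-class features toward the background prototype.
    lam_p / lam_d are assumed to weight the preserved / deleted terms."""
    bg = bank.protos[background]
    preserved, removed = [], []
    for f, y in zip(feats, labels):
        if y in deleted:
            removed.append(sq_dist(f, bg))      # align with background, panel (e)
        else:
            preserved.append(sq_dist(f, bank.protos[y]))  # compact, panel (c)
    loss_p = sum(preserved) / len(preserved) if preserved else 0.0
    loss_d = sum(removed) / len(removed) if removed else 0.0
    return lam_p * loss_p + lam_d * loss_d
```

Under these assumptions, varying `lam_p / lam_d` trades off how tightly preserved classes cluster against how strongly deleted classes are absorbed into the background. That is the kind of ratio Figure 4 sweeps.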