Resilient Class-Incremental Learning: on the Interplay of Drifting, Unlabelled and Imbalanced Data Streams

Jin Li, Kleanthis Malialis, Marios Polycarpou

TL;DR

This paper tackles streaming class-incremental learning under nonstationary data with concept drift, label scarcity, and significant class imbalance. It introduces SCIL, a unified framework that combines an autoencoder (AE) for representation with a multi-layer perceptron (MLP) for prediction, guided by a dual loss $L_{total} = \alpha L_{recon} + (1-\alpha) L_{clf}$ and reinforced by corrected pseudo-labels. A dynamic memory queue and SMOTE oversampling manage class balance and memory efficiency, while a correction mechanism mitigates error propagation when new classes are detected. Empirical results on diverse synthetic and real-world datasets show SCIL consistently outperforms strong baselines and state-of-the-art methods, with ablations validating the design choices and robustness to drift and imbalance; all code and data are publicly available to support Open Science.
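The dual loss above can be illustrated with a minimal sketch. The summary gives only the weighted sum; the specific forms used here (mean squared error for $L_{recon}$, cross-entropy for $L_{clf}$) and the function names `mse`, `cross_entropy`, and `total_loss` are assumptions for illustration, not the authors' implementation.

```python
import math


def mse(x, x_hat):
    """Mean squared reconstruction error (assumed form of L_recon)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-probability of the true class (assumed form of L_clf)."""
    return -math.log(max(probs[label], eps))


def total_loss(x, x_hat, probs, label, alpha=0.5):
    """L_total = alpha * L_recon + (1 - alpha) * L_clf."""
    return alpha * mse(x, x_hat) + (1.0 - alpha) * cross_entropy(probs, label)


# Hypothetical instance: the AE output x_hat and the MLP class probabilities
# are supplied by hand here instead of coming from trained networks.
x = [1.0, 0.0]
x_hat = [0.9, 0.1]
probs = [0.7, 0.2, 0.1]
loss = total_loss(x, x_hat, probs, label=0, alpha=0.5)
```

With $\alpha = 1$ only the reconstruction term remains, and with $\alpha = 0$ only the classification term remains, so $\alpha$ trades off representation quality against predictive accuracy.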

Abstract

In today's connected world, the generation of massive streaming data across diverse domains has become commonplace. Concept drift, class imbalance, label scarcity, and new class emergence jointly degrade representation stability, bias learning toward outdated distributions, and reduce the resilience and reliability of detection in dynamic environments. This paper proposes SCIL (Streaming Class-Incremental Learning) to address these challenges. The SCIL framework integrates an autoencoder (AE) with a multi-layer perceptron (MLP) for multi-class prediction, uses a dual-loss strategy (classification and reconstruction) for prediction and new class detection, employs corrected pseudo-labels for online training, manages classes with queues, and applies oversampling to handle imbalance. The rationale behind the method's structure is elucidated through ablation studies, and a comprehensive experimental evaluation is performed on both real-world and synthetic datasets that feature class imbalance, incremental classes, and concept drift. Our results demonstrate that SCIL outperforms strong baselines and state-of-the-art methods. Based on our commitment to Open Science, we make our code and datasets available to the community.
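As a rough illustration of managing classes with queues and oversampling minority classes, the sketch below keeps one bounded queue per class and pads smaller classes with interpolated synthetic instances. It is a simplified, SMOTE-style stand-in under stated assumptions: pairs are drawn at random from the same class rather than from k nearest neighbours, and the class name `ClassQueues`, the `capacity` parameter, and the method `balanced_batch` are hypothetical, not taken from the paper.

```python
import random
from collections import defaultdict, deque


class ClassQueues:
    """One fixed-size FIFO queue per class (hypothetical helper)."""

    def __init__(self, capacity=50):
        # deque(maxlen=...) silently evicts the oldest instance when full,
        # so each class keeps only its most recent examples.
        self.queues = defaultdict(lambda: deque(maxlen=capacity))

    def add(self, x, label):
        self.queues[label].append(list(x))

    def balanced_batch(self, rng=random):
        """Return (x, label) pairs with every class padded to the largest size."""
        if not self.queues:
            return []
        target = max(len(q) for q in self.queues.values())
        batch = []
        for label, q in self.queues.items():
            items = list(q)
            batch.extend((x, label) for x in items)
            for _ in range(target - len(items)):
                # Simplified SMOTE-style step: interpolate between two stored
                # instances of the same class (random pair, not nearest neighbours).
                a, b = rng.choice(items), rng.choice(items)
                t = rng.random()
                synth = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
                batch.append((synth, label))
        return batch


memory = ClassQueues(capacity=3)
for i in range(5):
    memory.add([float(i), 0.0], label=0)   # majority class, oldest two evicted
memory.add([10.0, 10.0], label=1)          # newly emerged minority class
batch = memory.balanced_batch(random.Random(0))
```

Bounding each queue caps memory use and favours recent instances, which matters under concept drift; the oversampling step then keeps a newly emerged class from being swamped by established ones during training.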

Paper Structure

This paper contains 30 sections, 18 equations, 10 figures, 6 tables, and 1 algorithm.

Figures (10)

  • Figure 1: An overview of SCIL in block diagram representation.
  • Figure 2: Flow chart of the correction mechanism.
  • Figure 3: Screenshots of the Blob and Sea datasets (timestamps mark each class's first appearance).
  • Figure 4: Comparison of different input configurations in nonstationary environments.
  • Figure 5: Comparison of different loss settings in nonstationary environments.
  • ...and 5 more figures