Resilient Class-Incremental Learning: on the Interplay of Drifting, Unlabelled and Imbalanced Data Streams
Jin Li, Kleanthis Malialis, Marios Polycarpou
TL;DR
This paper tackles streaming class-incremental learning under nonstationary data with concept drift, label scarcity, and significant class imbalance. It introduces SCIL, a unified framework that combines an autoencoder (AE) for representation with a multi-layer perceptron (MLP) for prediction, guided by a dual loss $L_{total} = \alpha L_{recon} + (1-\alpha) L_{clf}$ and reinforced by corrected pseudo-labels. A dynamic memory queue and SMOTE oversampling manage class balance and memory efficiency, while a correction mechanism mitigates error propagation when new classes are detected. Empirical results on diverse synthetic and real-world datasets show SCIL consistently outperforms strong baselines and state-of-the-art methods, with ablations validating the design choices and robustness to drift and imbalance; all code and data are publicly available to support Open Science.
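To make the dual loss concrete, the following minimal sketch evaluates $L_{total} = \alpha L_{recon} + (1-\alpha) L_{clf}$ for a single example. The summary does not state which reconstruction and classification losses SCIL uses, so mean squared error and cross-entropy are assumptions here, and the function names and the default `alpha=0.5` are purely illustrative.

```python
import math

def reconstruction_loss(x, x_hat):
    """Mean squared error between an input and its reconstruction (assumed form)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def classification_loss(probs, label):
    """Cross-entropy for one example given predicted class probabilities (assumed form)."""
    return -math.log(max(probs[label], 1e-12))  # clamp to avoid log(0)

def total_loss(x, x_hat, probs, label, alpha=0.5):
    """Weighted sum: alpha * L_recon + (1 - alpha) * L_clf."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l_recon = reconstruction_loss(x, x_hat)
    l_clf = classification_loss(probs, label)
    return alpha * l_recon + (1 - alpha) * l_clf
```

Setting `alpha=1.0` reduces the objective to pure reconstruction and `alpha=0.0` to pure classification, so $\alpha$ controls how strongly the representation is shaped by each term.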
Abstract
In today's connected world, the generation of massive streaming data across diverse domains has become commonplace. Concept drift, class imbalance, label scarcity, and new class emergence jointly degrade representation stability, bias learning toward outdated distributions, and reduce the resilience and reliability of detection in dynamic environments. This paper proposes SCIL (Streaming Class-Incremental Learning) to address these challenges. The SCIL framework integrates an autoencoder (AE) with a multi-layer perceptron (MLP) for multi-class prediction, uses a dual-loss strategy (classification and reconstruction) for prediction and new class detection, employs corrected pseudo-labels for online training, manages classes with queues, and applies oversampling to handle imbalance. Ablation studies elucidate the rationale behind the method's structure, and a comprehensive experimental evaluation is performed on both real-world and synthetic datasets that feature class imbalance, incremental classes, and concept drift. Our results demonstrate that SCIL outperforms strong baselines and state-of-the-art methods. In keeping with our commitment to Open Science, we make our code and datasets available to the community.
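As a rough illustration of how per-class queues and oversampling can work together, the sketch below keeps one bounded FIFO queue per class and tops up smaller classes with SMOTE-style interpolated points. It is not the authors' implementation: `ClassQueues`, `interpolate_minority`, `balanced_batch`, and the `capacity` parameter are hypothetical names, and the interpolation uses only the single nearest neighbour, whereas standard SMOTE samples from the k nearest.

```python
import random
from collections import defaultdict, deque

class ClassQueues:
    """Hypothetical memory: one bounded FIFO queue per class label."""
    def __init__(self, capacity):
        self.capacity = capacity
        # deque(maxlen=...) silently evicts the oldest item when full.
        self.queues = defaultdict(lambda: deque(maxlen=capacity))

    def add(self, x, label):
        self.queues[label].append(list(x))

    def counts(self):
        return {label: len(q) for label, q in self.queues.items()}

def interpolate_minority(samples, n_new, rng):
    """Simplified SMOTE-style synthesis using the single nearest neighbour."""
    samples = [list(s) for s in samples]
    if not samples:
        return []
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        others = [s for s in samples if s is not base]
        if not others:
            # Only one stored sample: fall back to duplicating it.
            synthetic.append(list(base))
            continue
        neighbour = min(
            others,
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def balanced_batch(memory, rng):
    """Return (x, label) pairs with every class raised to the largest class size."""
    counts = memory.counts()
    if not counts:
        return []
    target = max(counts.values())
    batch = []
    for label, queue in memory.queues.items():
        real = list(queue)
        extra = interpolate_minority(real, target - len(real), rng)
        batch.extend((x, label) for x in real + extra)
    return batch
```

A short usage example: with `capacity=3`, adding five samples of class 0 keeps only the three most recent, and a class with two stored samples receives one synthetic point so that both classes contribute three items to the batch.

```python
rng = random.Random(0)
memory = ClassQueues(capacity=3)
for i in range(5):
    memory.add([float(i), float(i)], label=0)
memory.add([0.0, 0.0], label=1)
memory.add([2.0, 2.0], label=1)
batch = balanced_batch(memory, rng)
```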
