DELTA: Decoupling Long-Tailed Online Continual Learning
Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu
TL;DR
DELTA tackles long-tailed online continual learning by decoupling representation learning from classifier learning in a two-stage pipeline. Stage 1 uses a supervised contrastive loss $L_{contrastive}$ to learn robust representations from the streaming data and the memory buffer. Stage 2 freezes the backbone and trains the classifier with an Equalization Loss $L_{EQ}$, which uses a task-specific distribution vector $P(k^t)$ to reweight the logits $O^t(I_x)$. A multi-exemplar learning strategy pairs each input with multiple exemplars from memory to balance batches and reduce gradient variance. On CIFAR-100-LT and VFN-LT, DELTA consistently surpasses existing OCL methods across memory sizes and task configurations. Ablations confirm the contributions of dual-stage decoupling, $L_{EQ}$, and multi-exemplar pairing. These results suggest strong potential for real-world online learning under severe long-tailed distributions.
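The TL;DR describes the Stage 2 reweighting and the multi-exemplar pairing only at a high level. The following minimal pure-Python sketch illustrates both ideas under stated assumptions. The names `distribution_vector`, `logit_adjusted_ce`, and `build_batch` are hypothetical helpers. We assume that $P(k^t)$ is the empirical class frequency of the current task. We also assume the reweighting adds $\log P(k^t)$ to the logits $O^t(I_x)$. This is a Balanced-Softmax-style logit adjustment standing in for $L_{EQ}$, and DELTA's exact loss may differ. Finally, we assume exemplars are drawn with probability inversely proportional to their class count in memory.

```python
import math
import random
from collections import Counter


def distribution_vector(labels, num_classes):
    """Hypothetical helper: empirical class frequencies P(k^t) of the current task."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(num_classes)]


def logit_adjusted_ce(logits, target, prior, eps=1e-12):
    """Cross-entropy on logits O^t(I_x) shifted by log P(k^t).

    ASSUMPTION: Balanced-Softmax-style logit adjustment stands in for L_EQ;
    DELTA's exact equalization loss may be defined differently.
    """
    adjusted = [o + math.log(p + eps) for o, p in zip(logits, prior)]
    # log-sum-exp with max subtraction for numerical stability
    m = max(adjusted)
    log_z = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_z - adjusted[target]


def build_batch(sample, memory, m, rng):
    """Hypothetical multi-exemplar pairing: one stream sample plus m exemplars.

    ASSUMPTION: each memory item (x, label) is drawn with probability
    proportional to 1 / (count of its class in memory), so every class
    receives equal total weight and the exemplars are class-balanced
    in expectation.
    """
    counts = Counter(label for _, label in memory)
    weights = [1.0 / counts[label] for _, label in memory]
    return [sample] + rng.choices(memory, weights=weights, k=m)


# Usage: a head-heavy task and a skewed memory buffer
prior = distribution_vector([0, 0, 0, 1], num_classes=2)  # [0.75, 0.25]
loss = logit_adjusted_ce([2.0, 1.0], target=1, prior=prior)
memory = [("a", 0), ("b", 0), ("c", 0), ("d", 1)]
batch = build_batch(("x", 1), memory, m=4, rng=random.Random(0))
```

With inverse-count weights, each class in memory receives the same total sampling weight. The exemplars paired with an input are therefore class-balanced in expectation, however skewed the buffer is. With equal raw logits, the adjusted loss is larger for a tail-class target. This pushes the classifier to raise tail-class logits during training.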
Abstract
A significant challenge in achieving ubiquitous Artificial Intelligence is the limited ability of models to rapidly learn new information in real-world scenarios where data follows long-tailed distributions, while avoiding forgetting previously acquired knowledge. In this work, we study the under-explored problem of Long-Tailed Online Continual Learning (LTOCL), which aims to learn new tasks from sequentially arriving, class-imbalanced data streams. Each sample is observed only once for training, and the task data distribution is not known in advance. We present DELTA, a decoupled learning approach designed to improve representation learning and address the substantial class imbalance in LTOCL. We adapt supervised contrastive learning to attract similar (same-class) samples and repel dissimilar (out-of-class) samples. Further, by balancing gradients during training with an equalization loss, DELTA significantly improves learning outcomes and mitigates catastrophic forgetting. Through extensive evaluation, we demonstrate that DELTA improves the capacity for incremental learning, surpassing existing OCL methods. Our results suggest considerable promise for applying OCL in real-world applications.
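The abstract describes attracting same-class samples and repelling out-of-class samples. This matches the standard supervised contrastive (SupCon) objective of Khosla et al. (2020), which Stage 1 adapts. As an illustration only, the minimal pure-Python sketch below implements that standard loss for a single batch. Several details are our own assumptions: the temperature `tau=0.1`, the helper names, and the omission of augmented views and memory sampling. DELTA's adapted version may differ.

```python
import math


def _l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length (guarding against a zero norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def supcon_loss(features, labels, tau=0.1):
    """Standard supervised contrastive loss, averaged over anchors.

    For anchor i with same-class positives P(i):
        -1/|P(i)| * sum_{p in P(i)} log( exp(s_ip) / sum_{a != i} exp(s_ia) ),
    where s_ia = <z_i, z_a> / tau on L2-normalised embeddings z.
    Anchors with no positive partner in the batch are skipped.
    """
    z = [_l2_normalize(f) for f in features]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # temperature-scaled cosine similarities to every other sample
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / tau
                for a in range(n) if a != i}
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Usage: the same four embeddings, with labels that match vs. ignore clustering
feats = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
clustered = supcon_loss(feats, [0, 0, 1, 1])
mixed = supcon_loss(feats, [0, 1, 0, 1])
```

Each term compares a positive's similarity against a log-sum-exp over all other samples in the batch, so every anchor's loss is non-negative. The loss shrinks as same-class embeddings move closer together relative to the rest of the batch.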
