CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning
Erum Mushtaq, Duygu Nur Yaldiz, Yavuz Faruk Bakman, Jie Ding, Chenyang Tao, Dimitrios Dimitriadis, Salman Avestimehr
TL;DR
This work addresses class-incremental continual self-supervised learning (CSSL) by identifying task confusion as a critical yet underexplored challenge. It introduces CroMo-Mixup, a two-part framework consisting of Cross-Task data Mixup and Cross-Model feature Mixup. Together, these components diversify negative samples and learn cross-task similarities using embeddings from both the current and old models, optionally combined with distillation. The approach is shown to be compatible with four SSL objectives and yields consistent improvements in average linear accuracy and Task-ID prediction across CIFAR10, CIFAR100, and TinyImageNet splits, often surpassing CaSSLe and CaSSLe+ under limited memory budgets. These results suggest that CroMo-Mixup effectively enhances cross-task class separation and old-knowledge retrieval, with practical implications for scalable CSSL on unlabeled, sequential data. Limitations include reliance on a memory buffer and on explicit task boundaries; future work could explore privacy-preserving replay and smoother task transitions.
Abstract
Continual self-supervised learning (CSSL) learns a series of tasks sequentially on unlabeled data. The two main challenges of continual learning are catastrophic forgetting and task confusion. While the CSSL problem has been studied with respect to catastrophic forgetting, little work has addressed task confusion. In this work, we show through extensive experiments that self-supervised learning (SSL) can make CSSL more susceptible to task confusion, particularly in the less diverse setting of class-incremental learning, where classes belonging to different tasks are never trained concurrently. Motivated by this challenge, we present a novel Cross-Model feature Mixup (CroMo-Mixup) framework that addresses this issue through two key components: 1) Cross-Task data Mixup, which mixes samples across tasks to enhance negative sample diversity; and 2) Cross-Model feature Mixup, which learns the similarities between the embeddings of the mixed sample and of the original images, obtained from the current and old models, facilitating cross-task class contrast learning and old knowledge retrieval. We evaluate the effectiveness of CroMo-Mixup in improving both Task-ID prediction and average linear accuracy across all tasks on three datasets, CIFAR10, CIFAR100, and TinyImageNet, under different class-incremental learning settings. We validate the compatibility of CroMo-Mixup with four state-of-the-art SSL objectives. Code is available at \url{https://github.com/ErumMushtaq/CroMo-Mixup}.
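To make the two components more concrete, the following is a minimal NumPy sketch under stated assumptions. It is not the authors' implementation (see the linked repository for that). `cross_task_mixup` applies standard mixup across tasks. It pairs each current-task image with a randomly drawn replay-buffer image and mixes them with a coefficient drawn from Beta(alpha, alpha). The Beta sampling, the uniform pairing, and `alpha=1.0` are assumptions. `cross_model_mixup_loss` is a hypothetical weighted-cosine objective. It pulls the current model's embedding of the mixed image toward the current model's embedding of the current-task image (weight lambda) and toward the old model's embedding of the buffer image (weight 1 - lambda). Both function names and the exact loss form are illustrative only.

```python
import numpy as np


def cross_task_mixup(x_current, x_buffer, alpha=1.0, rng=None):
    """Mix each current-task sample with a random old-task (buffer) sample.

    Hypothetical sketch: Beta(alpha, alpha) coefficients and uniform pairing
    with buffer samples are assumptions, not the paper's exact procedure.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x_current.shape[0]
    # Pair every current-task sample with a buffer sample (with replacement).
    idx = rng.integers(0, x_buffer.shape[0], size=n)
    x_old = x_buffer[idx]
    # One mixing coefficient per sample, broadcast over the remaining dims.
    lam = rng.beta(alpha, alpha, size=n)
    lam_b = lam.reshape((n,) + (1,) * (x_current.ndim - 1))
    x_mix = lam_b * x_current + (1.0 - lam_b) * x_old
    return x_mix, lam, idx


def _row_cosine(a, b):
    """Row-wise cosine similarity between two (n, d) embedding matrices."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a_n * b_n, axis=1)


def cross_model_mixup_loss(z_mix_cur, z_cur_cur, z_old_old, lam):
    """Hypothetical weighted-cosine cross-model objective (assumption).

    z_mix_cur: current model's embedding of the mixed images.
    z_cur_cur: current model's embedding of the current-task images.
    z_old_old: old (frozen) model's embedding of the buffer images.
    lam:       per-sample mixing coefficients from cross_task_mixup.
    """
    sim_cur = _row_cosine(z_mix_cur, z_cur_cur)
    sim_old = _row_cosine(z_mix_cur, z_old_old)
    # Weight each similarity by the share of its source image in the mix.
    return -np.mean(lam * sim_cur + (1.0 - lam) * sim_old)
```

In an actual training loop, the mixed batch would be passed through the current encoder. The buffer images would be passed through a frozen copy of the previous task's encoder, and the resulting loss would be added to the chosen SSL objective. Weighting each target by its mixing coefficient follows the usual mixup convention of soft targets proportional to each source's contribution.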
