Forward-Backward Knowledge Distillation for Continual Clustering
Mohammadreza Sadeghi, Zihan Wang, Narges Armanfard
TL;DR
FBCC tackles catastrophic forgetting in Unsupervised Continual Clustering with a single teacher encoder, equipped with a cluster projector, and multiple lightweight student encoders. It uses two phases: Forward Knowledge Distillation, in which the teacher learns new clusters while drawing on knowledge held by the specialized students, and Backward Knowledge Distillation, in which a lightweight student mimics the teacher to retain task-specific knowledge for future tasks. The method integrates prototype-informed discrimination, contrastive clustering, and a memory-efficient, task-specific clustering head; it saves memory by storing only a set of lightweight students rather than full past models or data. On CIFAR-10, CIFAR-100, and Tiny-ImageNet, FBCC achieves higher average clustering accuracy and lower forgetting than strong baselines while using fewer parameters, which suggests practical potential for real-world, evolving data streams.
Abstract
Unsupervised Continual Learning (UCL) is a burgeoning field in machine learning that focuses on enabling neural networks to learn tasks sequentially without explicit label information. Catastrophic Forgetting (CF), where a model forgets previously learned tasks upon learning new ones, poses a significant challenge in continual learning, especially in UCL, where label information is not available. CF mitigation strategies, such as knowledge distillation and replay buffers, often suffer from memory inefficiency and privacy issues. Although current research in UCL has sought to refine data representations and address CF in streaming data contexts, there is a noticeable lack of algorithms designed specifically for unsupervised clustering. To fill this gap, we introduce the concept of Unsupervised Continual Clustering (UCC) and propose Forward-Backward Knowledge Distillation for unsupervised Continual Clustering (FBCC) to counteract CF within the context of UCC. FBCC employs a single continual learner (the "teacher") with a cluster projector, along with multiple student models, to address the CF issue. The proposed method consists of two phases: Forward Knowledge Distillation, where the teacher learns new clusters while retaining knowledge from previous tasks with guidance from specialized student models, and Backward Knowledge Distillation, where a student model mimics the teacher's behavior to retain task-specific knowledge, aiding the teacher in subsequent tasks. FBCC marks a pioneering approach to UCC, demonstrating enhanced performance and memory efficiency in clustering across various tasks and outperforming clustering algorithms applied to the latent space of state-of-the-art UCL algorithms.
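
The two phases described above can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the loss functions or architectures, so the linear encoders, the mean-squared-error distillation term, the weight `lam`, the placeholder `task_loss`, and every function name below are assumptions made for illustration only. In particular, the student here has the same shape as the teacher, whereas FBCC's students are lightweight.

```python
import random

def linear(weights, x):
    """Apply a toy linear encoder: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def distill_loss(a, b):
    """Mean squared difference between two embeddings (assumed loss)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def backward_distill(teacher_w, data, steps=200, lr=0.1, seed=0):
    """Backward phase (sketch): fit a new student to mimic the frozen teacher."""
    rng = random.Random(seed)
    dim_out, dim_in = len(teacher_w), len(teacher_w[0])
    student_w = [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)]
                 for _ in range(dim_out)]
    for _ in range(steps):
        for x in data:
            target = linear(teacher_w, x)  # teacher weights are never updated
            pred = linear(student_w, x)
            for i in range(dim_out):
                # Gradient of distill_loss with respect to output i.
                scale = 2.0 * (pred[i] - target[i]) / dim_out
                for j in range(dim_in):
                    student_w[i][j] -= lr * scale * x[j]
    return student_w

def forward_loss(teacher_w, students, x, task_loss, lam=1.0):
    """Forward phase (sketch): new-task loss plus drift from stored students."""
    z = linear(teacher_w, x)
    drift = sum(distill_loss(z, linear(s, x)) for s in students)
    return task_loss(z) + lam * drift

# Toy data for one "task" and a teacher assumed to be already trained on it.
task_data = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
teacher = [[0.5, -1.0], [2.0, 0.3]]

# Backward phase: store a student that reproduces the teacher on this task.
student = backward_distill(teacher, task_data)
mimic_error = max(distill_loss(linear(teacher, x), linear(student, x))
                  for x in task_data)

# Forward phase: a teacher that drifts from the stored student is penalized.
no_task = lambda z: 0.0  # placeholder for the clustering objective
drifted = [[1.5, -1.0], [2.0, 0.3]]
loss_same = forward_loss(teacher, [student], task_data[0], no_task)
loss_drifted = forward_loss(drifted, [student], task_data[0], no_task)
print(mimic_error, loss_same, loss_drifted)
```

In this sketch only the student's weights are kept after a task, which mirrors the memory argument in the abstract: past data and past teacher snapshots are not needed to compute the drift penalty.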
