Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery

Grzegorz Rypeść, Daniel Marczak, Sebastian Cygert, Tomasz Trzciński, Bartłomiej Twardowski

TL;DR

This work addresses Generalized Continual Category Discovery (GCCD), a setting requiring learning from sequential, partially labeled data while discovering novel categories. It introduces CAMP, a method that pairs projected distillation via a learnable projector with a centroid-drift predictor (an auxiliary category adaptation network) to model and compensate for past-class drift, enabling robust plasticity and stability without exemplars. CAMP demonstrates state-of-the-art performance across GCCD and exemplar-free Class Incremental Learning on multiple datasets, with analyses showing the benefits of combining projection-based distillation and centroid adaptation and clarifying the roles of adapters and distillers. The approach offers a practical, scalable solution for continual learning with partially labeled data and evolving category distributions, albeit with limitations for non-centroid-based representations and exemplar-free scalability in certain settings.

Abstract

Generalized Continual Category Discovery (GCCD) tackles learning from sequentially arriving, partially labeled datasets while uncovering new categories. Traditional methods depend on feature distillation to prevent forgetting old knowledge. However, this strategy restricts the model's ability to adapt and effectively distinguish new categories. To address this, we introduce a novel technique integrating a learnable projector with feature distillation, thus enhancing model adaptability without sacrificing past knowledge. The resulting distribution shift of the previously learned categories is mitigated with an auxiliary category adaptation network. We demonstrate that while each component offers modest benefits individually, their combination, dubbed CAMP (Category Adaptation Meets Projected distillation), significantly improves the balance between learning new information and retaining old knowledge. CAMP exhibits superior performance across several GCCD and Class Incremental Learning scenarios. The code is available at https://github.com/grypesc/CAMP.
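
To make the contrast between plain feature distillation and distillation through a projector concrete, here is a minimal sketch. It is not the authors' implementation: the mean-squared-error loss, the 2-D features, the rotation used as "drift", and the hand-set projector (which would be learned in practice) are all assumptions chosen for illustration.

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def mse(a_batch, b_batch):
    """Mean squared error between two equally shaped batches of vectors."""
    total = sum((a - b) ** 2
                for va, vb in zip(a_batch, b_batch)
                for a, b in zip(va, vb))
    return total / (len(a_batch) * len(a_batch[0]))

# Hypothetical features of the same inputs under the frozen old extractor.
old_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical features under the new extractor: the latent space has
# drifted by a 30-degree rotation while learning the new task.
theta = math.radians(30)
new_feats = [rotate(v, theta) for v in old_feats]

# Plain feature distillation compares new and old features directly,
# so any movement of the latent space is penalized.
plain_loss = mse(new_feats, old_feats)

# Projected distillation compares projector(new features) with old features.
# The projector would normally be learned; here it is set by hand to the
# inverse rotation (an assumption) to show that such a drift can cost nothing.
projected = [rotate(v, -theta) for v in new_feats]
projected_loss = mse(projected, old_feats)

print(f"plain distillation loss:     {plain_loss:.4f}")
print(f"projected distillation loss: {projected_loss:.4f}")
```

In this toy setting the direct loss is clearly positive while the projected loss is numerically zero: the extractor is free to move as long as some projector can map the new features back to the old ones. The price is that stored information about old categories ends up in a shifted space, which is the distribution shift the abstract says the category adaptation network compensates for.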

Paper Structure (20 sections, 15 equations, 13 figures, 6 tables)

Figures (13)

  • Figure 1: While knowledge distillation through a projector and category adaptation are decent on their own, CAMP combines them to achieve great results without the need for exemplars.
  • Figure 1: Average accuracy after each task on three datasets. CAMP achieves the best accuracy after most tasks.
  • Figure 2: CAMP uses projected knowledge distillation, resulting in a predictable latent space drift. The drift is reversible via centroid adaptation (black arrows), maintaining high performance on the first task. GCD and GCD with feature distillation fail to prevent forgetting, resulting in a drift that is difficult to predict. This decreases their performance. We report the nearest centroid classification accuracy using stored (Acc old) and adapted centroids (Acc adapted) after training on the second task.
  • Figure 2: Impact of the $\beta$ hyperparameter on known and novel accuracy achieved on CUB200.
  • Figure 3: The training procedure of CAMP consists of three stages: (1) We train the feature extractor in a semi-supervised manner using distillation through a learnable projector. (2) We obtain centroids of known and novel classes in the current task using a contrastive K-Means algorithm. (3) We update memorized centroids of old categories to alleviate forgetting.
  • ...and 8 more figures
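
The Figure 2 caption compares nearest centroid classification with stored centroids (Acc old) against adapted centroids (Acc adapted). The sketch below illustrates that comparison on toy data. Everything in it is hypothetical: the 2-D features, the class labels, and the `drift` function, which stands in for the learned category adaptation network by applying a known translation.

```python
import math

def nearest_centroid(x, centroids):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

def accuracy(samples, centroids):
    """Fraction of (feature, label) pairs classified correctly."""
    hits = sum(nearest_centroid(x, centroids) == y for x, y in samples)
    return hits / len(samples)

# Hypothetical centroids memorized after the first task (old feature space).
stored = {"a": (0.0, 0.0), "b": (4.0, 0.0)}

def drift(p):
    """Assumed latent-space drift after the second task: a shift of (3, 0).
    In practice the drift is unknown and must be predicted by an adapter."""
    return (p[0] + 3.0, p[1])

# Old-class test samples as seen by the updated feature extractor.
samples = [
    (drift((0.2, 0.1)), "a"),
    (drift((-0.3, 0.2)), "a"),
    (drift((4.1, -0.2)), "b"),
    (drift((3.8, 0.3)), "b"),
]

# Stand-in for centroid adaptation: move each stored centroid into the
# new feature space. Here the true drift is reused, which is an idealization.
adapted = {label: drift(c) for label, c in stored.items()}

acc_old = accuracy(samples, stored)
acc_adapted = accuracy(samples, adapted)
print("Acc old:", acc_old)
print("Acc adapted:", acc_adapted)
```

With stale centroids, the class "a" samples land closer to the old centroid of "b" and are misclassified; after moving the centroids along with the drift, all four samples are assigned correctly. This only works when the drift is predictable, which is the property the caption attributes to projected distillation.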