Proxy-Anchor and EVT-Driven Continual Learning Method for Generalized Category Discovery

Alireza Fathalizadeh, Roozbeh Razavi-Far

TL;DR

This work tackles Continual Generalized Category Discovery (CGCD) by introducing EVT-guided boundaries around proxy anchors and a dedicated EVT-based loss that together reject unknown samples and improve the learned representation. The method, named CATEGORIZER, has two stages. The Initial Stage trains the model with the Proxy Anchor loss, applies a Weibull-based EVT analysis to define a probabilistic inclusion boundary around each proxy, and then fine-tunes the model with the $\ell_{evt}$ loss. The Continual Learning Stage performs novelty detection, clusters unknown samples into novel classes, and updates the proxies incrementally using exemplar memory and knowledge distillation, while applying model reduction to discard redundant proxies. Key contributions include the EVT-based loss for representation learning, a threshold-based novelty detection and clustering pipeline, and a greedy set-cover proxy reduction that mitigates overestimation of the number of novel categories. Experiments on multiple fine-grained datasets show superior performance over state-of-the-art CGCD methods, supporting the approach's effectiveness in mitigating forgetting while enabling robust discovery. The work is relevant to real-world systems that require open-set recognition and continual adaptation without extensive labeled data. Future work includes integrating the EVT loss into the continual stage and exploring alternative clustering strategies.
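
The greedy set-cover proxy reduction mentioned above can be sketched as follows. This is a minimal pure-Python illustration, assuming an Extreme Value Machine (EVM)-style reduction (Rudd et al.) rather than CATEGORIZER's exact criterion: proxy i "covers" proxy j when the inclusion probability of proxy i, evaluated at the distance to proxy j, reaches a coverage threshold; proxies are then picked greedily until every proxy is covered, and the rest are discarded. The helper names `psi` and `reduce_proxies`, the per-proxy (scale, shape) Weibull parameters, and the 0.5 threshold are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def psi(distance: float, scale: float, shape: float) -> float:
    """EVM-style probability of inclusion: the Weibull survival function exp(-(d / scale) ** shape)."""
    return math.exp(-((distance / scale) ** shape))


def reduce_proxies(
    proxies: Sequence[Sequence[float]],
    params: Sequence[Tuple[float, float]],  # hypothetical per-proxy (scale, shape) from an EVT fit
    coverage: float = 0.5,                   # hypothetical coverage threshold in (0, 1]
) -> List[int]:
    """Greedy set cover over proxies; returns the indices of the proxies to keep.

    Proxy i covers proxy j when psi_i(||p_j - p_i||) >= coverage. Every proxy covers
    itself (psi at distance 0 equals 1), so the loop below always terminates.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must lie in (0, 1]")
    n = len(proxies)
    covers: List[Set[int]] = [
        {j for j in range(n)
         if psi(math.dist(proxies[i], proxies[j]), *params[i]) >= coverage}
        for i in range(n)
    ]
    uncovered = set(range(n))
    kept: List[int] = []
    while uncovered:
        # Choose the proxy that covers the most still-uncovered proxies (ties: lowest index).
        best = max(range(n), key=lambda i: (len(covers[i] & uncovered), -i))
        kept.append(best)
        uncovered -= covers[best]
    return kept
```

For example, with proxies at (0, 0), (0.1, 0) and (5, 5) and (scale, shape) = (1, 2) for all three, the first two proxies cover each other, so `reduce_proxies` keeps indices [0, 2] and discards the redundant proxy 1.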

Abstract

Continual generalized category discovery (CGCD) has been introduced and studied in the literature as a setting in which a model must continuously discover and learn novel categories in incoming data batches while avoiding catastrophic forgetting of previously learned categories. A key component in addressing this challenge is the model's ability to separate novel samples, a task for which Extreme Value Theory (EVT) has been employed effectively. In this work, we propose a novel method that integrates EVT with proxy anchors to define boundaries around proxies using a probability of inclusion function, enabling the rejection of unknown samples. Additionally, we introduce a novel EVT-based loss function that enhances the learned representation and achieves superior performance compared to other deep metric learning methods in similar settings. Using the derived probability functions, novel samples are effectively separated from samples of previously known categories. However, category discovery within these novel samples can sometimes overestimate the number of new categories. To mitigate this issue, we propose a novel EVT-based approach that reduces the model size by discarding redundant proxies. We also incorporate experience replay and knowledge distillation during the continual learning stage to prevent catastrophic forgetting. Experimental results demonstrate that our proposed approach outperforms state-of-the-art methods in continual generalized category discovery scenarios.
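
The probability-of-inclusion boundary described in the abstract can be illustrated with the Extreme Value Machine (EVM) formulation of Rudd et al. The sketch below is a minimal pure-Python version under that assumption, not the paper's implementation: each proxy gets a maximum-likelihood Weibull fit on the smallest half-distances to embeddings of other classes, its inclusion probability is $\Psi(x) = \exp(-(\lVert x - p \rVert / \lambda)^{\kappa})$, and a sample is flagged as unknown when its largest $\Psi$ over all proxies falls below a threshold. The function names, `tail_size`, and the 0.5 threshold are hypothetical choices for illustration.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def fit_weibull(samples: Sequence[float], iters: int = 100) -> Tuple[float, float]:
    """Maximum-likelihood Weibull fit of positive samples; returns (scale, shape)."""
    if min(samples) <= 0:
        raise ValueError("samples must be positive")
    c = max(samples)
    xs = [s / c for s in samples]  # rescale into (0, 1] so x ** k cannot overflow
    if min(xs) == 1.0:
        raise ValueError("samples must not all be equal")
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)

    def shape_equation(k: float) -> float:
        # MLE condition for the shape k: strictly increasing in k, zero at the estimate.
        w = [x ** k for x in xs]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while shape_equation(hi) < 0:  # widen the bracket until it contains the root
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):         # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        if shape_equation(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    # MLE scale given the shape, undoing the rescaling by c.
    scale = c * (sum(x ** shape for x in xs) / len(xs)) ** (1.0 / shape)
    return scale, shape


def psi(distance: float, scale: float, shape: float) -> float:
    """Probability of inclusion: the Weibull survival function exp(-(d / scale) ** shape)."""
    return math.exp(-((distance / scale) ** shape))


def fit_proxy_psi(proxy: Vector, negatives: Sequence[Vector], tail_size: int = 20) -> Tuple[float, float]:
    """EVM-style fit on the tail_size smallest half-distances from the proxy to other-class embeddings."""
    margins = sorted(0.5 * math.dist(proxy, z) for z in negatives)[:tail_size]
    return fit_weibull(margins)


def is_unknown(
    x: Vector,
    proxies: Sequence[Vector],
    params: Sequence[Tuple[float, float]],
    threshold: float = 0.5,  # hypothetical rejection threshold
) -> bool:
    """Reject x as unknown when no proxy includes it with probability >= threshold."""
    best = max(psi(math.dist(x, p), *prm) for p, prm in zip(proxies, params))
    return best < threshold
```

In this sketch, `fit_proxy_psi` would be called once per proxy after training, and the resulting (scale, shape) pairs reused by `is_unknown` for every incoming sample. `scipy.stats.weibull_min.fit` with `floc=0` could replace the hand-written estimator where SciPy is available.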

Paper Structure

This paper contains 21 sections, 14 equations, 4 figures, 8 tables, and 3 algorithms.

Figures (4)

  • Figure 1: The general setting of Continual Generalized Category Discovery (CGCD). In the initial stage, a labeled dataset is provided to train the initial model. The model then enters the continual learning stage, in which no labeled data is provided. The input data in this stage can contain samples from both novel and previously known categories. The model is expected to discover potential novel categories in this unlabeled data and integrate them into the model without compromising performance on previously learned categories and without making assumptions about the number of novel categories.
  • Figure 2: Overview of CATEGORIZER. In the initial stage, the model is first pre-trained with the Proxy Anchor (PA) loss to derive proxy anchors for the different classes. EVT analysis is then applied to each proxy to fit a Weibull distribution around it and to derive a probability of inclusion (PSI) function capable of rejecting unknown samples. With the computed distributions, the model is fine-tuned on our novel $\ell_{evt}$ loss to obtain the initial model. In the continual learning stage, the input data, containing both novel and known samples, is separated into known and unknown samples by thresholding the PSI functions computed in the initial stage. Known samples are pseudo-labeled using the model from the previous step, and unknown samples are clustered. The model is updated using the pseudo-labeled and clustered data, exemplars of previous categories, and a distillation loss derived from the previous-step model. EVT is then applied to the updated model to obtain updated distributions, which are used to reduce the model by discarding redundant proxies. This process repeats for subsequent steps. The yellow boxes indicate our novel contributions in the proposed scheme.
  • Figure 3: Clustering accuracy versus epoch number in the continual learning stage, over all runs. On the Dogs and Cars datasets, training for more epochs leads to feature collapse of the newly discovered classes, while on the CUB and MIT datasets, accuracy stops changing after a certain number of epochs. Based on this observation, we limit the number of training epochs in the continual learning stage.
  • Figure 4: Recall@1 versus epoch number when fine-tuning on our $\ell_{evt}$ loss, over all runs. The blue dotted line indicates the PA accuracy, i.e., the initial accuracy of the model. Across datasets, accuracy drops at the beginning of training, recovers after a few epochs, and converges after 10-20 epochs to a higher value than that of PA.
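
Figure 2 notes that the model is first pre-trained with the Proxy Anchor (PA) loss. For reference, the sketch below evaluates that loss as defined by Kim et al. (CVPR 2020) on cosine similarities $s(x, p)$: $\ell = \frac{1}{|P^+|}\sum_{p \in P^+} \log\big(1 + \sum_{x \in X_p^+} e^{-\alpha(s(x,p) - \delta)}\big) + \frac{1}{|P|}\sum_{p \in P} \log\big(1 + \sum_{x \in X_p^-} e^{\alpha(s(x,p) + \delta)}\big)$, where $P^+$ is the set of proxies with at least one positive embedding in the batch. It is a plain-Python illustration without gradients; the defaults $\alpha = 32$ and $\delta = 0.1$ are the values commonly used with PA and are assumptions here, since this summary does not state CATEGORIZER's settings.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def proxy_anchor_loss(
    embeddings: Sequence[Vector],
    labels: Sequence[int],      # labels[i] is the index of the proxy for embeddings[i]
    proxies: Sequence[Vector],  # one proxy per class
    alpha: float = 32.0,        # scaling factor (assumed default)
    delta: float = 0.1,         # margin (assumed default)
) -> float:
    pos_total, neg_total, num_pos_proxies = 0.0, 0.0, 0
    for c, proxy in enumerate(proxies):
        sims = [cosine(e, proxy) for e in embeddings]
        pos = [s for s, y in zip(sims, labels) if y == c]
        neg = [s for s, y in zip(sims, labels) if y != c]
        if pos:  # P+: proxies with at least one positive in the batch
            num_pos_proxies += 1
            pos_total += math.log1p(sum(math.exp(-alpha * (s - delta)) for s in pos))
        # Every proxy contributes a negative term; log1p(0) = 0 when it has no negatives.
        neg_total += math.log1p(sum(math.exp(alpha * (s + delta)) for s in neg))
    return pos_total / max(num_pos_proxies, 1) + neg_total / len(proxies)
```

In an actual training loop the same expression would be written in an autodiff framework so that gradients reach both the embeddings and the proxies; this version only computes the loss value, which is lower when each embedding is close to its own proxy and far from the others.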