Enhancing Visual Continual Learning with Language-Guided Supervision

Bolin Ni, Hongbo Zhao, Chenghao Zhang, Ke Hu, Gaofeng Meng, Zhaoxiang Zhang, Shiming Xiang

TL;DR

This work targets catastrophic forgetting in continual learning by exploiting the semantic knowledge embedded in class names. It introduces LingoCL, which uses pretrained language models to generate a semantic target for each class and keeps these targets frozen as the classifier head, so that they guide the visual encoder and align semantics across tasks. Empirically, LingoCL yields consistent gains across class-incremental, task-incremental, few-shot, and domain-incremental settings, while reducing forgetting and sometimes producing backward transfer. The method is simple, computationally efficient, and orthogonal to many existing CL approaches, suggesting broad applicability for robust, semantically informed continual vision systems.

Abstract

Continual learning (CL) aims to empower models to learn new tasks without forgetting previously acquired knowledge. Most prior works concentrate on the techniques of architectures, replay data, regularization, etc. However, the category name of each class is largely neglected. Existing methods commonly utilize the one-hot labels and randomly initialize the classifier head. We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks. In this paper, we revisit the role of the classifier head within the CL paradigm and replace the classifier with semantic knowledge from pretrained language models (PLMs). Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals during training. Such targets fully consider the semantic correlation between all classes across tasks. Empirical studies show that our approach mitigates forgetting by alleviating representation drifting and facilitating knowledge transfer across tasks. The proposed method is simple to implement and can seamlessly be plugged into existing methods with negligible adjustments. Extensive experiments based on eleven mainstream baselines demonstrate the effectiveness and generalizability of our approach to various protocols. For example, under the class-incremental learning setting on ImageNet-100, our method significantly improves the Top-1 accuracy by 3.2% to 6.1% while reducing the forgetting rate by 2.6% to 13.1%.
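
The idea of a frozen, language-derived classifier head can be illustrated with a minimal sketch. It is not the authors' implementation: the three-dimensional vectors in `SEMANTIC_TARGETS` are hypothetical stand-ins for PLM text embeddings of the class names, and the use of cosine similarity, the `temperature` value, and all function names are assumptions made for illustration only.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def frozen_head_logits(feature, targets, temperature=0.1):
    """Score one image feature against every frozen class target.

    Assumption: scores are temperature-scaled cosine similarities.
    The targets are never updated; only the encoder that produces
    `feature` would receive gradients during training.
    """
    f = l2_normalize(feature)
    logits = {}
    for name, target in targets.items():
        t = l2_normalize(target)
        logits[name] = sum(a * b for a, b in zip(f, t)) / temperature
    return logits

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample (log-sum-exp form)."""
    peak = max(logits.values())
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
    return log_z - logits[label]

# Hypothetical toy vectors standing in for PLM embeddings of class names.
# Semantically close classes ("cat", "dog") get nearby targets.
SEMANTIC_TARGETS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.3, 0.1],
    "truck": [0.0, 0.2, 0.9],
}

# Hypothetical output of the visual encoder for one image of a cat.
feature = [0.85, 0.15, 0.05]

logits = frozen_head_logits(feature, SEMANTIC_TARGETS)
predicted = max(logits, key=logits.get)
loss = cross_entropy(logits, "cat")
```

Because the targets for all classes live in one shared semantic space, a class introduced in a later task already has a meaningful position relative to earlier classes, which is the property the abstract credits for cross-task knowledge transfer.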

Paper Structure (26 sections, 3 equations, 10 figures, 9 tables)

Figures (10)

  • Figure 1: We introduce LingoCL, a simple yet effective continual learning paradigm leveraging language-guided supervision, which can be integrated into most existing approaches seamlessly. (a) Overview of the typical methods which are supervised only by one-hot labels. (b) Overview of the proposed LingoCL which is supervised by semantic targets generated from the pretrained language model. (c) LingoCL is versatile, which significantly enhances the performance of mainstream methods in class-, task- and domain-incremental scenarios.
  • Figure 2: Comparison of the inter-class correlation maps. LingoCL facilitates more efficient knowledge transfer among similar classes.
  • Figure 3: Quantitative analysis of representation drifting on ImageNet-100 with 10 tasks. LingoCL effectively alleviates the representation drifting in the CL process.
  • Figure 4: The evolution curve of accuracy and forgetting rate for each task on class-incremental experiments on ImageNet-100. Significantly, LingoCL exhibits negative forgetting, i.e., the learning of subsequent tasks leads to improved performance on prior tasks. This phenomenon evidences LingoCL's effective facilitation of knowledge transfer.
  • Figure 5: Results on general few-shot class-incremental learning.
  • ...and 5 more figures
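
The "negative forgetting" mentioned in the Figure 4 caption can be made concrete with a small sketch. The source text does not define the forgetting metric, so the definition below (best accuracy at any earlier stage minus final accuracy on the same task) is an assumed, commonly used form, and the accuracy values are made-up illustrative numbers, not results from the paper.

```python
def forgetting(acc_history):
    """Forgetting for one task under an assumed definition.

    acc_history lists the accuracy on that task after each training
    stage, starting with the stage that first learned it. The value
    is the best accuracy at any earlier stage minus the final one,
    so a negative result means later tasks improved this task.
    """
    if len(acc_history) < 2:
        raise ValueError("need at least two measurements")
    best_earlier = max(acc_history[:-1])
    return best_earlier - acc_history[-1]

# Made-up numbers for illustration only.
typical = forgetting([70.0, 65.0, 60.0])         # accuracy decays over time
backward_gain = forgetting([70.0, 71.5, 73.0])   # accuracy keeps improving
```

Under this definition `typical` is positive (the model lost accuracy on the old task), while `backward_gain` is negative, which corresponds to the backward transfer the caption describes.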