Exploiting Task Relationships in Continual Learning via Transferability-Aware Task Embeddings

Yanru Wu, Jianning Wang, Xiangyu Chen, Enming Zhang, Yang Tan, Hanbing Liu, Yang Li

TL;DR

This work tackles catastrophic forgetting in continual learning by exploiting inter-task relationships through an online, transferability-aware task embedding called $H$-embedding, derived from the information-theoretic $H$-score. A hypernetwork framework then uses these embeddings to generate task-specific weights, with an embedding-guided encoder/decoder module and Analytic Hierarchy Process normalization to align embedding distances with transferability. The approach is storage-efficient and PEFT-friendly (e.g., LoRA), and it reports strong final average accuracy and robust forward/backward transfer on the CIFAR-100, ImageNet-R, and DomainNet benchmarks. Overall, the method offers a plug-in solution that leverages prior task relations to enhance CL performance while remaining compatible with pretrained models. The code is publicly available, which supports reproducibility.

Abstract

Continual learning (CL) has been a critical topic in contemporary deep neural network applications, where higher levels of both forward and backward transfer are desirable for effective CL performance. Existing CL strategies primarily focus on task models, either by regularizing model updates or by separating task-specific and shared components, while often overlooking the potential of leveraging inter-task relationships to enhance transfer. To address this gap, we propose a transferability-aware task embedding, termed H-embedding, and construct a hypernet framework under its guidance to learn task-conditioned model weights for CL tasks. Specifically, H-embedding is derived from an information-theoretic measure of transferability and is designed to be online and easy to compute. Our method is also practical: it requires storing only a low-dimensional task embedding per task and supports efficient end-to-end training. Extensive evaluations on benchmarks including CIFAR-100, ImageNet-R, and DomainNet show that our framework compares favorably with various baseline and SOTA approaches, demonstrating strong potential in capturing and utilizing intrinsic task relationships. Our code is publicly available at https://github.com/viki760/H-embedding-Guided-Hypernet.
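
The abstract names an information-theoretic transferability measure without stating it. As a rough illustration, the sketch below computes the commonly cited form of the H-score, $H(f) = \mathrm{tr}(\mathrm{cov}(f)^{-1}\,\mathrm{cov}(\mathbb{E}[f \mid Y]))$, for a feature matrix and its labels. The function name `h_score`, the ridge term `eps`, and the toy data are assumptions made for this example; the paper's actual estimator and the way it turns scores into H-embeddings may differ.

```python
import numpy as np

def h_score(features, labels, eps=1e-6):
    """Estimate tr(cov(f)^-1 cov(E[f|Y])) from samples (illustrative sketch)."""
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n, d = features.shape

    # Center the features so that covariances are plain second moments.
    centered = features - features.mean(axis=0, keepdims=True)
    cov_f = centered.T @ centered / n

    # Replace each sample by the mean of its class: an estimate of E[f | Y].
    cond_mean = np.zeros_like(centered)
    for c in np.unique(labels):
        mask = labels == c
        cond_mean[mask] = centered[mask].mean(axis=0)
    cov_g = cond_mean.T @ cond_mean / n

    # Small ridge term (an assumption of this sketch) keeps the solve stable.
    regularized = cov_f + eps * np.eye(d)
    return float(np.trace(np.linalg.solve(regularized, cov_g)))

# Toy comparison: label-dependent features vs. pure noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
informative = labels[:, None] + 0.1 * rng.normal(size=(200, 1))
noise_only = rng.normal(size=(200, 1))

h_informative = h_score(informative, labels)
h_random = h_score(noise_only, labels)
```

Features that track the labels should score higher than features that ignore them, which is the property that makes the score usable as a transferability signal between a source representation and a target task.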

Paper Structure

This paper contains 53 sections, 11 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Illustration of the CL status at the step of learning task $j$ under our framework. The hypernet is trained to provide the optimal task model weight $\Theta^{(j)}$ concurrently with the learning of the current task embedding $e^{(j)}$, where regularization and guidance are applied using previous embeddings and H-embeddings.
  • Figure 2: Framework of our hypernet on the slice of task $j$. A hypernet (left, blue) is utilized to learn the weights of the main model (right, orange), where the H-embedding guidance is introduced using an encoder-decoder module. The entire framework is trained end-to-end by inputting task data into the main model and propagating gradients backward to update both the hypernet and the embedding.
  • Figure 3: Illustration of Ablation Studies (a) and CIL Performance (b). Left (a): FAA and DAA results of ablation studies (averaged across seeds). Right (b): FAA and DAA (striped) results of CIL baselines.
  • Figure 4: Test accuracy while training tasks 3, 6, and 10 of CIFAR-100, with the x-axis showing the number of checkpoints and the y-axis showing accuracy. The blue curve represents the vanilla hypernet and the orange curve represents our H-embedding guided hypernet. As CL progresses, our method converges more quickly to higher accuracy in later tasks.
  • Figure 5: Visualization of the discrepancy between the task embedding distances learned w/ and w/o H-embedding guidance. The cell in the $i$-th row and $j$-th column represents the distance between tasks $i$ and $j$. Darker cells indicate a larger discrepancy, with red for d(w/) < d(w/o) and blue vice versa.
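
The captions for Figures 1 and 2 describe a hypernet that maps a task embedding $e^{(j)}$ to main-model weights $\Theta^{(j)}$. The sketch below shows only that forward mapping for a single linear main model; it is a minimal illustration, not the authors' architecture. All sizes, the two-layer hypernet, and the names `generate_weights` and `main_model` are hypothetical, and training, regularization, and the H-embedding encoder-decoder guidance are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for this example.
EMB_DIM, HIDDEN, IN_DIM, OUT_DIM = 8, 32, 16, 4

# Hypernet parameters, shared across all tasks.
W1 = rng.normal(scale=0.1, size=(EMB_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, IN_DIM * OUT_DIM + OUT_DIM))

def generate_weights(task_embedding):
    """Map one task embedding to the weight and bias of a linear main model."""
    hidden = np.tanh(task_embedding @ W1)
    flat = hidden @ W2
    weight = flat[: IN_DIM * OUT_DIM].reshape(IN_DIM, OUT_DIM)
    bias = flat[IN_DIM * OUT_DIM:]
    return weight, bias

def main_model(x, task_embedding):
    """Run the main model with weights produced for the given task."""
    weight, bias = generate_weights(task_embedding)
    return x @ weight + bias

# One low-dimensional embedding is stored per task.
task_embeddings = {j: rng.normal(size=EMB_DIM) for j in range(3)}

x = rng.normal(size=(5, IN_DIM))
logits_task1 = main_model(x, task_embeddings[1])
logits_task2 = main_model(x, task_embeddings[2])
```

Because the main-model weights are regenerated from the embedding on demand, only the shared hypernet and one small vector per task need to be kept, which matches the storage claim in the abstract.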