Learning with Preserving for Continual Multitask Learning

Hanchen David Wang; Siwoo Bae; Zirong Chen; Meiyi Ma

TL;DR

This work defines Continual Multitask Learning (CMTL), where tasks arrive sequentially on a shared input domain and labels for past tasks are not fully available. It introduces Learning with Preserving (LwP), a replay-free framework that preserves the geometry of the latent representation via a Dynamically Weighted Distance Preservation (DWDP) loss, complemented by current-task supervision and distillation of past tasks. By maintaining pairwise distances for intra-task pairs and using a dynamic mask to avoid inter-class conflicts, LwP mitigates catastrophic forgetting while enabling knowledge sharing across tasks. It achieves state-of-the-art results on time-series and image benchmarks, remains robust under non-stationary distributions, and often surpasses single-task baselines. Because it requires no data replay, it is particularly suitable for privacy-sensitive scenarios.

Abstract

Artificial intelligence systems in critical fields like autonomous driving and medical imaging analysis often continually learn new tasks using a shared stream of input data. For instance, after learning to detect traffic signs, a model may later need to learn to classify traffic lights or different types of vehicles using the same camera feed. This scenario introduces a challenging setting we term Continual Multitask Learning (CMTL), where a model sequentially learns new tasks on an underlying data distribution without forgetting previously learned abilities. Existing continual learning methods often fail in this setting because they learn fragmented, task-specific features that interfere with one another. To address this, we introduce Learning with Preserving (LwP), a novel framework that shifts the focus from preserving task outputs to maintaining the geometric structure of the shared representation space. The core of LwP is a Dynamically Weighted Distance Preservation (DWDP) loss that prevents representation drift by regularizing the pairwise distances between latent data representations. This mechanism of preserving the underlying geometric structure allows the model to retain implicit knowledge and support diverse tasks without requiring a replay buffer, making it suitable for privacy-conscious applications. Extensive evaluations on time-series and image benchmarks show that LwP not only mitigates catastrophic forgetting but also consistently outperforms state-of-the-art baselines in CMTL tasks. Notably, our method shows superior robustness to distribution shifts and is the only approach to surpass the strong single-task learning baseline, underscoring its effectiveness for real-world dynamic environments.
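
The distance-preservation idea described above can be illustrated with a small sketch. It is not the authors' implementation: the function name, the plain-list inputs, the use of Euclidean distance, and the binary same-label mask (standing in for the paper's dynamic weighting) are all assumptions made for illustration. The sketch penalizes changes in pairwise latent distances between a frozen previous model and the current one, and skips pairs that the current task needs to separate.

```python
from itertools import combinations
from math import dist


def masked_distance_preservation_loss(z_old, z_new, labels):
    """Mean squared change in pairwise distance over the unmasked pairs.

    z_old:  latent vectors from the frozen previous model (list of lists)
    z_new:  latent vectors from the model being trained (same shape)
    labels: current-task labels, one per sample

    Assumption (not taken from the paper): a pair is kept only when both
    samples share the same current-task label, so the penalty never fights
    the supervised loss that pushes different classes apart.
    """
    total, kept = 0.0, 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] != labels[j]:
            continue  # masked: the current task should separate this pair
        d_old = dist(z_old[i], z_old[j])  # Euclidean distance before the update
        d_new = dist(z_new[i], z_new[j])  # Euclidean distance after the update
        total += (d_new - d_old) ** 2
        kept += 1
    # With no unmasked pairs there is nothing to preserve.
    return total / kept if kept else 0.0


# Hypothetical toy batch: samples 0 and 1 share a label, sample 2 does not.
z_old = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
z_shifted = [[x + 1.0, y - 2.0] for x, y in z_old]  # rigid translation
z_scaled = [[2.0 * x, 2.0 * y] for x, y in z_old]   # distances double
labels = [0, 0, 1]

loss_shifted = masked_distance_preservation_loss(z_old, z_shifted, labels)
loss_scaled = masked_distance_preservation_loss(z_old, z_scaled, labels)
```

In this toy batch a rigid translation of the latent space incurs no penalty, because only relative distances are constrained, while uniform scaling does. In a training loop, a term of this kind would be added to the current-task loss and the distillation loss mentioned above; the relative weighting of the three terms is left open here.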
