Overcoming Domain Drift in Online Continual Learning

Fan Lyu, Daofeng Liu, Linglan Zhao, Zhang Zhang, Fanhua Shang, Fuyuan Hu, Wei Feng, Liang Wang

TL;DR

This work tackles online continual learning (OCL) where models must learn from a stream of tasks without retraining on past data, a setting prone to catastrophic forgetting due to continual domain drift. It introduces Drift-Reducing Rehearsal (DRR), a rehearsal-based approach that anchors old-task domains using Centroid-based Online Selection (COS) and a Cross-Task Contrastive Margin Loss (CML), with an optional Centroid Distillation Loss (CDL) to further stabilize the feature space. DRR integrates a two-level angular margin framework ($m^\text{c}$, $m^\text{t}$) to tighten intra-class/task clusters while expanding inter-class/task separations, thereby reducing negative transfer between tasks. Empirical results on four standard OCL benchmarks show that DRR achieves state-of-the-art performance, effectively mitigating continual domain drift and preserving knowledge across tasks while maintaining competitive training efficiency. This approach offers a scalable and data-efficient solution for online continual learning in dynamic environments.

Abstract

Online Continual Learning (OCL) empowers machine learning models to acquire new knowledge online across a sequence of tasks. However, OCL faces a significant challenge: catastrophic forgetting, wherein the model learned in previous tasks is substantially overwritten upon encountering new tasks, leading to biased forgetting of prior knowledge. Moreover, the continual domain drift in sequential learning tasks may gradually displace the decision boundaries in the learned feature space, rendering the learned knowledge susceptible to forgetting. To address this problem, we propose a novel rehearsal strategy, termed Drift-Reducing Rehearsal (DRR), to anchor the domain of old tasks and reduce negative transfer effects. First, we select more representative samples for the memory, guided by centroids constructed from the data stream. Then, to keep the model from domain chaos during drift, we propose a two-level angular cross-task Contrastive Margin Loss (CML) that encourages intra-class and intra-task compactness and increases inter-class and inter-task discrepancy. Finally, to further suppress the continual domain drift, we present an optional Centroid Distillation Loss (CDL) on the rehearsal memory to anchor the knowledge in feature space for each previous task. Extensive experimental results on four benchmark datasets validate that the proposed DRR can effectively mitigate the continual domain drift and achieve state-of-the-art (SOTA) performance in OCL.
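The angular-margin idea behind CML can be illustrated with a small sketch. The abstract does not state the loss itself, so the code below assumes an ArcFace-style additive margin, cos(θ + m), as a stand-in; it is not necessarily the paper's formulation, and the names `cosine` and `margin_similarity` are hypothetical. It only shows why a margin forces tighter clusters: a positive pair scores lower once the margin is added, so the features must move closer to recover the same similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def margin_similarity(u, v, margin):
    """Return cos(theta + margin), where theta is the angle between u and v.

    Assumption: an additive angular margin (ArcFace-style). The angle is
    clamped at pi so the score never increases as the margin grows.
    """
    cos_theta = max(-1.0, min(1.0, cosine(u, v)))  # guard against rounding
    theta = math.acos(cos_theta)
    return math.cos(min(theta + margin, math.pi))
```

In a two-level scheme such as the one described above, `margin` would presumably be played by $m^\text{c}$ for class-level pairs and $m^\text{t}$ for task-level pairs; how the two are combined is not specified in this excerpt.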

Paper Structure (24 sections, 23 equations, 7 figures, 8 tables, 2 algorithms)

Figures (7)

  • Figure 1: (a) Task of OCL. (b) Continual domain drift in OCL. Through continual training, the domain distributions of old tasks drift in unpredictable directions, and the decision boundaries between old and new tasks can become blurred. Best viewed in color.
  • Figure 2: Drift comparison in rehearsal-based OCL. (a) and (b) show the two kinds of domain drift in rehearsal-based CL. Because of unrepresentative stored data and gradient bias, the features of old and new tasks become entangled, causing catastrophic forgetting. (c) Our DRR seeks to store representative data via centroids and constrain the drift via the proposed cross-task contrastive margin loss.
  • Figure 3: Training procedure of the proposed DRR in continual learning on a data stream. At each step, we store a small number of samples and the corresponding latent representations via centroid-based rehearsal. The cross-task margin loss encourages intra-class/task compactness and inter-class/task discrepancy. The centroid distillation loss further helps reduce the continual domain drift of the old tasks. Dashed elements denote optional items.
  • Figure 4: Centroid-based online selection. For each data point in the data stream, we first compare it with the existing centroids to check whether it satisfies the distance threshold. Left: If the data point is captured by the nearest centroid within the allowed distance, that centroid is updated. Right: We then re-evaluate the distance to each centroid and replace the farthest point in the memory buffer with the newly captured sample.
  • Figure 5: Average accuracy of all tasks when they just finish their training.
  • ...and 2 more figures
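
The selection step described in the Figure 4 caption can be sketched roughly as follows. This is a minimal illustration under several assumptions that the excerpt does not confirm: Euclidean distance, a running-mean centroid update, opening a new centroid when no existing one is within the threshold, and a single shared buffer. The class name `CentroidSelector` and its parameters are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidSelector:
    def __init__(self, threshold, capacity):
        self.threshold = threshold  # largest distance at which a centroid captures a point
        self.capacity = capacity    # memory buffer size
        self.centroids = []         # entries are [mean_vector, count]
        self.memory = []            # stored samples

    def _nearest(self, x):
        """Return (index, distance) of the closest centroid, or (None, inf)."""
        best_i, best_d = None, math.inf
        for i, (c, _) in enumerate(self.centroids):
            d = dist(x, c)
            if d < best_d:
                best_i, best_d = i, d
        return best_i, best_d

    def observe(self, x):
        x = list(x)
        i, d = self._nearest(x)
        if i is not None and d <= self.threshold:
            # Captured: move the centroid with a running mean.
            c, n = self.centroids[i]
            mean = [(ci * n + xi) / (n + 1) for ci, xi in zip(c, x)]
            self.centroids[i] = [mean, n + 1]
        else:
            # Assumption: an uncaptured point opens a new centroid.
            self.centroids.append([x, 1])

        if len(self.memory) < self.capacity:
            self.memory.append(x)
            return
        # Re-evaluate stored samples against the updated centroids and
        # replace the farthest one if the new sample is strictly closer.
        scores = [self._nearest(s)[1] for s in self.memory]
        worst = max(range(len(scores)), key=scores.__getitem__)
        if scores[worst] > self._nearest(x)[1]:
            self.memory[worst] = x
```

With `CentroidSelector(threshold=2.0, capacity=2)` fed the points `[0, 0]`, `[10, 0]`, `[2, 0]`, `[1, 0]` in order, the buffer ends up holding the samples closest to the two centroids. A per-class buffer, as a class-incremental setting would likely need, is left out to keep the sketch short.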