Gradual Divergence for Seamless Adaptation: A Novel Domain Incremental Learning Method

Kishaan Jeeveswaran, Elahe Arani, Bahram Zonooz

TL;DR

This work tackles domain incremental learning by introducing DARE, a three-stage training protocol (Divergence, Adaptation, Refinement) that gradually molds new-domain representations into the subspace defined by prior domains, thereby reducing representation drift and catastrophic forgetting. It combines dual classifiers with cross-entropy and supervised contrastive objectives and leverages an Intermediary Reservoir Sampling buffer to preserve dark knowledge across tasks. Empirical results on DN4IL and iCIFAR-20 show that DARE and its EMA variant consistently outperform strong rehearsal-based baselines, especially in memory-constrained settings, and analyses confirm reduced drift, better task balancing, and improved calibration. The approach offers a practical, memory-efficient path for robust DIL in real-world, shifting-domain environments, though it currently relies on task-id information for IRS and could benefit from automatic task-transition detection in future work.
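The supervised contrastive objective mentioned above is a standard component (Khosla et al., 2020); the sketch below shows a generic NumPy version of that loss, not DARE's exact formulation, and `supcon_loss` with its `temperature` default is an illustrative name and choice, not the authors' code.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss on L2-normalized embeddings.

    features: (N, D) array of embeddings; labels: (N,) integer class labels.
    Each anchor is pulled toward same-label samples and pushed from the rest.
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature                      # pairwise cosine similarities
    n = len(labels)
    mask_self = np.eye(n, dtype=bool)
    # log-softmax over all non-self pairs (self-similarity masked out)
    denom = np.exp(np.where(mask_self, -np.inf, sim)).sum(axis=1, keepdims=True)
    log_prob = sim - np.log(denom)
    pos = (labels[:, None] == labels[None, :]) & ~mask_self
    # average log-probability over positives, per anchor that has a positive
    per_anchor = (log_prob * pos).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    valid = pos.any(axis=1)
    return -per_anchor[valid].mean()
```

Well-clustered same-class embeddings should yield a lower loss than embeddings whose classes are interleaved, which is the signal the contrastive term contributes during training.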

Abstract

Domain incremental learning (DIL) poses a significant challenge in real-world scenarios, as models need to be sequentially trained on diverse domains over time, all the while avoiding catastrophic forgetting. Mitigating representation drift, which refers to the phenomenon of learned representations undergoing changes as the model adapts to new tasks, can help alleviate catastrophic forgetting. In this study, we propose a novel DIL method named DARE, featuring a three-stage training process: Divergence, Adaptation, and REfinement. This process gradually adapts the representations associated with new tasks into the feature space spanned by samples from previous tasks, simultaneously integrating task-specific decision boundaries. Additionally, we introduce a novel strategy for buffer sampling and demonstrate the effectiveness of our proposed method, combined with this sampling strategy, in reducing representation drift within the feature encoder. This contribution effectively alleviates catastrophic forgetting across multiple DIL benchmarks. Furthermore, our approach prevents sudden representation drift at task boundaries, resulting in a well-calibrated DIL model that maintains the performance on previous tasks.
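The buffer-sampling strategy referred to above, Intermediary Reservoir Sampling (IRS), builds on classic reservoir sampling; how IRS departs from it (e.g., its use of task-id information) is specific to the paper, so the sketch below shows only the underlying reservoir mechanism. `reservoir_update` and its signature are illustrative, not the authors' implementation.

```python
import random

def reservoir_update(buffer, capacity, seen, item):
    """Classic reservoir sampling (Vitter): maintain a uniform random
    sample of a stream using a fixed-capacity buffer.

    `seen` is the number of items observed before this one.
    Returns the updated count of observed items.
    """
    if len(buffer) < capacity:
        buffer.append(item)          # buffer not yet full: always store
    else:
        j = random.randint(0, seen)  # inclusive draw over stream so far
        if j < capacity:
            buffer[j] = item         # replace with probability capacity/(seen+1)
    return seen + 1

# Stream 1000 items through a 50-slot buffer.
random.seed(0)
buffer, seen = [], 0
for x in range(1000):
    seen = reservoir_update(buffer, 50, seen, x)
```

Every stream item ends up in the buffer with equal probability (capacity / stream length), which is why plain reservoir sampling tends to over-represent long-past tasks; an intermediary variant can rebalance what is retained across task transitions.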


Paper Structure

This paper contains 23 sections, 6 equations, 7 figures, 6 tables, and 2 algorithms.

Figures (7)

  • Figure 1: Relationship between representation drift and task 1 accuracy on DN4IL dataset with buffer size 50. The representations of buffered samples, mainly belonging to the first domain, experience an abrupt drift at the task boundary, which is directly associated with the decrease in accuracy.
  • Figure 2: Our proposed method, DARE, assimilates the knowledge about the new task while preserving the representations from earlier tasks by adopting a three-stage learning process in DIL. In the first two stages, Divergence and Adaptation, the model learns the representations of new domains within the cluster of old ones (rather than the other way around, which can exacerbate catastrophic forgetting). The final stage, Refinement, helps the model learn the new domain samples.
  • Figure 3: Task-wise accuracy of different CL models while learning new tasks with buffer size 50. DARE retains more performance on seen domains compared to ER and DER++.
  • Figure 4: Representation drift analysis. Left: Epoch-wise accuracy on Task 1 samples, while learning future tasks (shaded regions indicate new tasks). Right: Iteration-wise drifts for buffered samples for CL methods trained with a buffer size of 50. It is evident that DARE effectively reduces representation drift compared to other methods.
  • Figure 5: Model calibration and task recency bias analyses of different CL approaches learned with buffer size 200. Left: Logit norm analysis shows that DARE predicts logits with magnitudes smaller than DER++ (less overconfident) for recent task samples. Right: DARE has a lower calibration error compared to DER++ on samples belonging to different tasks.
  • ...and 2 more figures