Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning

Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye, Lijun Zhang, De-Chuan Zhan

TL;DR

Domain-Incremental Learning with pre-trained models is hampered by forgetting in both features and the classifier as domains shift. Duct tackles this with two coordinated strategies: representation consolidation, which builds a unified embedding by merging task vectors from historical backbones weighted by task similarity, and classifier consolidation, which realigns old classifiers to the consolidated space via optimal transport guided by class-wise semantic costs. The method uses a streamlined, exemplar-free setup and maintains only two backbones, achieving state-of-the-art results on four benchmarks and showing robustness across task orders and backbones. This dual consolidation enables stable, scalable continual adaptation of PTMs in dynamic environments. Key equations include φ^m_i = φ_0 + α_φ ∑_{k=1}^{i} Sim_{0,k} δφ_k and W_o^m = (1 − α_W) W_o + α_W W_n T with T obtained from an OT problem using costs Q_{i,j} = || c^0_i − c^0_j ||^2.
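The representation-consolidation formula above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes backbone weights are flattened into NumPy arrays, that each task vector δφ_k is the difference between the backbone fine-tuned on domain k and the pre-trained weights φ_0, and that the similarity weights Sim_{0,k} are computed elsewhere and passed in. The function name `merge_backbones` and its parameters are hypothetical.

```python
import numpy as np

def merge_backbones(phi_0, task_vectors, sims, alpha_phi):
    """Return phi_0 + alpha_phi * sum_k sims[k] * task_vectors[k].

    phi_0        : flattened pre-trained weights (assumed representation)
    task_vectors : list of arrays, one per seen domain (delta phi_k)
    sims         : list of scalars, one per seen domain (Sim_{0,k}, given)
    alpha_phi    : scalar merging coefficient
    """
    merged = np.array(phi_0, dtype=float)  # copy, so phi_0 is left untouched
    for sim, delta in zip(sims, task_vectors):
        merged += alpha_phi * sim * np.asarray(delta, dtype=float)
    return merged

# Toy example with two domains and a 3-parameter "backbone".
phi_0 = np.zeros(3)
deltas = [np.ones(3), 2.0 * np.ones(3)]
merged = merge_backbones(phi_0, deltas, sims=[1.0, 0.5], alpha_phi=0.5)
```

Because the sum runs over all seen task vectors, the merged weights can be rebuilt or updated whenever a new domain arrives without storing any exemplars.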

Abstract

Domain-Incremental Learning (DIL) involves the progressive adaptation of a model to new concepts across different domains. While recent advances in pre-trained models provide a solid foundation for DIL, learning new concepts often results in the catastrophic forgetting of pre-trained knowledge. Specifically, sequential model updates can overwrite both the representation and the classifier with knowledge from the latest domain. Thus, it is crucial to develop a representation and corresponding classifier that accommodate all seen domains throughout the learning process. To this end, we propose DUal ConsolidaTion (Duct) to unify and consolidate historical knowledge at both the representation and classifier levels. By merging the backbone of different stages, we create a representation space suitable for multiple domains incrementally. The merged representation serves as a balanced intermediary that captures task-specific features from all seen domains. Additionally, to address the mismatch between consolidated embeddings and the classifier, we introduce an extra classifier consolidation process. Leveraging class-wise semantic information, we estimate the classifier weights of old domains within the latest embedding space. By merging historical and estimated classifiers, we align them with the consolidated embedding space, facilitating incremental classification. Extensive experimental results on four benchmark datasets demonstrate Duct's state-of-the-art performance. Code is available at https://github.com/Estrella-fugaz/CVPR25-Duct
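The classifier-consolidation step can be illustrated with a small sketch of the TL;DR formula W_o^m = (1 − α_W) W_o + α_W W_n T. Several choices below are assumptions rather than details confirmed by the text: the transport plan T is approximated with entropic-regularized Sinkhorn iterations (a swapped-in solver), the marginals are uniform, the cost matrix is rescaled for numerical stability, the columns of T are normalized so that each estimated old-class weight is a convex combination of new-class weights, and c^0 is taken to mean class centers in the pre-trained embedding space. All function and parameter names are hypothetical.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Approximate OT plan between uniform marginals (entropic Sinkhorn)."""
    n, m = cost.shape
    mu = np.full(n, 1.0 / n)   # uniform mass over new classes (assumption)
    nu = np.full(m, 1.0 / m)   # uniform mass over old classes (assumption)
    K = np.exp(-cost / reg)    # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]

def consolidate_classifier(W_old, W_new, centers_new, centers_old,
                           alpha_w=0.5, reg=0.1):
    """Merge old classifier weights with weights transported from the new ones.

    W_old: (d, n_old), W_new: (d, n_new); centers_*: (n_classes, d0).
    """
    # Squared Euclidean cost between class centers: Q[i, j] = ||c_i - c_j||^2.
    diff = centers_new[:, None, :] - centers_old[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)   # rescale for stability (assumption)
    T = sinkhorn_plan(cost, reg=reg)       # shape (n_new, n_old)
    T = T / T.sum(axis=0, keepdims=True)   # each column sums to 1 (assumption)
    W_est = W_new @ T                      # estimated old-class weights
    return (1.0 - alpha_w) * W_old + alpha_w * W_est

rng = np.random.default_rng(0)
c_new, c_old = rng.normal(size=(4, 8)), rng.normal(size=(3, 8))
W_new, W_old = rng.normal(size=(8, 4)), rng.normal(size=(8, 3))
W_merged = consolidate_classifier(W_old, W_new, c_new, c_old, alpha_w=0.3)
```

Setting `alpha_w=0` returns the old classifier unchanged, while larger values lean more heavily on the weights estimated from semantically close classes.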

Paper Structure (27 sections, 10 equations, 9 figures, 15 tables, 1 algorithm)

Figures (9)

  • Figure 1: Illustration of Duct. Top: Representation consolidation. We utilize the pre-trained model as initialization and optimize it for each domain, obtaining the task vectors. Afterward, we combine the pre-trained model and all seen task vectors to build the unified embedding space. Bottom: Classifier consolidation. To align the classifiers with consolidated features, we design the new classifier retraining and old classifier transport to consolidate classifiers. Class-wise semantic information is utilized in classifier transport.
  • Figure 2: Incremental performance of different methods with the same pre-trained model. We report the performance gap after the last incremental stage between Duct and the runner-up method at the end of the line.
  • Figure 2: Ablation study on different modules in Duct.
  • Figure 3: Further analysis on multiple task orders, forgetting measure, and parameter robustness. (a): Incremental performance of different methods on CORe50 with five task orders. The shaded region indicates the standard deviation. (b): Forgetting measure (lower is better) of different methods on the CDDB dataset across five task orders. Duct shows the least forgetting among all methods. (c): Average incremental performance as the consolidation ratios vary.
  • Figure 4: Before Duct.
  • ...and 4 more figures