Toward a Holistic Approach to Continual Model Merging

Hoang Phan, Sungmin Cha, Tung Lam Tran, Qi Lei

TL;DR

This work presents a holistic framework for Continual Model Merging that intervenes at three critical stages (pre-merging, during merging, and post-merging) to address two fundamental challenges in continual learning, providing a scalable and efficient solution to the catastrophic forgetting problem.

Abstract

We present a holistic framework for Continual Model Merging (CMM) that intervenes at three critical stages (pre-merging, during merging, and post-merging) to address two fundamental challenges in continual learning. In particular, conventional approaches either maintain a growing list of per-domain task vectors, which leads to scalability issues, or rely solely on weight-space merging when old data is inaccessible, thereby losing crucial functional information. Our method overcomes these limitations by first fine-tuning the main model within its tangent space on domain-specific data; this linearization amplifies per-task weight disentanglement, effectively mitigating cross-task interference. During merging, we leverage functional information from available optimizer states, going beyond mere parameter averages, to avoid the need to revisit old data. Finally, a post-merging correction aligns the representations of the pre- and post-merged models, reducing bias and enhancing overall performance, all while operating under constant memory constraints and without accessing historical data. Extensive experiments on standard class-incremental and domain-incremental benchmarks demonstrate that our approach not only achieves competitive performance but also provides a scalable and efficient solution to the catastrophic forgetting problem.
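To make the pre-merging step more concrete, here is a minimal sketch of what fine-tuning "within the tangent space" usually means: the model is replaced by its first-order Taylor expansion around the pre-trained weights, so its output becomes affine in the weights. The toy single-unit model, the names `f`, `grad_f`, `f_lin`, and the task vectors `tau_a` and `tau_b` are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def f(theta, x):
    """Toy nonlinear model (assumption): a single tanh unit, tanh(x . theta)."""
    return np.tanh(x @ theta)

def grad_f(theta, x):
    """Gradient of f with respect to theta, one row per input."""
    return (1.0 - np.tanh(x @ theta) ** 2)[:, None] * x

def f_lin(theta, theta0, x):
    """First-order Taylor expansion of f around theta0 (the tangent-space model)."""
    return f(theta0, x) + grad_f(theta0, x) @ (theta - theta0)

rng = np.random.default_rng(0)
theta0 = rng.normal(size=3)        # stands in for the pre-trained weights
x = rng.normal(size=(5, 3))        # a small batch of inputs
tau_a = 0.1 * rng.normal(size=3)   # hypothetical task vector for task A
tau_b = 0.1 * rng.normal(size=3)   # hypothetical task vector for task B

# Because f_lin is affine in theta, the effect of adding both task vectors
# equals the sum of the effects of adding each one separately.
base = f_lin(theta0, theta0, x)
joint = f_lin(theta0 + tau_a + tau_b, theta0, x) - base
separate = (f_lin(theta0 + tau_a, theta0, x) - base) + (
    f_lin(theta0 + tau_b, theta0, x) - base
)
additive = np.allclose(joint, separate)
```

In this toy setting the contributions of the two task vectors add exactly in function space, which is one intuition for why linearized fine-tuning can reduce interference when weights are later merged. For a real network the Jacobian-vector product would typically be computed with automatic differentiation rather than by hand.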

Paper Structure

This paper contains 21 sections, 17 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: Average accuracy score and loss value when interpolating models after training on the first and second tasks, using linear averaging and our proposed merging method.
  • Figure 2: Average accuracy score and loss value along the linear path between models after training on the first and second tasks, using normal training and our proposed linear fine-tuning method.
  • Figure 3: Mismatch between the feature representations of models before and after merging (left), or between models before merging and after merging + refinement (right).
  • Figure 4: Average accuracy score and loss value when using our merging method to interpolate models after training on the first and second tasks, using normal training and our proposed linear fine-tuning method.
  • Figure 5: t-SNE of samples from the 12th class using models from the previous task ($\circ$) and the current task ($\triangle$).
  • ...and 2 more figures
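
Figures 1, 2, and 4 evaluate models along an interpolation path between the weights obtained after the first and second tasks. A minimal sketch of that kind of evaluation for the plain linear-averaging baseline is given below; the toy linear-regression model, the squared-error loss, and the names `interpolate`, `loss`, and `path_losses` are assumptions for illustration, and the paper's own merging method is not reproduced here.

```python
import numpy as np

def interpolate(theta_1, theta_2, alpha):
    """Point on the straight line between two weight vectors (alpha in [0, 1])."""
    return (1.0 - alpha) * theta_1 + alpha * theta_2

def loss(theta, x, y):
    """Mean squared error of a toy linear model (assumption)."""
    return float(np.mean((x @ theta - y) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4))
theta_1 = rng.normal(size=4)   # stands in for weights after the first task
theta_2 = rng.normal(size=4)   # stands in for weights after the second task
y = x @ theta_1                # toy targets that theta_1 fits exactly

# Evaluate the loss at evenly spaced points along the linear path.
alphas = np.linspace(0.0, 1.0, 11)
path_losses = [loss(interpolate(theta_1, theta_2, a), x, y) for a in alphas]
```

Plotting `path_losses` against `alphas` gives a curve of the same kind as the loss panels in those figures; a loss barrier along the path would indicate that simple averaging interferes with one of the tasks.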