MagMax: Leveraging Model Merging for Seamless Continual Learning
Daniel Marczak, Bartłomiej Twardowski, Tomasz Trzciński, Sebastian Cygert
TL;DR
The paper tackles catastrophic forgetting in continual learning with large pre-trained models through a model-merging paradigm. It introduces MagMax, which sequentially fine-tunes on each new task and then consolidates knowledge by selecting, for every parameter, the maximum-magnitude update across the task vectors. This yields $\theta_{\text{MagMax}} = \theta_0 + \lambda \tau_{\text{MagMax}}$, where $\tau_{\text{MagMax}}^p = \tau_k^p$ with $k = \arg\max_i |\tau_i^p|$. Here $p$ indexes individual parameters, $\theta_0$ denotes the pre-trained weights, and $\tau_i = \theta_i - \theta_0$ is the task vector of the model $\theta_i$ obtained after sequentially fine-tuning through task $i$. Through extensive benchmarking in class- and domain-incremental settings, MagMax achieves state-of-the-art results on multiple benchmarks, and the study reveals that simple baselines (e.g., weight averaging or random weight selection) are unexpectedly strong in certain regimes. The study also analyzes the roles of update magnitude, sign consistency, and individual task-vector contributions. It shows that sequential fine-tuning improves other merging methods as well, and that performance is largely robust to the choice of a fixed scaling factor $\lambda$. Overall, MagMax offers a practical, memory-efficient route to continual learning with large pre-trained models, with implications for open-vocabulary and cross-domain adaptation.
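The merging rule above can be illustrated with a minimal NumPy sketch. It assumes the task vectors $\tau_i = \theta_i - \theta_0$ have already been computed from the sequentially fine-tuned checkpoints and flattened into 1-D arrays. The function name `magmax_merge` and the default `lam=0.5` are illustrative assumptions, not the authors' code. Because the selection is element-wise, the same function could be applied tensor by tensor to a real model's weights.

```python
import numpy as np


def magmax_merge(theta_0, task_vectors, lam=0.5):
    """Hypothetical sketch of MagMax merging on flat parameter vectors.

    theta_0: 1-D array of pre-trained parameters.
    task_vectors: list of 1-D arrays tau_i = theta_i - theta_0, one per task.
    lam: scaling factor lambda (0.5 is an illustrative default, an assumption).
    """
    taus = np.stack(task_vectors)             # shape (num_tasks, num_params)
    k = np.argmax(np.abs(taus), axis=0)       # per parameter p: task with the largest |tau_i^p|
    cols = np.arange(taus.shape[1])
    tau_magmax = taus[k, cols]                # keep the signed value of the selected update
    return theta_0 + lam * tau_magmax


if __name__ == "__main__":
    theta_0 = np.zeros(4)
    tau_1 = np.array([0.3, -0.1, 0.0, 0.2])
    tau_2 = np.array([-0.5, 0.05, 0.4, 0.1])
    merged = magmax_merge(theta_0, [tau_1, tau_2], lam=1.0)
    print(merged.tolist())  # [-0.5, -0.1, 0.4, 0.2]
```

Note that the sign of the winning update is preserved: the first parameter takes $-0.5$ from $\tau_2$ because its magnitude exceeds that of $0.3$ from $\tau_1$. Ties go to the earliest task, which is how `np.argmax` breaks them.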
Abstract
This paper introduces a continual learning approach named MagMax, which utilizes model merging to enable large pre-trained models to continuously learn from new data without forgetting previously acquired knowledge. Distinct from traditional continual learning methods that aim to reduce forgetting during task training, MagMax combines sequential fine-tuning with maximum-magnitude weight selection for effective knowledge integration across tasks. Our first contribution is an extensive examination of model merging techniques, revealing that simple approaches like weight averaging and random weight selection hold up surprisingly well in various continual learning contexts. More importantly, we present MagMax, a novel model-merging strategy that enables continual learning of large pre-trained models for successive tasks. Our thorough evaluation demonstrates the superiority of MagMax in various scenarios, including class- and domain-incremental learning settings. The code is available at https://github.com/danielm1405/magmax.
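For comparison, the two simple baselines named in the abstract can be sketched in the same task-vector form. This is a hedged illustration: the function names are hypothetical. It also assumes that random weight selection draws, independently for each parameter, a uniformly random task whose update is kept. Weight averaging of the fine-tuned models equals $\theta_0 + \frac{1}{n}\sum_i \tau_i$ when $\lambda = 1$, since $\frac{1}{n}\sum_i \theta_i = \theta_0 + \frac{1}{n}\sum_i (\theta_i - \theta_0)$.

```python
import numpy as np


def average_merge(theta_0, task_vectors, lam=1.0):
    """Weight averaging: with lam=1 this equals the mean of the fine-tuned models."""
    return theta_0 + lam * np.mean(np.stack(task_vectors), axis=0)


def random_merge(theta_0, task_vectors, lam=1.0, seed=0):
    """Random weight selection (assumed variant): per parameter, keep the
    update of a uniformly random task."""
    taus = np.stack(task_vectors)                      # (num_tasks, num_params)
    rng = np.random.default_rng(seed)
    k = rng.integers(0, taus.shape[0], size=taus.shape[1])  # random task per parameter
    return theta_0 + lam * taus[k, np.arange(taus.shape[1])]


if __name__ == "__main__":
    theta_0 = np.zeros(2)
    taus = [np.array([0.2, -0.4]), np.array([0.4, 0.0])]
    print(average_merge(theta_0, taus).tolist())
    print(random_merge(theta_0, taus, seed=1).tolist())
```

Unlike MagMax, neither baseline looks at update magnitudes, which is what makes their strong showing in some continual learning settings notable.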
