BECAME: BayEsian Continual Learning with Adaptive Model MErging
Mei Li, Yuxiang Lu, Qinyan Dai, Suizhi Huang, Yue Ding, Hongtao Lu
TL;DR
The paper tackles catastrophic forgetting in continual learning by combining gradient projection with adaptive model merging under a Bayesian formulation. It proves that there exists a merging point on the line between the old-task and new-task parameters that can reduce the cumulative loss, and it derives a closed-form optimal merging coefficient via a Laplace (MAP) approximation, computable from Fisher information. The proposed two-stage BECAME framework first stabilizes learning with gradient projection and then enhances plasticity through unconstrained retraining before merging. The authors report state-of-the-art performance on several CL benchmarks, with improved plasticity and stable retention. Overall, the work provides a principled, generalizable mechanism for balancing stability and plasticity in continual learning and offers practical guidance for integrating adaptive merging into existing CL methods.
Abstract
Continual Learning (CL) strives to learn incrementally across tasks while mitigating catastrophic forgetting. A key challenge in CL is balancing stability (retaining prior knowledge) and plasticity (learning new tasks). While representative gradient projection methods ensure stability, they often limit plasticity. Model merging techniques offer promising solutions, but prior methods typically rely on empirical assumptions and carefully selected hyperparameters. In this paper, we explore the potential of model merging to enhance the stability-plasticity trade-off, providing theoretical insights that underscore its benefits. Specifically, we reformulate the merging mechanism using Bayesian continual learning principles and derive a closed-form solution for the optimal merging coefficient that adapts to the diverse characteristics of tasks. To validate our approach, we introduce a two-stage framework named BECAME, which synergizes the expertise of gradient projection and adaptive merging. Extensive experiments show that our approach outperforms state-of-the-art CL methods and existing merging strategies.
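To make the idea of a closed-form merging coefficient concrete, the following is a minimal sketch and not the paper's exact expression. It assumes a diagonal Fisher approximation and a quadratic (Laplace-style) loss around each solution: the old-task loss is approximated around `theta_old` with curvature `fisher_old`, and the new-task loss around `theta_new` with curvature `fisher_new`. Restricting the merged parameters to `theta_old + lam * (theta_new - theta_old)` and minimizing the sum of the two quadratics gives `lam = b / (a + b)`, where `a` and `b` are the Fisher-weighted squared distances. All names (`merge_coefficient`, `merge`, `quadratic_loss`, `eps`) are hypothetical and introduced only for illustration.

```python
import numpy as np


def merge_coefficient(theta_old, theta_new, fisher_old, fisher_new, eps=1e-12):
    """Coefficient minimizing the summed quadratic losses along the line.

    Assumption: diagonal Fisher vectors stand in for the loss curvature.
    """
    d = theta_new - theta_old
    a = np.sum(fisher_old * d * d)  # cost of moving away from the old solution
    b = np.sum(fisher_new * d * d)  # cost of staying away from the new solution
    return float(b / (a + b + eps))  # eps guards against a zero denominator


def merge(theta_old, theta_new, lam):
    """Linear interpolation: lam = 0 keeps the old model, lam = 1 the new one."""
    return (1.0 - lam) * theta_old + lam * theta_new


def quadratic_loss(theta, theta_old, theta_new, fisher_old, fisher_new):
    """Summed quadratic surrogate for the old-task and new-task losses."""
    old_term = 0.5 * np.sum(fisher_old * (theta - theta_old) ** 2)
    new_term = 0.5 * np.sum(fisher_new * (theta - theta_new) ** 2)
    return float(old_term + new_term)


# Toy example: the old tasks are three times more sensitive than the new task
# along the first coordinate, so the merge stays closer to the old parameters.
theta_old = np.array([0.0, 0.0])
theta_new = np.array([1.0, 1.0])
fisher_old = np.array([3.0, 0.0])
fisher_new = np.array([1.0, 0.0])

lam = merge_coefficient(theta_old, theta_new, fisher_old, fisher_new)
theta_merged = merge(theta_old, theta_new, lam)
```

Under these assumptions the coefficient adapts to the tasks: a larger Fisher mass on the old tasks pulls `lam` toward 0 (more stability), while a larger Fisher mass on the new task pulls it toward 1 (more plasticity).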
