Parameter Importance-Driven Continual Learning for Foundation Models
Lingxiang Wang, Hainan Zhang, Zhiming Zheng
TL;DR
The paper tackles catastrophic forgetting during domain-specific post-training of large foundation models. It introduces PIECE, a parameter-importance-driven continual enhancement method that updates only a tiny fraction of parameters (0.1%) per task, selected by one of two importance estimators: PIECE-F, based on Fisher information, and PIECE-S, based on a second-order normalization that fuses gradient and curvature signals. PIECE requires no access to historical data and no architectural changes. Across three language models and two multimodal models, it achieves state-of-the-art continual learning performance while preserving core capabilities such as programming and image captioning. The results point to a practical, scalable route to domain adaptation with strong transfer and minimal forgetting in large, real-world models.
Abstract
Domain-specific post-training often causes catastrophic forgetting, causing foundation models to lose their general reasoning ability and limiting their adaptability to dynamic real-world environments. Preserving general capabilities while acquiring downstream domain knowledge is a central challenge for large language and multimodal models. Traditional continual learning methods, such as regularization, replay, and architectural isolation, suffer from poor downstream performance, reliance on inaccessible historical data, or additional parameter overhead. While recent parameter-efficient tuning (PET) methods can alleviate forgetting, their effectiveness depends strongly on which parameters are chosen and how they are updated. In this paper, we introduce PIECE, a Parameter Importance Estimation-based Continual Enhancement method that preserves general ability while efficiently learning domain knowledge without accessing prior training data or increasing model parameters. PIECE selectively updates only the 0.1% of parameters most relevant to each new task, guided by two importance estimators: PIECE-F, based on Fisher information, and PIECE-S, based on a second-order normalization that combines gradient and curvature information. Experiments across three language models and two multimodal models show that PIECE maintains general capabilities and achieves state-of-the-art continual learning performance across diverse downstream tasks. Our results highlight a practical path to scalable, domain-adaptive foundation models without catastrophic forgetting.
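To make the selection idea concrete, the following is a minimal sketch of importance-guided sparse updating in the spirit of PIECE-F. It is an illustration under stated assumptions, not the authors' implementation: importance is taken to be the empirical diagonal Fisher (the mean of squared per-example gradients), which is one common approximation and is not confirmed by the text above; the function names `diagonal_fisher`, `top_fraction_mask`, and `masked_sgd_step`, the plain SGD update, and the toy gradients are all hypothetical. PIECE-S is not sketched because its normalization formula is not given here.

```python
import heapq


def diagonal_fisher(per_example_grads):
    """Empirical diagonal Fisher (assumed estimator): mean squared gradient per parameter."""
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    return [sum(g[i] ** 2 for g in per_example_grads) / n for i in range(dim)]


def top_fraction_mask(scores, fraction):
    """Boolean mask marking the highest-scoring `fraction` of parameters (at least one)."""
    k = max(1, int(round(len(scores) * fraction)))
    top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    mask = [False] * len(scores)
    for i in top:
        mask[i] = True
    return mask


def masked_sgd_step(params, grad, mask, lr):
    """Apply an SGD update only where the mask is True; all other parameters stay frozen."""
    return [p - lr * g if m else p for p, g, m in zip(params, grad, mask)]


# Toy data (hypothetical): 1000 parameters, 4 examples, parameter 7 matters most.
params = [0.0] * 1000
per_example_grads = [[0.01] * 1000 for _ in range(4)]
for g in per_example_grads:
    g[7] = 2.0

fisher = diagonal_fisher(per_example_grads)
mask = top_fraction_mask(fisher, 0.001)  # select 0.1% of parameters
mean_grad = [sum(g[i] for g in per_example_grads) / 4 for i in range(1000)]
new_params = masked_sgd_step(params, mean_grad, mask, lr=0.1)
```

In this toy setting the mask selects a single parameter out of 1000, and only that parameter moves during the update. A real model would compute the scores from gradients on the new task's data and apply the mask per tensor, details the summary above does not specify.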
