Pruned Adaptation Modules: A Simple yet Strong Baseline for Continual Foundation Models

Elif Ceren Gok Yildirim, Murat Onur Yildirim, Joaquin Vanschoren

Abstract

The continual learning literature has rapidly shifted from traditional class-incremental learning (CIL) techniques to foundation model (FM)-based CIL methods without a clear understanding of how these newer approaches compare to strong, lightweight convolutional baselines. This abrupt transition has created a substantial methodological gap, making it difficult to assess whether recent FM-based CIL progress reflects genuine advances or merely the absence of rigorous baselines. To address this gap, we introduce Pruned Adaptation Modules (PAM), a simple yet effective method that freezes the vast majority of a pre-trained ResNet while enabling scalable continual adaptation through sparse task-specific layers. PAM yields up to a ~5x reduction in trainable parameters and a ~6x reduction in total parameters, significantly reducing the cost of continual updates. Across diverse benchmarks, PAM consistently mitigates catastrophic forgetting and outperforms state-of-the-art FM-based CIL approaches. Our findings position PAM as a strong and transparent baseline that helps bridge the gap between traditional and FM-based CIL, guiding future research toward a more accurate assessment of true progress in continual adaptation. The code is available at https://github.com/ElifCerenGokYildirim/PAM.

Paper Structure

This paper contains 32 sections, 9 equations, 9 figures, 1 table, and 1 algorithm.

Figures (9)

  • Figure 1: PAM is a simple yet powerful bridge that challenges the progress in FM-based CIL. It achieves better accuracy with ResNets while significantly reducing runtime and parameter count.
  • Figure 2: PAM freezes the first three layers of a pre-trained ResNet to preserve general knowledge while dynamically adding a task-specific last layer for each new task. To improve parameter efficiency, each last layer is structurally pruned to become ‘slim’ before training on its corresponding task. After training, the weights are frozen to prevent forgetting.
  • Figure 3: Parameter size vs. accuracy. The left panel shows that PAM challenges existing and future FM-based methods; the right panel presents the parameter count of each method after completing all sessions.
  • Figure 4: Ablations of different components for the PAM method. (a) Effect of pruning timing on the performance. (b) Impact of different sparsity levels on performance. (c) Comparison of task-specific adaptation module selection strategies during inference.
  • Figure 5: Analysis of PAM across different evaluation settings. (a) Final incremental accuracy after all tasks, (b) per-task training accuracy illustrating the effect of initialization strategies for pruned adaptation modules, and (c) performance of weighted ensemble strategies compared to the single best module. Ratios show the relative contribution of the most confident module and the remaining modules (e.g., 0.9/0.1).
  • ...and 4 more figures
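
The captions for Figures 2 and 5 describe two mechanisms that a short sketch may make more concrete: slimming a task-specific layer by structured pruning before training, and blending module outputs at inference with a ratio such as 0.9/0.1. The plain-Python example below is only an illustration under stated assumptions, not the authors' implementation. The function names `slim_filters` and `weighted_ensemble` are hypothetical. Ranking filters by L1 norm, measuring confidence as the maximum class probability, and splitting the remaining weight equally among the other modules are all assumptions not confirmed by the text above.

```python
from typing import List, Sequence


def slim_filters(filters: Sequence[Sequence[float]], sparsity: float) -> List[int]:
    """Return the indices of filters kept after structured pruning.

    Assumption: filters are ranked by L1 norm and the lowest-norm
    fraction `sparsity` is dropped. The paper's criterion may differ.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_keep = max(1, round(len(filters) * (1.0 - sparsity)))
    norms = [sum(abs(w) for w in f) for f in filters]
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])


def weighted_ensemble(
    module_probs: Sequence[Sequence[float]], top_weight: float = 0.9
) -> List[float]:
    """Blend per-module class probabilities into one prediction.

    The most confident module (highest maximum probability) receives
    `top_weight`; the remaining modules share 1 - top_weight equally
    (an assumption made for this sketch).
    """
    if not module_probs:
        raise ValueError("at least one module is required")
    best = max(range(len(module_probs)), key=lambda i: max(module_probs[i]))
    rest = [i for i in range(len(module_probs)) if i != best]
    if not rest:
        # A single module has nothing to blend with.
        return list(module_probs[best])
    rest_weight = (1.0 - top_weight) / len(rest)
    n_classes = len(module_probs[best])
    return [
        top_weight * module_probs[best][c]
        + sum(rest_weight * module_probs[i][c] for i in rest)
        for c in range(n_classes)
    ]


# Toy usage: four filters pruned at 50% sparsity, then three modules blended.
kept = slim_filters([[0.1, -0.1], [1.0, 2.0], [0.0, 0.05], [-3.0, 0.5]], 0.5)
blended = weighted_ensemble([[0.7, 0.2, 0.1], [0.4, 0.3, 0.3], [0.2, 0.5, 0.3]])
predicted_class = max(range(len(blended)), key=lambda c: blended[c])
```

In this toy run the two highest-norm filters (indices 1 and 3) are kept, and the first module dominates the blended prediction because its maximum probability is the largest. A real module would apply the kept indices to convolutional weights rather than to lists of floats.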