Read Between the Layers: Leveraging Multi-Layer Representations for Rehearsal-Free Continual Learning with Pre-Trained Models
Kyra Ahrens, Hans Hergen Lehmann, Jae Hee Lee, Stefan Wermter
TL;DR
This work tackles rehearsal-free continual learning with pre-trained models by leveraging intermediate representations from multiple layers. It introduces LayUP, a simple yet effective class-prototype method that constructs prototypes from concatenated multi-layer features and decorrelates them via second-order Gram statistics, paired with first-session adaptation through parameter-efficient transfer learning (PETL). The approach performs robustly across class-incremental (CIL), domain-incremental (DIL), and online continual learning (OCL) benchmarks while reducing memory and compute costs compared to existing baselines, and it also serves as a versatile plug-in to enhance other prototype methods. The results demonstrate that fully exploiting pre-trained representations across layers can significantly improve domain transfer and continual learning in limited-data scenarios.
Abstract
We address the Continual Learning (CL) problem, wherein a model must learn a sequence of tasks from non-stationary distributions while preserving prior knowledge upon encountering new experiences. With the advancement of foundation models, CL research has pivoted from the initial learning-from-scratch paradigm towards utilizing generic features from large-scale pre-training. However, existing approaches to CL with pre-trained models primarily focus on separating class-specific features from the final representation layer and neglect the potential of intermediate representations to capture low- and mid-level features, which are more invariant to domain shifts. In this work, we propose LayUP, a new prototype-based approach to CL that leverages second-order feature statistics from multiple intermediate layers of a pre-trained network. Our method is conceptually simple, does not require access to prior data, and works out of the box with any foundation model. LayUP surpasses the state of the art in four of the seven class-incremental learning benchmarks, all three domain-incremental learning benchmarks, and six of the seven online continual learning benchmarks, while significantly reducing memory and computational requirements compared to existing baselines. Our results demonstrate that fully exhausting the representational capacities of pre-trained models in CL goes well beyond their final embeddings.
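
To make the idea of prototypes built from concatenated multi-layer features and decorrelated with second-order (Gram) statistics more concrete, the following is a minimal NumPy sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the class name `GramPrototypeClassifier`, the helper `toy_features`, the ridge term `ridge`, and the random vectors that stand in for the intermediate-layer outputs of a pre-trained backbone. First-session adaptation through PETL is omitted entirely.

```python
import numpy as np


def concat_layers(layer_feats):
    """Concatenate per-layer feature matrices (each n x d_l) along the feature axis."""
    return np.concatenate(layer_feats, axis=1)


class GramPrototypeClassifier:
    """Hypothetical rehearsal-free classifier: only G and C are stored, never raw data."""

    def __init__(self, dim, num_classes, ridge=1.0):
        self.G = np.zeros((dim, dim))          # Gram matrix, sum of f f^T
        self.C = np.zeros((dim, num_classes))  # per-class feature sums (unnormalized prototypes)
        self.ridge = ridge                     # assumed regularizer, keeps the solve well-posed

    def update(self, layer_feats, labels):
        F = concat_layers(layer_feats)
        Y = np.eye(self.C.shape[1])[labels]    # one-hot targets
        self.G += F.T @ F                      # both statistics are additive across sessions
        self.C += F.T @ Y

    def predict(self, layer_feats):
        F = concat_layers(layer_feats)
        # Decorrelate the prototypes with the regularized Gram matrix.
        W = np.linalg.solve(self.G + self.ridge * np.eye(self.G.shape[0]), self.C)
        return np.argmax(F @ W, axis=1)


def toy_features(rng, means, labels, noise=0.1):
    """Stand-in for backbone features: one noisy class-mean matrix per layer."""
    return [m[labels] + noise * rng.standard_normal((len(labels), m.shape[1]))
            for m in means]


rng = np.random.default_rng(0)
num_classes, dims = 4, (8, 8, 8)               # three "layers" of width 8 each
means = [rng.standard_normal((num_classes, d)) for d in dims]

clf = GramPrototypeClassifier(sum(dims), num_classes)
for session_classes in ([0, 1], [2, 3]):       # two class-incremental sessions
    labels = np.repeat(session_classes, 50)
    clf.update(toy_features(rng, means, labels), labels)

test_labels = np.repeat(np.arange(num_classes), 20)
predictions = clf.predict(toy_features(rng, means, test_labels))
accuracy = float(np.mean(predictions == test_labels))
print(f"toy accuracy over all classes seen so far: {accuracy:.2f}")
```

Because the Gram matrix and the class sums are plain running sums, each session only adds to them, which is why a scheme of this kind needs no access to earlier data. The toy features are well separated by construction, so the printed accuracy says nothing about real benchmark performance.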
