Representation Finetuning for Continual Learning

Haihua Luo, Xuming Ran, Tommi Kärkkäinen, Huiyan Xue, Zhonghua Chen, Qi Xu, Fengyu Cong

Abstract

The world is inherently dynamic, and continual learning aims to enable models to adapt to ever-evolving data streams. While pre-trained models have shown strong performance in continual learning, they still require finetuning to adapt effectively to downstream tasks. However, prevailing Parameter-Efficient Fine-Tuning (PEFT) methods rely on empirical, black-box optimization at the weight level. These approaches lack explicit control over representation drift, which makes them sensitive to domain shifts and prone to catastrophic forgetting in continual learning scenarios. In this work, we introduce Continual Representation Learning (CoRe), a novel framework that, for the first time, shifts the finetuning paradigm from weight space to representation space. Unlike conventional methods, CoRe performs task-specific interventions within a low-rank linear subspace of hidden representations and learns them with explicit objectives, which ensures stability on past tasks while maintaining plasticity for new ones. By constraining updates to a low-rank subspace, CoRe achieves exceptional parameter efficiency. Extensive experiments across multiple continual learning benchmarks demonstrate that CoRe not only preserves parameter efficiency but also significantly outperforms existing state-of-the-art methods. Our work introduces representation finetuning as a new, more effective, and more interpretable paradigm for continual learning.
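
To make the parameter-efficiency claim concrete, the following sketch counts the trainable parameters of one intervention of the form $\Delta\bm{e} = R^{\top}(W\bm{e}+b-R\bm{e})$ used in Theorem 1, where $R$ and $W$ are $r \times d$ and $b$ has length $r$. The hidden size $d = 768$ matches ViT-B; the rank $r = 4$ and the choice of one intervention in each of the 12 blocks are hypothetical, not settings reported by the authors, and the helper names are illustrative only.

```python
def intervention_params(hidden_dim: int, rank: int) -> int:
    """Trainable parameters of one low-rank edit R^T (W e + b - R e).

    R and W are each (rank x hidden_dim); b has length rank.
    """
    return 2 * rank * hidden_dim + rank


def dense_layer_params(hidden_dim: int) -> int:
    """Parameters of one full hidden_dim x hidden_dim linear layer with bias."""
    return hidden_dim * hidden_dim + hidden_dim


# Hypothetical setting: ViT-B hidden size, rank 4, one intervention in each of 12 blocks.
d, r, n_blocks = 768, 4, 12
per_block = intervention_params(d, r)
total = n_blocks * per_block

print(per_block)               # 6148
print(total)                   # 73776
print(dense_layer_params(d))   # 590592
```

Under these assumptions, all twelve interventions together are smaller than a single dense $768 \times 768$ layer; the actual count depends on the rank and on which layers CoRe intervenes on, which this summary does not state.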

Paper Structure (15 sections, 2 theorems, 9 equations, 4 figures, 6 tables)

Key Result

Theorem 1

Let $\mathcal{L}(\theta)$ be the loss function for a new task, and let $\Delta\bm{e} = R^{\top}(W\bm{e}+b-R\bm{e})$ be the representation intervention defined in the visual ReFT formulation. The intervention constrained to a low-rank subspace $R$ satisfies

$$\|\Delta\bm{e}\|_2 \le \sigma_{\max}(R^{\top})\,\|W\bm{e}+b-R\bm{e}\|_2,$$

where $\sigma_{\max}(R^{\top})$ denotes the maximum singular value of $R^{\top}$. This result shows that the magnitude of representation chan…
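
A quick numerical check of the bound in Theorem 1, written as a NumPy sketch. Only the formula for $\Delta\bm{e}$ comes from the text; the dimensions, the random draws, and the helper names `delta_e` and `bound_ratio` are hypothetical. The second part assumes, as an illustration rather than a claim about the paper, that $R$ has orthonormal rows, in which case $\sigma_{\max}(R^{\top}) = 1$ and the bound holds with equality.

```python
import numpy as np


def delta_e(e, R, W, b):
    """Representation edit Delta e = R^T (W e + b - R e)."""
    return R.T @ (W @ e + b - R @ e)


def bound_ratio(e, R, W, b):
    """||Delta e|| / (sigma_max(R^T) * ||W e + b - R e||); the theorem says this is <= 1."""
    sigma_max = np.linalg.norm(R.T, ord=2)  # spectral norm = largest singular value
    inner = W @ e + b - R @ e
    return np.linalg.norm(delta_e(e, R, W, b)) / (sigma_max * np.linalg.norm(inner))


rng = np.random.default_rng(0)
d, r = 64, 4  # hypothetical hidden size and subspace rank

# General R: the ratio never exceeds 1 over many random inputs.
R = rng.standard_normal((r, d))
W = rng.standard_normal((r, d))
b = rng.standard_normal(r)
worst = max(bound_ratio(rng.standard_normal(d), R, W, b) for _ in range(1000))
print(f"worst ratio, random R: {worst:.4f}")

# Orthonormal rows (R R^T = I): sigma_max(R^T) = 1 and ||Delta e|| equals the residual norm.
Q, _ = np.linalg.qr(rng.standard_normal((d, r)))  # d x r with orthonormal columns
R_orth = Q.T
e = rng.standard_normal(d)
ratio_orth = bound_ratio(e, R_orth, W, b)
print(f"ratio, orthonormal R: {ratio_orth:.4f}")
```

The bound is simply the operator-norm inequality applied to $R^{\top}$, so any control over $\sigma_{\max}(R^{\top})$ (for example, keeping the rows of $R$ orthonormal) directly caps how far a single intervention can move a representation.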

Figures (4)

  • Figure 1: The overall structure of the proposed method. (a) illustrates a standard ViT block and (b) depicts the implementation of ReFT. Unlike previous finetuning approaches, ReFT directly intervenes in the model by modifying its intermediate features. Specifically, the features are projected into a low-rank subspace via learnable parameters R, W, and b, and then mapped back to the original dimension. Gray areas indicate frozen parameters, and green areas denote trainable components.
  • Figure 2: Comparison of the number of trainable parameters and Avg accuracy of finetuning methods. Experiments are conducted on CIFAR Inc10 with ViT-B/16-IN21K.
  • Figure 3: Last accuracy of each method, averaged over different random seeds. All results are obtained on CIFAR Inc10 with ViT-B/16-IN21K.
  • Figure 4: Last accuracy of various finetuning methods in the Class Incremental Learning scenario using ViT-B/16-IN1K as the backbone. Each task contains the same number of classes across all datasets. CoRe consistently outperforms other finetuning approaches even with the ViT-B/16-IN1K backbone.
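
The intervention in Figure 1 can be sketched in NumPy as a small module acting on ViT token features of shape (batch, tokens, dim). This is a hypothetical illustration, not the authors' implementation: the class name, the orthonormal initialisation of R, the placement after a frozen block, and the `tanh` stand-in for that block are all assumptions. W is set equal to R and b to zero purely to show that the edit vanishes when the low-rank residual is zero; after W moves, every change stays inside the span of the rows of R.

```python
import numpy as np


class LowRankIntervention:
    """Hypothetical ReFT-style edit: h -> h + R^T (W h + b - R h).

    R, W: (rank, dim); b: (rank,). Only these are trainable; the backbone stays frozen.
    """

    def __init__(self, dim: int, rank: int, rng: np.random.Generator):
        q, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
        self.R = q.T                 # assumption: orthonormal rows
        self.W = self.R.copy()       # W = R, b = 0 -> edit is zero at the start
        self.b = np.zeros(rank)

    def __call__(self, h: np.ndarray) -> np.ndarray:
        # Project token features (..., dim) into the rank-r subspace ...
        low = h @ self.W.T + self.b - h @ self.R.T   # (..., rank)
        # ... and map the low-rank edit back to the original dimension.
        return h + low @ self.R


def frozen_block(h: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a frozen ViT block."""
    return np.tanh(h)


rng = np.random.default_rng(0)
dim, rank = 768, 4  # ViT-B hidden size; the rank is hypothetical
interv = LowRankIntervention(dim, rank, rng)

h = frozen_block(rng.standard_normal((2, 197, dim)))  # 2 images, 197 tokens each
out_identity = interv(h)                               # equals h while W = R, b = 0

# Pretend an update moved W, then check that the edit lies in span(rows of R).
interv.W = interv.W + 0.1 * rng.standard_normal(interv.W.shape)
out = interv(h)
edit = out - h
off_subspace = edit - edit @ interv.R.T @ interv.R     # component outside the subspace
n_trainable = interv.R.size + interv.W.size + interv.b.size
print(out.shape, n_trainable, float(np.abs(off_subspace).max()))
```

Because the edit is always `low @ R`, the frozen features are changed only along r directions, which is the sense in which updates are "constrained to a low-rank subspace".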

Theorems & Definitions (2)

  • Theorem 1: Stability of Low-Rank Subspace Intervention
  • Proposition 1: Explicit Optimization Objective