Exploring Representation Invariance in Finetuning
Wenqiang Zu, Shenghao Xie, Hao Chen, Zhiqiang Chen, Liwen Hu, Yuanhao Xi, Yiming Liang, Junliang Ye, Bo Lei, Tiejun Huang, Guoqi Li, Lei Ma
TL;DR
This paper tackles the problem that finetuning foundation models on cross-domain, low-resource tasks can erode their pretrained representations and, with them, their generalization. It introduces Representation Invariance FineTuning (RIFT), a regularizer that enforces orthogonal invariance between pretrained and finetuned representations by matching the covariances of final-layer embeddings through a learnable orthogonal transform, while avoiding expensive pairwise similarity computations. RIFT is shown to be compatible with common finetuning approaches: it improves representation similarity (measured by CKA) and often preserves or enhances downstream performance across medical image datasets and different backbones, including large Vision Transformers. The results demonstrate that adaptation and generalization can be maintained jointly, enabling more robust cross-domain transfer. They also suggest a new direction for finetuning paradigms that prioritize both effective learning and the preservation of pretrained semantic structure.
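To make the covariance-matching idea concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. The summary does not give the exact loss, so we assume a squared Frobenius distance between the covariance of pretrained embeddings and the covariance of finetuned embeddings after an orthogonal transform Q, and we parameterize Q with a Cayley transform of a learnable matrix. The names `cayley`, `covariance`, and `covariance_matching_loss` are hypothetical. What the sketch illustrates is cost: only d x d matrices are formed, so for a batch of n embeddings the work is O(nd^2) rather than the O(n^2 d) needed to build an n x n pairwise similarity matrix.

```python
import numpy as np


def cayley(a: np.ndarray) -> np.ndarray:
    """Map a real d x d parameter matrix to an orthogonal matrix (Cayley transform).

    Only the skew-symmetric part S = (a - a^T) / 2 is used, and
    Q = (I + S)^{-1} (I - S), which equals (I - S)(I + S)^{-1} because the
    two factors commute. I + S is always invertible for skew-symmetric S.
    """
    s = 0.5 * (a - a.T)
    eye = np.eye(a.shape[0])
    return np.linalg.solve(eye + s, eye - s)


def covariance(z: np.ndarray) -> np.ndarray:
    """Sample covariance (d x d) of an (n x d) batch of embeddings."""
    zc = z - z.mean(axis=0, keepdims=True)
    return zc.T @ zc / (z.shape[0] - 1)


def covariance_matching_loss(z_pre: np.ndarray, z_ft: np.ndarray, a: np.ndarray) -> float:
    """Hypothetical RIFT-style penalty: || Q^T C_ft Q - C_pre ||_F^2 (assumed form).

    z_pre: (n, d) final-layer embeddings from the frozen pretrained model.
    z_ft:  (n, d) final-layer embeddings from the model being finetuned.
    a:     (d, d) learnable parameter that defines the orthogonal transform Q.
    No n x n pairwise similarity matrix is ever formed.
    """
    q = cayley(a)
    c_pre = covariance(z_pre)
    c_ft_transformed = q.T @ covariance(z_ft) @ q  # equals covariance(z_ft @ q)
    return float(np.sum((c_ft_transformed - c_pre) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_pre = rng.normal(size=(256, 16))
    a0 = rng.normal(size=(16, 16))
    z_ft = z_pre @ cayley(a0)  # toy "finetuned" features: a pure rotation of z_pre
    print(covariance_matching_loss(z_pre, z_ft, np.zeros((16, 16))))  # Q = I: positive
    print(covariance_matching_loss(z_pre, z_ft, -a0))  # Q undoes the rotation: ~0
```

When the finetuned features are an exact rotation of the pretrained ones, choosing the parameter so that Q undoes that rotation drives the penalty to zero, which is the kind of orthogonal invariance the method targets. In real training, `a` would be a trainable tensor in an autodiff framework, optimized jointly with the network; NumPy is used here only to keep the sketch dependency-light. One caveat of the Cayley parameterization is that it cannot represent orthogonal matrices with an eigenvalue of -1.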
Abstract
Foundation models pretrained on large-scale natural images are widely adapted to cross-domain, low-resource downstream tasks, benefiting from the generalizable and transferable patterns captured by their representations. However, these representations have been found to gradually vanish during finetuning, accompanied by a degradation of the model's original generalizability. In this paper, we argue that such tasks can be adapted effectively without sacrificing the benefits of pretrained representations. We approach this by introducing Representation Invariance FineTuning (RIFT), a regularization that maximizes the representation similarity between pretrained and finetuned models by leveraging the orthogonal invariance of manifolds in a computationally efficient way. Experiments demonstrate that our method is compatible with mainstream finetuning methods, offering competitive or even enhanced performance while better preserving generalizability.
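The TL;DR reports representation similarity with CKA, and the abstract relies on orthogonal invariance. The NumPy sketch below uses the standard linear CKA formula in its feature-space form, CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F ||Y^T Y||_F) on column-centered X and Y, and checks numerically that the score is unchanged when one representation is multiplied by an orthogonal matrix or rescaled. It illustrates the property only and is not the paper's evaluation code; `linear_cka` is a hypothetical helper name.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between (n x d1) and (n x d2) representations of the same n inputs."""
    xc = x - x.mean(axis=0, keepdims=True)  # center each feature column
    yc = y - y.mean(axis=0, keepdims=True)
    # Feature-space form: only d1 x d2, d1 x d1 and d2 x d2 products, no n x n Gram matrix.
    cross = np.linalg.norm(yc.T @ xc, "fro") ** 2
    norm_x = np.linalg.norm(xc.T @ xc, "fro")
    norm_y = np.linalg.norm(yc.T @ yc, "fro")
    return float(cross / (norm_x * norm_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(256, 16))
    q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
    print(linear_cka(x, x @ q))        # ~1.0: invariant to orthogonal transforms
    print(linear_cka(x, 3.0 * x @ q))  # ~1.0: also invariant to isotropic scaling
    print(linear_cka(x, rng.normal(size=(256, 16))))  # much lower for unrelated features
```

Because this form touches only feature-dimension products, its cost grows linearly with the number of samples n, unlike the equivalent kernel form built from n x n Gram matrices; this mirrors the efficiency concern the TL;DR raises about pairwise similarity computations.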
