Vision Transformer Finetuning Benefits from Non-Smooth Components
Ambroise Odonnat, Laetitia Chapel, Romain Tavenard, Ievgen Redko
TL;DR
This study examines the finetuning of vision transformers through the lens of plasticity, defining $\mathscr{P}(f)$ as the average rate at which a component's output changes with its input in order to quantify how adaptable that component is. The authors derive upper bounds and a theoretical ranking that places the multi-head attention module as the most plastic, followed by the first and second feedforward layers, with LayerNorms the least plastic. Experiments with ViT-Base and ViT-Huge across 11 classification benchmarks show that finetuning the attention modules and feedforward layers yields the best and most robust performance, which agrees with the theoretical ranking and can guide selective finetuning strategies. The findings offer a new perspective on the role of smoothness in transformer adaptation and practical guidance for parameter-efficient finetuning.
Abstract
The smoothness of the transformer architecture has been extensively studied in the context of generalization, training stability, and adversarial robustness. However, its role in transfer learning remains poorly understood. In this paper, we analyze the ability of vision transformer components to adapt their outputs to changes in inputs, or, in other words, their plasticity. Defined as an average rate of change, it captures the sensitivity to input perturbation; in particular, a high plasticity implies low smoothness. We demonstrate through theoretical analysis and comprehensive experiments that this perspective provides principled guidance in choosing the components to prioritize during adaptation. A key takeaway for practitioners is that the high plasticity of the attention modules and feedforward layers consistently leads to better finetuning performance. Our findings depart from the prevailing assumption that smoothness is desirable, offering a novel perspective on the functional properties of transformers. The code is available at https://github.com/ambroiseodt/vit-plasticity.
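To make the notion of "average rate of change" concrete, the sketch below estimates it by Monte Carlo for a few toy maps. It is an illustration under stated assumptions, not the paper's estimator: the exact definition of $\mathscr{P}(f)$ is not reproduced in this abstract, so the ratio $\|f(x) - f(y)\| / \|x - y\|$ averaged over Gaussian input pairs, along with the names `average_rate_of_change`, `scale`, and `softmax`, are choices made here for illustration only.

```python
import math
import random


def average_rate_of_change(f, dim, n_pairs=1000, seed=0):
    """Monte Carlo estimate of E[ ||f(x) - f(y)|| / ||x - y|| ].

    Assumption: inputs are drawn i.i.d. from a standard Gaussian; the
    paper may use a different input distribution and normalization.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Ratio of output displacement to input displacement for this pair.
        total += math.dist(f(x), f(y)) / math.dist(x, y)
    return total / n_pairs


def scale(c):
    """Toy linear map x -> c * x, whose rate of change is exactly |c|."""
    return lambda x: [c * v for v in x]


def softmax(x):
    """Numerically stable softmax, a smooth map with Lipschitz constant <= 1."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


p_small = average_rate_of_change(scale(0.5), dim=8)
p_large = average_rate_of_change(scale(3.0), dim=8)
p_soft = average_rate_of_change(softmax, dim=8)
print(p_small, p_large, p_soft)
```

In this toy setting, the map with the larger estimate reacts more strongly to input changes, which is the sense in which the abstract associates high plasticity with low smoothness. Applying the same idea to actual ViT components would require the definition and input distribution used in the paper and its repository.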
