On the Emergence of Cross-Task Linearity in the Pretraining-Finetuning Paradigm
Zhanpeng Zhou, Zijun Chen, Yilan Chen, Bo Zhang, Junchi Yan
TL;DR
The paper identifies Cross-Task Linearity (CTL), a cross-task extension of Layerwise Linear Feature Connectivity (LLFC), as a prevalent phenomenon in models finetuned on different tasks from a shared pretrained checkpoint. Linearly interpolating the weights of two such finetuned models yields, at each layer, features that closely follow the linear interpolation of the two models' features. This suggests that the network acts approximately as a linear map from parameter space to feature space. The authors use CTL to explain model merging and editing techniques such as model averaging and task arithmetic by translating parameter-space operations into feature-space operations. They also investigate the root causes of CTL, linking it to pretraining depth and task similarity. They give a preliminary theoretical bound in terms of loss flatness and weight distance and leave a fuller theory and the large-language-model setting to future work.
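The CTL property and its use for model merging can be written out explicitly. The notation below is our own assumption and is not taken from the paper: theta_0 denotes the pretrained weights, theta_A and theta_B denote weights finetuned from theta_0, and f^(l) denotes the layer-l features on a fixed input. The task-arithmetic line further assumes that the approximate linearity carries over to affine combinations with coefficients summing to one, including extrapolation when lambda > 1/2. Treat it as an illustrative consequence, not as the paper's theorem.

```latex
% Assumed notation: \theta_0 = pretrained weights; \theta_A, \theta_B = finetuned from \theta_0;
% f^{(\ell)}(\theta) = layer-\ell features of the model with weights \theta on a fixed input.
f^{(\ell)}\big(\alpha\theta_A + (1-\alpha)\theta_B\big) \approx \alpha f^{(\ell)}(\theta_A) + (1-\alpha) f^{(\ell)}(\theta_B), \qquad \alpha \in [0, 1]

% Model averaging is the special case \alpha = 1/2:
f^{(\ell)}\big(\tfrac{1}{2}(\theta_A + \theta_B)\big) \approx \tfrac{1}{2}\big(f^{(\ell)}(\theta_A) + f^{(\ell)}(\theta_B)\big)

% Task arithmetic with task vectors \tau_i = \theta_i - \theta_0 is an affine combination (coefficients sum to 1):
\theta_0 + \lambda(\tau_A + \tau_B) = (1 - 2\lambda)\,\theta_0 + \lambda\,\theta_A + \lambda\,\theta_B
% so, assuming the linear-map view extends to such combinations:
f^{(\ell)}\big(\theta_0 + \lambda(\tau_A + \tau_B)\big) \approx (1 - 2\lambda)\, f^{(\ell)}(\theta_0) + \lambda\, f^{(\ell)}(\theta_A) + \lambda\, f^{(\ell)}(\theta_B)
```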
Abstract
The pretraining-finetuning paradigm has become the prevailing trend in modern deep learning. In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed Cross-Task Linearity (CTL). Specifically, we show that if we linearly interpolate the weights of two finetuned models, the features of the weight-interpolated model are often approximately equal to the linear interpolation of the two finetuned models' features at each layer. We provide comprehensive empirical evidence showing that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We conjecture that in the pretraining-finetuning paradigm, neural networks approximately function as linear maps from the parameter space to the feature space. Based on this viewpoint, our study offers new insights into model merging/editing, particularly by translating operations from the parameter space to the feature space. Furthermore, we delve deeper into the root cause of the emergence of CTL, highlighting the role of pretraining.
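The layerwise check described above can be illustrated with a toy numpy sketch. It compares f(alpha*A + (1-alpha)*B) with alpha*f(A) + (1-alpha)*f(B) at every layer using the average per-sample cosine similarity. Everything here is a hypothetical stand-in, not the authors' setup. The model is a two-layer ReLU MLP with no biases. The helpers `features`, `interpolate`, `mean_cosine`, `ctl_similarity` and `fake_finetune` are our own names. "Finetuning" is simulated by adding random perturbations to shared "pretrained" weights rather than by training on real tasks.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def features(params, x):
    """Per-layer features [hidden, output] of a bias-free 2-layer ReLU MLP."""
    w1, w2 = params
    h = relu(x @ w1)   # layer-1 features, shape (n, d_hidden)
    out = h @ w2       # layer-2 features (logits), shape (n, d_out)
    return [h, out]

def interpolate(params_a, params_b, alpha):
    """Linear interpolation of two models' weights, layer by layer."""
    return [alpha * pa + (1 - alpha) * pb for pa, pb in zip(params_a, params_b)]

def mean_cosine(u, v, eps=1e-12):
    """Average per-sample cosine similarity between two feature matrices."""
    num = np.sum(u * v, axis=1)
    den = np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1) + eps
    return float(np.mean(num / den))

def ctl_similarity(params_a, params_b, x, alpha=0.5):
    """Per layer: cosine between f(alpha*A + (1-alpha)*B) and alpha*f(A) + (1-alpha)*f(B)."""
    feats_mix = features(interpolate(params_a, params_b, alpha), x)
    feats_a = features(params_a, x)
    feats_b = features(params_b, x)
    return [
        mean_cosine(fm, alpha * fa + (1 - alpha) * fb)
        for fm, fa, fb in zip(feats_mix, feats_a, feats_b)
    ]

def fake_finetune(params, scale, rng):
    """Hypothetical stand-in for finetuning: add a random perturbation of size `scale`."""
    return [p + scale * rng.standard_normal(p.shape) / np.sqrt(p.shape[0]) for p in params]

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 20, 64, 10, 256
pretrained = [
    rng.standard_normal((d_in, d_hidden)) / np.sqrt(d_in),
    rng.standard_normal((d_hidden, d_out)) / np.sqrt(d_hidden),
]
x = rng.standard_normal((n, d_in))

for scale in (0.05, 2.0):
    model_a = fake_finetune(pretrained, scale, rng)
    model_b = fake_finetune(pretrained, scale, rng)
    print(f"scale={scale}: per-layer similarity {ctl_similarity(model_a, model_b, x)}")
```

In this toy setting, the two "finetuned" models stay close to their shared starting point when the perturbation is small, and the similarity is then very close to 1 at every layer. It drops when the models move far apart in weight space. This is loosely in line with the weight-distance intuition mentioned in the TL;DR, but it does not reproduce the paper's experiments.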
