Understanding and Improving Transfer Learning of Deep Models via Neural Collapse

Xiao Li, Sheng Liu, Jinxin Zhou, Xinyu Lu, Carlos Fernandez-Granda, Zhihui Zhu, Qing Qu

TL;DR

This work investigates how Neural Collapse, a geometric regularity in last-layer features and classifiers, correlates with transfer learning performance. By adapting NC metrics to downstream data, the authors reveal that greater feature collapse on downstream tasks often predicts higher transfer accuracy, and they show a contrary, more nuanced relationship for source data. They introduce a principled, parameter-efficient fine-tuning approach, Skip Connection Layer Fine-Tuning, that achieves strong performance with a fraction of tunable parameters and improved robustness in data-scarce regimes. The findings offer practical guidelines for layer selection and fine-tuning in large pretrained models and highlight both the potential and limits of using NC as a proxy for transferability, pointing to future theoretical connections between NC and transfer dynamics.

Abstract

With the ever-increasing complexity of large-scale pre-trained models coupled with a shortage of labeled data for downstream training, transfer learning has become the primary approach in many fields, including natural language processing, computer vision, and multi-modal learning. Despite recent progress, the fine-tuning process for large-scale pre-trained models in vision still relies mostly on trial and error. This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems. NC is an intriguing yet prevalent phenomenon recently discovered in the final-layer features and linear classifiers of trained neural networks. Specifically, during the terminal phase of training, NC implies that the variability of the features within each class diminishes to zero, while the class means of the features are maximally and equally separated. In this work, we examine the NC attributes of pre-trained models on both downstream and source data for transfer learning, and we find a strong correlation between feature collapse and downstream performance. In particular, we discover a systematic pattern that emerges when linear probing pre-trained models on downstream training data: the more collapsed the features of a pre-trained model are on downstream training data, the higher the transfer accuracy. We also study the relationship between NC and transfer accuracy on the source data. Moreover, these findings allow us to develop a principled, parameter-efficient fine-tuning method that employs skip connections to induce last-layer feature collapse on downstream data. Our proposed fine-tuning methods deliver good performance while reducing the number of fine-tuned parameters by at least 90% and mitigating overfitting, especially when downstream data is scarce.
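
To make the notion of feature collapse concrete, the following is a minimal sketch of one common way to quantify within-class variability collapse: $\mathcal{NC}_1 = \frac{1}{K}\,\mathrm{trace}(\Sigma_W \Sigma_B^{\dagger})$, where $\Sigma_W$ and $\Sigma_B$ are the within-class and between-class feature covariances and $K$ is the number of classes. The abstract does not spell out this formula, so treat it as an assumption based on the standard NC literature; the function name `nc1`, the pseudo-inverse cutoff, and the synthetic Gaussian features are illustrative only and are not taken from the paper.

```python
import numpy as np

def nc1(features, labels):
    """Return trace(Sigma_W @ pinv(Sigma_B)) / K for features (N, d) and integer labels (N,).

    Assumed definition (standard in the NC literature, not quoted from the abstract).
    Smaller values mean more within-class collapse relative to between-class spread.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    num_classes = len(classes)
    dim = features.shape[1]

    # Global mean over all samples (equals the mean of class means for balanced classes).
    global_mean = features.mean(axis=0)
    sigma_w = np.zeros((dim, dim))
    sigma_b = np.zeros((dim, dim))
    for c in classes:
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        centered = class_feats - class_mean
        sigma_w += centered.T @ centered            # within-class scatter
        diff = (class_mean - global_mean)[:, None]
        sigma_b += diff @ diff.T                    # between-class scatter
    sigma_w /= features.shape[0]
    sigma_b /= num_classes

    # Sigma_B has rank at most K - 1, so a pseudo-inverse is used; the explicit
    # cutoff (an illustrative choice) discards numerically zero singular values.
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b, rcond=1e-10)) / num_classes)

# Synthetic stand-in for extracted features: 4 classes, 50 samples each, 16 dimensions.
rng = np.random.default_rng(0)
class_means = 5.0 * rng.normal(size=(4, 16))
labels = np.repeat(np.arange(4), 50)
tight = class_means[labels] + 0.1 * rng.normal(size=(200, 16))  # strongly collapsed
loose = class_means[labels] + 2.0 * rng.normal(size=(200, 16))  # weakly collapsed

nc1_tight = nc1(tight, labels)
nc1_loose = nc1(loose, labels)
print(nc1_tight < nc1_loose)  # tighter clusters give a smaller NC1
```

In a transfer-learning setting, `features` would be the frozen pre-trained model's outputs on the downstream training set rather than synthetic draws.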

Paper Structure

This paper contains 42 sections, 8 equations, 16 figures, 6 tables.

Figures (16)

  • Figure 1: Transfer accuracy and $\mathcal{NC}_1$ of CIFAR-100 pre-trained models on different downstream tasks. We pre-train ResNet50 models on CIFAR-100 using different levels of data augmentation or adversarial training. Here, $\mathcal{NC}_1$ is measured on the downstream CIFAR-10 dataset.
  • Figure 2: Transfer accuracy and $\mathcal{NC}_1$ of public ImageNet-1k pre-trained models on different downstream tasks. We evaluate transfer accuracy and $\mathcal{NC}_1$ on multiple downstream datasets using various ImageNet-1k pre-trained models, such as ResNet (He et al., 2016), DenseNet (Huang et al., 2017), and MobileNetV2 (Sandler et al., 2018). $\mathcal{NC}_1$ is measured on the corresponding downstream dataset.
  • Figure 3: $\mathcal{NC}_1$ and transfer learning accuracy of different layers from a pre-trained model (Left) and the nearly linear relationship between transfer learning accuracy and $\mathcal{NC}_1$ (Right). We use (a) a ResNet34 model pre-trained on ImageNet-1k and (b) a released pre-trained ViT-B model. We use the CIFAR-10 dataset for transfer learning and for measuring the corresponding $\mathcal{NC}_1$.
  • Figure 4: The negative correlation between transfer accuracy and $\mathcal{NC}_1$ on downstream datasets holds for the CLIP model, while downstream training accuracy does not correlate strongly with transfer accuracy. We use the image encoder of the CLIP model as a feature extractor to extract training and testing features from multiple downstream datasets. We then train linear classifiers and evaluate $\mathcal{NC}_1$ on the training features, and evaluate transfer accuracy on the testing features.
  • Figure 5: An illustration of layer-wise transfer learning. We use a pre-trained model up to the intermediate $i$-th layer as a feature extractor for transfer learning on the downstream tasks.
  • ...and 11 more figures
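
The linear-probing protocol described in the captions for Figures 4 and 5 (freeze a feature extractor, train a linear classifier on training features, report accuracy on testing features) can be sketched as follows. This is a minimal illustration under stated assumptions: the helpers `fit_linear_probe` and `probe_accuracy` are hypothetical names, the classifier is fit by ridge-regularized least squares on one-hot targets as a simple stand-in for whatever training procedure the authors use, and the features are synthetic Gaussian clusters rather than outputs of a real pre-trained model.

```python
import numpy as np

def fit_linear_probe(train_feats, train_labels, num_classes, ridge=1e-3):
    """Fit a linear classifier (weights plus bias) on frozen features.

    Stand-in training rule: ridge-regularized least squares on one-hot targets.
    """
    # Append a constant column so the last row of the solution acts as the bias.
    X = np.hstack([train_feats, np.ones((train_feats.shape[0], 1))])
    Y = np.eye(num_classes)[train_labels]
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)

def probe_accuracy(weights, feats, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    X = np.hstack([feats, np.ones((feats.shape[0], 1))])
    return float(np.mean(np.argmax(X @ weights, axis=1) == labels))

# Synthetic stand-in for features taken from a frozen layer of a pre-trained model.
rng = np.random.default_rng(0)
num_classes, dim = 4, 16
class_means = 5.0 * rng.normal(size=(num_classes, dim))
train_labels = np.repeat(np.arange(num_classes), 50)
test_labels = np.repeat(np.arange(num_classes), 20)
train_feats = class_means[train_labels] + rng.normal(size=(200, dim))
test_feats = class_means[test_labels] + rng.normal(size=(80, dim))

weights = fit_linear_probe(train_feats, train_labels, num_classes)
accuracy = probe_accuracy(weights, test_feats, test_labels)
print(f"probe accuracy on synthetic features: {accuracy:.3f}")
```

For layer-wise transfer learning, the same two calls would be repeated with features taken from each intermediate layer in turn, so that transfer accuracy can be compared against $\mathcal{NC}_1$ per layer.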