Fine-Tuning is Fine, if Calibrated

Zheda Mai; Arpita Chowdhury; Ping Zhang; Cheng-Hao Tu; Hong-You Chen; Vardaan Pahuja; Tanya Berger-Wolf; Song Gao; Charles Stewart; Yu Su; Wei-Lun Chao

TL;DR

The fine-tuned model neither forgets the relationships among the other classes nor degrades the features used to recognize them. Instead, it often produces more discriminative features for these classes, even though they were absent during fine-tuning.

Abstract

Fine-tuning is arguably the most straightforward way to tailor a pre-trained model (e.g., a foundation model) to downstream applications, but it also comes with the risk of losing valuable knowledge the model had learned in pre-training. For example, fine-tuning a pre-trained classifier capable of recognizing a large number of classes to master a subset of classes at hand has been shown to drastically degrade the model's accuracy on the other classes it had previously learned. As such, it is hard to further use the fine-tuned model when it encounters classes beyond the fine-tuning data. In this paper, we systematically dissect the issue, aiming to answer the fundamental question, "What has been damaged in the fine-tuned model?" To our surprise, we find that the fine-tuned model neither forgets the relationship among the other classes nor degrades the features to recognize these classes. Instead, the fine-tuned model often produces more discriminative features for these other classes, even if they were missing during fine-tuning! What really hurts the accuracy is the discrepant logit scales between the fine-tuning classes and the other classes, implying that a simple post-processing calibration would bring back the pre-trained model's capability and at the same time unveil the feature improvement over all classes. We conduct an extensive empirical study to demonstrate the robustness of our findings and provide preliminary explanations underlying them, suggesting new directions for future theoretical analysis. Our code is available at https://github.com/OSU-MLB/Fine-Tuning-Is-Fine-If-Calibrated.
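
The post-processing calibration described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that calibration means adding a single scalar bias `gamma` to the logits of the classes that were absent from fine-tuning before taking the argmax. The function name `calibrated_predict`, the toy logits, and the value of `gamma` are all hypothetical.

```python
from typing import Sequence, Set


def calibrated_predict(logits: Sequence[float], absent_classes: Set[int], gamma: float) -> int:
    """Return the argmax class after adding `gamma` to the absent-class logits.

    Assumption: one shared scalar bias is enough to offset the logit-scale gap
    between fine-tuning classes and absent classes.
    """
    adjusted = [
        score + gamma if idx in absent_classes else score
        for idx, score in enumerate(logits)
    ]
    return max(range(len(adjusted)), key=adjusted.__getitem__)


# Toy example with made-up numbers: classes 0 and 1 were seen during
# fine-tuning, classes 2 and 3 were absent. The input truly belongs to class 2.
toy_logits = [4.0, 3.5, 2.9, 1.0]
absent = {2, 3}

uncalibrated = calibrated_predict(toy_logits, absent, gamma=0.0)  # picks a fine-tuning class
calibrated = calibrated_predict(toy_logits, absent, gamma=1.5)    # picks an absent class
```

In this toy case the relative order among the absent classes is untouched by the bias; only the competition between the two groups of classes changes, which matches the abstract's claim that the damage lies in the logit scales rather than in the features.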

Paper Structure (35 sections, 8 equations, 28 figures, 14 tables)

Introduction
Related Work
Background
A Systematic Study of Fine-Tuning (FT)
Experiment setup: datasets, models, and evaluation metrics
Is the fine-tuned feature extractor damaged?
What is damaged in the fine-tuned neural network classifier?
Post-Processing Calibration for the Rescue
Ablation Study and Additional Analysis
Conclusion
Experiment and Dataset Details
Main Investigation (cf. section 3.1 in the main paper)
Dataset Details
ImageNet-Variants
Office-Home (Venkateswara et al., 2017)
...and 20 more sections

Figures (28)

Figure 1: An illustration of fine-tuning (FT) a pre-trained model. Typically, FT is performed with the available downstream data at hand, yet in deployment, the model may encounter some other classes, for example, the ones it had been pre-trained upon. Ideally, the FT model should recognize all classes well.
Figure 2: Fine-tuning ($\star$) $+$ post-processing calibration with a bias factor $\gamma$ can outperform the SOTA solution ($\star$) (Tu et al., 2023).
Figure 3: Accuracy gain after fine-tuning. We consider the neural network (NN) classifier with the FC layer (see the Background section) and the nearest class mean (NCM) classifier, which uses only features (see the NCM equation in the paper). We show the average accuracy gain on fine-tuning classes (Acc$_{\mathcal{S}/\mathcal{Y}}$) and absent classes (Acc$_{\mathcal{U}/\mathcal{Y}}$). While the NN classifier suffers drops in Acc$_{\mathcal{U}/\mathcal{Y}}$, the NCM classifier enjoys a consistent gain, suggesting the holistic improvement of features after fine-tuning.
Figure 4: Acc$_{\mathcal{U}/\mathcal{U}}$ along with the fine-tuning epochs.
Figure 5: Average predicted probability that absent class examples belong to absent classes.
...and 23 more figures
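
The caption of Figure 3 contrasts the NN classifier with an NCM classifier that uses only features. The sketch below shows one common formulation of such a classifier, offered as an illustration rather than the paper's exact equation, which is not reproduced on this page. It assumes each class is represented by the mean of its training feature vectors and that a test feature is assigned to the class with the closest mean in squared Euclidean distance. The function names and toy features are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def class_means(features: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Average the feature vectors of each class, dimension by dimension."""
    grouped = defaultdict(list)
    for feat, label in zip(features, labels):
        grouped[label].append(feat)
    return {
        label: [sum(dim) / len(feats) for dim in zip(*feats)]
        for label, feats in grouped.items()
    }


def ncm_predict(feature: Sequence[float], means: Dict[int, List[float]]) -> int:
    """Assign the feature to the class whose mean is nearest (squared Euclidean)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(means, key=lambda label: sq_dist(feature, means[label]))


# Toy 2-D features with made-up values; real features would come from the
# (pre-trained or fine-tuned) feature extractor.
train_features = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
train_labels = [0, 0, 1, 1]

means = class_means(train_features, train_labels)
prediction = ncm_predict([4.5, 3.0], means)
```

Because this classifier ignores the FC layer entirely, its accuracy reflects feature quality alone, which is why the caption reads the NCM gain on absent classes as evidence of improved features.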
