Domain-Aware Fine-Tuning: Enhancing Neural Network Adaptability

Seokhyeon Ha; Sunbeom Jung; Jungwoo Lee

TL;DR

This paper proposes Domain-Aware Fine-Tuning (DAFT), a novel approach that combines batch normalization conversion with an integration of linear probing and fine-tuning. DAFT significantly mitigates feature distortion and improves model performance on both in-distribution and out-of-distribution datasets.

Abstract

Fine-tuning pre-trained neural network models has become a widely adopted approach across various domains. However, it can lead to the distortion of pre-trained feature extractors that already possess strong generalization capabilities. Mitigating feature distortion during adaptation to new target domains is crucial. Recent studies have shown promising results in handling feature distortion by aligning the head layer on in-distribution datasets before performing fine-tuning. Nonetheless, a significant limitation arises from the treatment of batch normalization layers during fine-tuning, leading to suboptimal performance. In this paper, we propose Domain-Aware Fine-Tuning (DAFT), a novel approach that incorporates batch normalization conversion and the integration of linear probing and fine-tuning. Our batch normalization conversion method effectively mitigates feature distortion by reducing modifications to the neural network during fine-tuning. Additionally, we introduce the integration of linear probing and fine-tuning to optimize the head layer with gradual adaptation of the feature extractor. By leveraging batch normalization layers and integrating linear probing and fine-tuning, our DAFT significantly mitigates feature distortion and achieves improved model performance on both in-distribution and out-of-distribution datasets. Extensive experiments show that our method outperforms the baseline methods, both improving performance and mitigating feature distortion.

Paper Structure (34 sections, 6 equations, 9 figures, 6 tables, 1 algorithm)

Introduction
Related Works
Head Initialization for Fine-Tuning
Other Modifications for Fine-Tuning
Batch Normalization in Transfer Learning
Method
Converting Batch Normalization
Batch Normalization (BN)
BN Transfer Issue
Batch Normalization Conversion
Integrating LP and FT
Limitation of LP-FT
Integrated LP-FT
Experiments
Experiments on Classification Task
...and 19 more sections

Figures (9)

Figure 1: Distribution of similarity measures for three different fine-tuning methods. We fine-tune the pre-trained ResNet-50 model from MoCo-v2 (Chen et al., 2020) on the fMoW dataset (Christie et al., 2018). First, we extract features from the test data before applying each fine-tuning method. Then, after completing the fine-tuning process, we compute the similarity measures between the pre-fine-tuning and post-fine-tuning features for each method. The similarity measures, including cosine similarity and L2 distance, are computed on the fMoW test dataset. Notably, our DAFT exhibits the least distortion in the pre-trained features across both similarity measures.
Figure 2: Comparison of relative changes in learnable parameters (convolution weights, BN weights, BN biases) and BN statistics (BN running means, BN running variances) between FT and our DAFT on the fMoW dataset. We use the pre-trained ResNet-50 model from MoCo-v2 as the feature extractor and apply each fine-tuning method. The ResNet-50 architecture consists of an Input Stem and 4 subsequent stages, with each stage indicated on the x-axis from left to right. 'IS' on the x-axis represents the Input Stem, and all layers within each stage are indicated on the x-axis sequentially from left to right. The relative changes are computed between the initial values before fine-tuning and the final values after fine-tuning. Note that the relative changes of BN statistics are shown on a log scale.
Figure 3: Examples from the DomainNet dataset.
Figure 4: Example of Pascal VOC 2012.
Figure 5: Example of Cityscapes.
...and 4 more figures
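
The feature-distortion measurement described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes features have already been extracted for the same test samples before and after fine-tuning and are available as plain lists of floats. The function names (`cosine_similarity`, `l2_distance`, `feature_distortion`) and the toy vectors are hypothetical and are not taken from the paper or its code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def feature_distortion(features_before, features_after):
    """Return one (cosine similarity, L2 distance) pair per test sample.

    Assumption: both lists hold features of the same samples in the same
    order, extracted before and after fine-tuning respectively.
    """
    return [
        (cosine_similarity(before, after), l2_distance(before, after))
        for before, after in zip(features_before, features_after)
    ]


# Toy, made-up features for two test samples (not real model outputs).
before = [[1.0, 0.0], [0.0, 2.0]]
after = [[1.0, 0.0], [2.0, 0.0]]
scores = feature_distortion(before, after)
```

Under this reading, higher cosine similarity and lower L2 distance indicate that fine-tuning left the pre-trained features closer to their original values. A histogram of the per-sample scores would give a distribution like the one the caption describes.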
