Table of Contents
Fetching ...

Revisiting Batch Normalization For Practical Domain Adaptation

Yanghao Li, Naiyan Wang, Jianping Shi, Jiaying Liu, Xiaodi Hou

TL;DR

This paper tackles domain shift in visual recognition by reusing Batch Normalization statistics for domain adaptation. It introduces Adaptive Batch Normalization (AdaBN), a parameter-free method that replaces BN statistics per domain to align source and target representations. AdaBN achieves state-of-the-art performance on Office-31 and Caltech-Bing, including multi-source settings, and is complementary with methods like CORAL. It demonstrates practical impact with large improvements in remote-sensing cloud detection and shows robustness to limited target-domain data.

Abstract

Deep neural networks (DNN) have shown unprecedented success in various computer vision applications such as image classification and object detection. However, it is still a common annoyance during the training phase, that one has to prepare at least thousands of labeled images to fine-tune a network to a specific domain. Recent study (Tommasi et al. 2015) shows that a DNN has strong dependency towards the training dataset, and the learned features cannot be easily transferred to a different but relevant task without fine-tuning. In this paper, we propose a simple yet powerful remedy, called Adaptive Batch Normalization (AdaBN) to increase the generalization ability of a DNN. By modulating the statistics in all Batch Normalization layers across the network, our approach achieves deep adaptation effect for domain adaptation tasks. In contrary to other deep learning domain adaptation methods, our method does not require additional components, and is parameter-free. It archives state-of-the-art performance despite its surprising simplicity. Furthermore, we demonstrate that our method is complementary with other existing methods. Combining AdaBN with existing domain adaptation treatments may further improve model performance.

Revisiting Batch Normalization For Practical Domain Adaptation

TL;DR

This paper tackles domain shift in visual recognition by reusing Batch Normalization statistics for domain adaptation. It introduces Adaptive Batch Normalization (AdaBN), a parameter-free method that replaces BN statistics per domain to align source and target representations. AdaBN achieves state-of-the-art performance on Office-31 and Caltech-Bing, including multi-source settings, and is complementary with methods like CORAL. It demonstrates practical impact with large improvements in remote-sensing cloud detection and shows robustness to limited target-domain data.

Abstract

Deep neural networks (DNN) have shown unprecedented success in various computer vision applications such as image classification and object detection. However, it is still a common annoyance during the training phase, that one has to prepare at least thousands of labeled images to fine-tune a network to a specific domain. Recent study (Tommasi et al. 2015) shows that a DNN has strong dependency towards the training dataset, and the learned features cannot be easily transferred to a different but relevant task without fine-tuning. In this paper, we propose a simple yet powerful remedy, called Adaptive Batch Normalization (AdaBN) to increase the generalization ability of a DNN. By modulating the statistics in all Batch Normalization layers across the network, our approach achieves deep adaptation effect for domain adaptation tasks. In contrary to other deep learning domain adaptation methods, our method does not require additional components, and is parameter-free. It archives state-of-the-art performance despite its surprising simplicity. Furthermore, we demonstrate that our method is complementary with other existing methods. Combining AdaBN with existing domain adaptation treatments may further improve model performance.

Paper Structure

This paper contains 17 sections, 3 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Illustration of the proposed method. For each convolutional or fully connected layer, we use different bias/variance terms to perform batch normalization for the training domain and the test domain. The domain specific normalization mitigates the domain shift issue.
  • Figure 2: t-SNE tsne visualization of the mini-batch BN feature vector distributions in both shallow and deep layers, across different datasets. Each point represents the BN statistics in one mini-batch. Red dots come from Bing domain, while the blue ones are from Caltech-256 domain. The size of each mini-batch is 64.
  • Figure 3: Distribution of the symmetric KL divergence of the outputs in shallow layer and deep layer. Best viewed in color.
  • Figure 4: Accuracy when varying the number of mini-batches used for calculating the statistics of BN layers in A$\rightarrow$W and B$\rightarrow$C, respectively. For B$\rightarrow$C, we only show the results of using less than 100 batches, since the results are very stable when adding more examples. The batch size is 64 in this experiment.
  • Figure 5: Remote sensing images in different domains.
  • ...and 1 more figures