Normalizing Batch Normalization for Long-Tailed Recognition

Yuxiang Bao, Guoliang Kang, Linlin Yang, Xiaoyue Duan, Bo Zhao, Baochang Zhang

TL;DR

This work addresses the challenge of long-tailed recognition by revealing that rare-class features can be inherently weaker and biased within standard BN statistics. It introduces Normalizing Batch Normalization (NBN), which decouples the magnitude and direction of BN parameters and normalizes them to balance feature strengths, complemented by logit rectification to reduce classifier bias. Across CIFAR-10/100-LT, ImageNet-LT, and iNaturalist 2018, NBN demonstrates strong, consistent improvements and remains compatible with other long-tailed methods, offering a simple, plug-and-play solution that enhances rare-class performance without sacrificing head accuracy. The approach also shows promise in extending to detection/segmentation on LVIS-V1, indicating broad applicability to long-tailed visual tasks.

Abstract

In real-world scenarios, the number of training samples per class usually follows a long-tailed distribution. A conventionally trained network may perform unexpectedly worse on rare classes than on frequent classes. Most previous works attempt to rectify the network bias at the data level or at the classifier level. In contrast, this paper identifies that the bias towards frequent classes may be encoded in the features themselves: the rare-specific features, which play a key role in discriminating rare classes, are much weaker than the frequent-specific features. Based on this observation, we introduce a simple yet effective approach: normalizing the parameters of the Batch Normalization (BN) layer to explicitly rectify the feature bias. To this end, we represent the Weight/Bias parameters of a BN layer as a vector, normalize it to unit length, and multiply the unit vector by a learnable scalar. By decoupling the direction and magnitude of the BN parameters during learning, the Weight/Bias exhibits a more balanced distribution, and thus the strength of features becomes more even. Extensive experiments on various long-tailed recognition benchmarks (CIFAR-10/100-LT, ImageNet-LT, and iNaturalist 2018) show that our method remarkably outperforms previous state-of-the-art methods. The code and checkpoints are available at https://github.com/yuxiangbao/NBN.
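
The reparameterization described in the abstract can be illustrated with a minimal sketch in pure Python. It is not the authors' implementation (see the linked repository for that). It assumes an L2 norm and a small `eps` for numerical safety, which matters because a BN bias vector is commonly initialized to zero. The function name `normalize_bn_param` and the names `v`, `g`, and `eps` are hypothetical, chosen only for illustration.

```python
import math


def normalize_bn_param(v, g, eps=1e-12):
    """Return g * v / ||v||_2 for a per-channel BN parameter vector.

    v   : list of floats, the unnormalized direction (one entry per channel)
    g   : float, the scalar magnitude (learnable in training; fixed here)
    eps : small constant guarding against division by zero (an assumption)
    """
    norm = math.sqrt(sum(x * x for x in v))
    return [g * x / (norm + eps) for x in v]


# Example: a two-channel BN weight with direction (3, 4) and magnitude 2.
weight = normalize_bn_param([3.0, 4.0], g=2.0)

# The L2 norm of the resulting vector equals the scalar magnitude g,
# regardless of how large the entries of v were.
weight_norm = math.sqrt(sum(w * w for w in weight))
```

In this form the overall scale of the parameter vector is controlled by the single scalar `g`, while the relative strength of each channel is carried by the unit direction. The same function could be applied to the Bias vector with its own scalar magnitude.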

Paper Structure (15 sections, 6 equations, 6 figures, 17 tables)

This paper contains 15 sections, 6 equations, 6 figures, 17 tables.

Figures (6)

  • Figure 1: Illustration of feature bias with a rare-class sample. We visualize the attention maps (Zagoruyko and Komodakis, 2016) with respect to feature channels that are uniquely important for scoring "Fountain Pen" (the second column of attention maps) and "Eraser" (the first column of attention maps), respectively. We also visualize the attention maps of all feature channels. The statistics of the feature channels, namely the mean $\mu$ and standard deviation $\sigma$, are calculated with samples from the balanced test set to represent the strength of features. For the baseline, although the rare-specific features emerge, their weak strength makes the overall attention unexpectedly focus on irrelevant regions. In contrast, our method strengthens those rare-specific features, enabling the model to focus on class-discriminative regions. See more details in our supplementary materials.
  • Figure 2: Visualization of the magnitude $g_{\boldsymbol{\gamma}}$ during the training process. The results come from experiments on ImageNet-LT. The trends are similar on other datasets.
  • Figure 3: Illustration of the positions at which the Normalizing Batch Normalization (NBN) layer is inserted in the ResNet architecture. We insert NBN only into the last stage of the ResNet architecture, which consists of three sequential residual blocks.
  • Figure 4: Visualization of the rare-specific feature channels and the combination of all feature channels for the vanilla ResNet-50 trained with cross-entropy loss (Cross Entropy) and with Balanced Softmax (Balanced Softmax), and for ResNet-50 equipped with NBN trained with cross-entropy loss (Ours). We observe that with the aid of NBN, the model keeps focusing on the class-discriminative regions.
  • Figure 5: Comparison of the weights of the last BN layer for ResNet-50 trained on ImageNet, ResNet-50 trained on ImageNet-LT, and ResNet-50 with NBN trained on ImageNet-LT.
  • ...and 1 more figure