Supervised Batch Normalization
Bilal Faye, Mustapha Lebbah, Hanane Azzag
TL;DR
The paper addresses the sensitivity of Batch Normalization (BN) to heterogeneous data by introducing Supervised Batch Normalization (SBN). SBN defines contexts (domains, modalities, or clusters) before training and normalizes the samples within each context using context-specific statistics. By avoiding online estimation of multiple modes and relying on predefined groupings, SBN achieves substantial gains across tasks, including a 15.13% accuracy improvement on CIFAR-100 with Vision Transformers and a 22.25% accuracy improvement over BN in domain adaptation (MNIST→SVHN) with AdaMatch. The approach offers a practical, cost-efficient way to incorporate multiple normalization modes, performing well in both multi-task and single-task settings while remaining simple to implement. These results highlight SBN's potential to improve stability, convergence, and generalization on diverse real-world datasets, and they motivate further exploration in multimodal settings.
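A minimal sketch of the core idea, assuming contexts are supplied as one id per sample and fixed before training: each sample is normalized with the mean and variance computed only over the samples of its own context in the mini-batch. The helper name `context_batch_norm` is hypothetical; learnable scale/shift parameters and running statistics for inference are deliberately omitted, and this is not the authors' implementation.

```python
import math
from collections import defaultdict


def context_batch_norm(batch, contexts, eps=1e-5):
    """Normalize each sample using the mean and variance of its own context.

    batch:    list of feature vectors (lists of floats), all of the same length
    contexts: one hashable context id per sample, fixed before training
    Hypothetical helper: no learnable scale/shift and no running statistics.
    """
    dim = len(batch[0])
    # Group sample indices by context id.
    groups = defaultdict(list)
    for i, c in enumerate(contexts):
        groups[c].append(i)

    out = [None] * len(batch)
    for idx in groups.values():
        n = len(idx)
        # Per-feature mean and biased variance over this context's samples only.
        mean = [sum(batch[i][d] for i in idx) / n for d in range(dim)]
        var = [sum((batch[i][d] - mean[d]) ** 2 for i in idx) / n for d in range(dim)]
        for i in idx:
            out[i] = [(batch[i][d] - mean[d]) / math.sqrt(var[d] + eps) for d in range(dim)]
    return out


# Two contexts on very different scales within one mini-batch.
batch = [[1.0], [3.0], [100.0], [300.0]]
contexts = ["A", "A", "B", "B"]
normalized = context_batch_norm(batch, contexts)
# Each context is now roughly centered at 0 with unit variance: about [[-1], [1], [-1], [1]].
```

By contrast, pooling all four samples into a single BN mean and variance would map both context-A samples to nearly the same value (about -0.8), because the batch statistics are dominated by context B.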
Abstract
Batch Normalization (BN), a widely used technique in neural networks, improves generalization and speeds up training by normalizing each mini-batch to the same mean and variance. However, its effectiveness diminishes when the data follow diverse distributions. To address this challenge, we propose Supervised Batch Normalization (SBN), a pioneering approach. We extend normalization beyond a single mean and variance by identifying data modes prior to training, so that samples sharing common features are normalized together. We define contexts as these modes, each grouping data with similar characteristics. Contexts can be defined explicitly, such as domains in domain adaptation or modalities in multimodal systems, or implicitly, through clustering algorithms based on data similarity. We demonstrate the superiority of our approach over BN and other commonly employed normalization techniques through various experiments on both single-task and multi-task datasets. Integrating SBN with a Vision Transformer yields a remarkable 15.13% accuracy improvement on CIFAR-100. Additionally, in domain adaptation from MNIST to SVHN with AdaMatch, SBN achieves an impressive 22.25% accuracy improvement compared to BN.
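Contexts can also be derived implicitly, before training, by clustering the data. The sketch assumes k-means (Lloyd's algorithm) as the clustering method, which this section does not specify, and uses a hypothetical helper named `kmeans_contexts`; the resulting ids would then stay fixed during training and select each sample's normalization statistics. The first k samples serve as initial centroids purely to keep the example deterministic.

```python
def kmeans_contexts(points, k, iters=50):
    """Assign each sample a context id in range(k) using plain k-means (Lloyd's algorithm).

    points: list of feature vectors (lists of floats); k: number of contexts.
    Hypothetical helper: k-means is an assumed choice of clustering algorithm.
    """
    # Deterministic initialization: the first k samples serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each sample goes to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members (unchanged if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Two well-separated groups of samples become two contexts.
samples = [[0.0], [10.0], [0.2], [10.3], [0.1], [9.8]]
contexts = kmeans_contexts(samples, k=2)  # → [0, 1, 0, 1, 0, 1]
```

For explicitly defined contexts, such as the source and target domains in domain adaptation, this clustering step is unnecessary: the domain label itself can serve as the context id.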
