Context Normalization Layer with Applications

Bilal Faye, Mohamed-Djallel Dilmi, Hanane Azzag, Mustapha Lebbah, Djamel Bouchaffra

TL;DR

The paper introduces Context Normalization (CN), a context-aware normalization method for image data that treats the data as arising from a mixture of contexts and learns per-context statistics $\mu_r$ and $\sigma_r$ through a context embedder. CN extends Mixture Normalization by avoiding its expectation-maximization (EM) step, which makes the method end-to-end differentiable. The paper also presents two variants, CN-Patches and CN-Channels, and an enhanced inference strategy, CN+, that aggregates across contexts using posterior weights $\tau_r(x)$. Experiments on CIFAR-10/100, ViT-based architectures, and domain adaptation scenarios (e.g., AdaMatch) show that CN converges faster and reaches higher accuracy than Batch Normalization and Mixture Normalization, often by a large margin, across different learning rates and data distributions. The work demonstrates CN’s versatility across supervised, self-supervised, and domain-adaptive settings, positioning it as a practical, context-aware normalization layer for computer vision pipelines.
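
To make the TL;DR concrete, here is a minimal forward-pass sketch in Python under several assumptions that are not taken from the paper: an embedding table stands in for the network that encodes the context identifier $r$, two linear heads stand in for the networks that output $\mu_r$ and $\sigma_r$ (with a softplus to keep $\sigma_r$ positive), and CN+ is assumed to take a $\tau_r(x)$-weighted sum of the per-context normalized activations, as Mixture Normalization does. All names (`ContextEmbedder`, `cn_normalize`, `cn_plus`) are hypothetical, and training is omitted; in the paper the embedder is learned end-to-end by backpropagation.

```python
import math
import random

EPS = 1e-5


def softplus(z):
    # Numerically stable softplus; used here (an assumption) to keep sigma_r > 0.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


class ContextEmbedder:
    """Hypothetical forward-only sketch mapping a context id r to (mu_r, sigma_r).

    An embedding table stands in for the encoder network, and two linear
    heads stand in for the two networks that produce mu_r and sigma_r.
    Weights are random here; in CN they would be learned end-to-end.
    """

    def __init__(self, num_contexts, emb_dim, num_channels, seed=0):
        rng = random.Random(seed)
        self.emb = [[rng.gauss(0, 1) for _ in range(emb_dim)] for _ in range(num_contexts)]
        self.w_mu = [[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(num_channels)]
        self.w_sigma = [[rng.gauss(0, 0.1) for _ in range(emb_dim)] for _ in range(num_channels)]

    def stats(self, r):
        e = self.emb[r]
        mu = [sum(w * v for w, v in zip(row, e)) for row in self.w_mu]
        sigma = [softplus(sum(w * v for w, v in zip(row, e))) for row in self.w_sigma]
        return mu, sigma


def cn_normalize(x, mu, sigma):
    # Per-channel normalization of activation x with the context's statistics.
    return [(xi - m) / math.sqrt(s * s + EPS) for xi, m, s in zip(x, mu, sigma)]


def cn_plus(x, embedder, tau):
    # Assumed CN+ form: posterior-weighted sum over contexts of the
    # per-context normalized activations (as in Mixture Normalization).
    out = [0.0] * len(x)
    for r, weight in enumerate(tau):
        mu, sigma = embedder.stats(r)
        for c, v in enumerate(cn_normalize(x, mu, sigma)):
            out[c] += weight * v
    return out


embedder = ContextEmbedder(num_contexts=3, emb_dim=4, num_channels=2)
x = [0.5, -1.2]
y_train = cn_normalize(x, *embedder.stats(1))       # context r = 1 is known
y_infer = cn_plus(x, embedder, tau=[0.2, 0.5, 0.3])  # hypothetical posterior weights
```

With a one-hot $\tau$, `cn_plus` reduces to `cn_normalize` for that single context, which is a quick sanity check on the weighting.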

Abstract

Normalization is a pre-processing step that converts data into a more usable representation. In deep neural networks (DNNs), batch normalization (BN) applies normalization to address internal covariate shift. BN can be packaged as a general module and has been widely integrated into DNNs to stabilize and accelerate training, presumably improving generalization. However, BN's effectiveness depends on the mini-batch size, and it ignores any groups or clusters that may exist in the dataset when estimating population statistics. This study proposes a new normalization technique for image data, called context normalization, which adjusts the scaling of features according to the characteristics of each sample. By adapting the data values to the context of the target task, it improves the model's convergence speed and performance. The effectiveness of context normalization is demonstrated on various datasets, and its performance is compared with that of other standard normalization techniques.
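
The mini-batch dependence mentioned in the abstract can be seen in a minimal Python sketch of training-mode BN for a single feature (γ and β fixed at 1 and 0 for simplicity; the function name is illustrative). The same sample receives a different normalized value depending on which batch it lands in, and a batch drawn from two clusters, {1, 3} and {10, 12}, is still normalized with one shared mean and variance.

```python
import math

EPS = 1e-5


def batch_norm(batch, gamma=1.0, beta=0.0):
    # Training-mode BN for one feature: statistics come from the mini-batch itself.
    m = len(batch)
    mean = sum(batch) / m
    var = sum((v - mean) ** 2 for v in batch) / m  # biased variance, as in BN training
    return [gamma * (v - mean) / math.sqrt(var + EPS) + beta for v in batch]


small = batch_norm([1.0, 3.0])               # batch of size 2
large = batch_norm([1.0, 3.0, 10.0, 12.0])   # two clusters, one shared mean (6.5)
# small[0] and large[0] differ although both come from the same input value 1.0.
```

A context-aware method such as CN would instead normalize each cluster with its own statistics.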

Paper Structure

This paper contains 15 sections, 19 equations, 6 figures, 10 tables, and 3 algorithms.

Figures (6)

  • Figure 1: A concise overview of the processing steps involved in Batch Normalization (BN), Mixture Normalization (MN), and Context Normalization (CN). The dashed line in the Batch Normalization diagram indicates a mini-batch parameter update, highlighting a key step in the process.
  • Figure 2: Context Normalization Layer applied to a given activation $x_i$. The context identifier ($r$) is encoded by a neural network, the output of which is then used as input to two different neural networks to generate a mean ($\mu_r$) and a standard deviation ($\sigma_r$), respectively, for normalizing $x_i$.
  • Figure 3: Comparison of validation error curves for the CIFAR ConvNet architecture (specified in the paper's architecture table) trained under different learning-rate and weight-decay configurations.
  • Figure 4: Validation loss curves on CIFAR-100 when the ViT architecture is trained with different normalization methods.
  • Figure 5: Simulating a night image from a day image: the night image is obtained by normalizing the day image with the Day parameters, then scaling and shifting it with the Night parameters.
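
Figure 5's day-to-night simulation amounts to normalizing with one context's statistics and de-normalizing with another's: $\hat{x} = \sigma_{\text{night}} \cdot (x - \mu_{\text{day}}) / \sqrt{\sigma_{\text{day}}^2 + \epsilon} + \mu_{\text{night}}$. The Python sketch below applies this per channel. It assumes the "parameters" are per-channel means and standard deviations; the Day statistics are computed from the image itself and the Night values are made up, whereas in the paper both would come from the learned CN context parameters. `transfer_context` is a hypothetical name.

```python
import math

EPS = 1e-5


def transfer_context(pixels, src_mu, src_sigma, tgt_mu, tgt_sigma):
    # Normalize each channel with the source-context (Day) statistics, then
    # scale by the target (Night) sigma and shift by the target (Night) mu.
    out = []
    for channel, m_s, s_s, m_t, s_t in zip(pixels, src_mu, src_sigma, tgt_mu, tgt_sigma):
        out.append([s_t * (v - m_s) / math.sqrt(s_s ** 2 + EPS) + m_t for v in channel])
    return out


day = [[0.4, 0.6, 0.8], [0.2, 0.5, 0.8]]  # hypothetical 2-channel, 3-pixel "day" image
# Stand-in for learned Day parameters: per-channel mean and std of the image.
day_mu = [sum(c) / len(c) for c in day]
day_sigma = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(day, day_mu)]
night_mu, night_sigma = [0.1, 0.15], [0.05, 0.08]  # hypothetical "Night" parameters
night = transfer_context(day, day_mu, day_sigma, night_mu, night_sigma)
```

Each output channel ends up with (approximately) the Night mean and standard deviation while keeping the spatial pattern of the day image.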