AFN: Adaptive Fusion Normalization via an Encoder-Decoder Framework

Zikai Zhou, Shuo Zhang, Ziruo Wang, Huanran Chen

TL;DR

The paper addresses the stability-generalisation trade-off in normalisation layers by proposing Adaptive Fusion Normalization (AFN), an extension of batch normalisation (BN) that learns both standardisation and rescaling statistics via an encoder–decoder, with residual gates that preserve BN behaviour early in training. AFN blends batch statistics with network-derived statistics, aiming to reduce the gradient instability seen with ASRNorm while improving domain generalisation on benchmarks such as Digits, CIFAR-10-C, and PACS, as well as image classification on SVHN, MNIST-M, and CIFAR-10/100. Empirical results show AFN outperforming ASRNorm and BN baselines in several settings, with improved training stability and fewer gradient explosions. The work presents AFN as a practical normalisation strategy for cross-domain vision tasks and hints at broader applicability to other modalities such as speech recognition, given its encoder–decoder framework for statistics learning.

Abstract

The success of deep learning is inseparable from normalization layers. Researchers have proposed various normalization functions, each with its own advantages and disadvantages. In response, efforts have been made to design a unified normalization function that combines all normalization procedures and mitigates their weaknesses. We propose a new normalization function called Adaptive Fusion Normalization (AFN). Through experiments, we demonstrate that AFN outperforms previous normalization techniques in domain generalization and image classification tasks.

Paper Structure

This paper contains 11 sections, 6 equations, 4 figures, and 9 tables.

Figures (4)

  • Figure 1: Accuracy of three different normalisation methods for single domain generalisation on CIFAR-10-C, compared at five different levels of domain discrepancy brought by corruptions.
  • Figure 2: Overview of our method, which is divided into two steps: standardisation and rescaling. We use an encoder-decoder framework to learn both the standardisation and rescaling statistics from the mini-batch statistics of the input. For the standardisation stage, we use a residual learning framework to render our training process stable.
  • Figure 3: Input flow in VGG19_BN on the SVHN dataset. VGG19_BN has 5 blocks with 16 normalisation layers in total. We choose the 1st and 16th normalisation layers to show the gradient explosion/vanishing in AFN/ASRNorm.
  • Figure 4: Illustration of domain generalisation with the PACS benchmark. Single-domain generalisation aims at training a model on data from one source domain, while generalising well to other domains with very different visual presentations. For multi-domain generalisation, multiple source domains are used for training, and the remaining domain is used for testing.
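
The two-step scheme described in the Figure 2 caption (standardisation, then rescaling, both driven by an encoder-decoder over mini-batch statistics, with a residual path for stability) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the single-hidden-layer ReLU bottleneck, the `tanh` bounds on the rescaling terms, the single shared scalar gate `gate_logit`, and all function and parameter names (`fused_norm`, `init_params`, `enc_mu`, and so on) are assumptions made for the example. It operates on a plain `batch x features` list of lists rather than image tensors.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(w, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def encode_decode(stats, enc_w, dec_w):
    # Assumed bottleneck: features -> hidden (ReLU) -> features.
    hidden = [max(0.0, h) for h in matvec(enc_w, stats)]
    return matvec(dec_w, hidden)

def batch_stats(x, eps=1e-5):
    # Per-feature mean and standard deviation over the mini-batch, as in BN.
    n = len(x)
    cols = list(zip(*x))
    mean = [sum(c) / n for c in cols]
    std = [math.sqrt(sum((v - m) ** 2 for v in c) / n + eps)
           for c, m in zip(cols, mean)]
    return mean, std

def init_params(features, hidden, seed=0):
    # Hypothetical parameter layout: one encoder/decoder pair per statistic.
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    params = {}
    for name in ("mu", "sigma", "gamma", "beta"):
        params["enc_" + name] = mat(hidden, features)
        params["dec_" + name] = mat(features, hidden)
    return params

def fused_norm(x, params, gate_logit):
    mean_b, std_b = batch_stats(x)
    lam = sigmoid(gate_logit)  # residual gate in (0, 1); near 0 means plain BN

    # Standardisation: blend batch statistics with network-derived statistics.
    mean_l = encode_decode(mean_b, params["enc_mu"], params["dec_mu"])
    std_l = [abs(s) + 1e-5  # keep the learned scale strictly positive
             for s in encode_decode(std_b, params["enc_sigma"], params["dec_sigma"])]
    mean = [lam * l + (1.0 - lam) * b for l, b in zip(mean_l, mean_b)]
    std = [lam * l + (1.0 - lam) * b for l, b in zip(std_l, std_b)]
    x_hat = [[(v - m) / s for v, m, s in zip(row, mean, std)] for row in x]

    # Rescaling: gated corrections around the identity (gamma = 1, beta = 0).
    gamma = [1.0 + lam * math.tanh(g)
             for g in encode_decode(std_b, params["enc_gamma"], params["dec_gamma"])]
    beta = [lam * math.tanh(b)
            for b in encode_decode(mean_b, params["enc_beta"], params["dec_beta"])]
    return [[g * v + b for v, g, b in zip(row, gamma, beta)] for row in x_hat]

# Usage: a strongly negative gate logit reproduces BN; a positive one mixes in
# the network-derived statistics.
random.seed(0)
x = [[random.gauss(3.0, 2.0) for _ in range(4)] for _ in range(8)]
params = init_params(features=4, hidden=2, seed=1)
y_closed = fused_norm(x, params, gate_logit=-20.0)
y_open = fused_norm(x, params, gate_logit=2.0)
```

Initialising the gate close to zero is one way to obtain the "preserve BN behaviour early in training" property mentioned in the TL;DR: the layer starts out as ordinary batch normalisation, and the learned statistics only take effect as the gate opens during training. In a real model the gate and the encoder-decoder weights would be trained by backpropagation, which this forward-only sketch omits.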