Towards Understanding Regularization in Batch Normalization

Ping Luo; Xinjiang Wang; Wenqi Shao; Zhanglin Peng

TL;DR

This work provides a theoretical framework for understanding Batch Normalization by modeling BN as an explicit regularizer comprising population normalization (PN) and gamma decay, derived from priors on batch statistics. Using a single-layer perceptron as a building block and a teacher-student dynamic, the authors derive ODEs to describe learning dynamics, show BN enables larger maximum and effective learning rates, and analyze generalization via a statistical-mechanics approach. They also validate the theory with CNN experiments, showing BN-like regularization traits match PN+gamma decay under appropriate conditions and that regularization can be maintained or enhanced with dropout at large batch sizes. The results unify optimization and generalization insights for BN and provide directions for extending the analysis to deeper networks and other normalizers.

Abstract

Batch Normalization (BN) improves both convergence and generalization in training neural networks. This work studies these phenomena theoretically. We analyze BN using a basic block of neural networks, consisting of a kernel layer, a BN layer, and a nonlinear activation function. This basic network helps us understand the impact of BN in three respects. First, viewed as an implicit regularizer, BN can be decomposed into population normalization (PN) and gamma decay, which together act as an explicit regularization. Second, the learning dynamics of BN and of this regularization show that training converges with a large maximum and effective learning rate. Third, the generalization of BN is explored using statistical mechanics. Experiments demonstrate that BN in convolutional neural networks shares the same regularization traits as the above analyses.
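The basic block and the PN plus gamma decay view described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `eps` constant, the choice of ReLU as the nonlinearity, and the quadratic form of the penalty with coefficient `zeta` are all assumptions made for illustration, and the paper derives its own strength for the gamma decay term.

```python
import numpy as np


def bn_block(x, w, gamma, beta, eps=1e-5):
    """Kernel layer -> BN with batch statistics -> ReLU, for a single unit."""
    h = x @ w                                  # pre-activations, shape (batch,)
    mu_b = h.mean()                            # batch mean
    var_b = h.var()                            # batch variance
    h_hat = (h - mu_b) / np.sqrt(var_b + eps)  # normalize with batch statistics
    return np.maximum(gamma * h_hat + beta, 0.0)


def pn_block(x, w, gamma, beta, mu_p, var_p, eps=1e-5):
    """Same block, but normalized with fixed population statistics (PN)."""
    h = x @ w
    h_hat = (h - mu_p) / np.sqrt(var_p + eps)  # no dependence on the mini-batch
    return np.maximum(gamma * h_hat + beta, 0.0)


def gamma_decay(gamma, zeta):
    """Assumed quadratic penalty on gamma; zeta is a hypothetical coefficient."""
    return zeta * gamma ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 4))   # a mini-batch of 8 inputs with 4 features
    w = rng.normal(size=4)        # kernel weights of the single unit
    out_bn = bn_block(x, w, gamma=1.5, beta=0.1)
    # Explicit-regularizer view: PN output plus a penalty added to the loss.
    h = x @ w
    out_pn = pn_block(x, w, 1.5, 0.1, mu_p=h.mean(), var_p=h.var())
    penalty = gamma_decay(1.5, zeta=0.01)
    print(out_bn.shape, out_pn.shape, penalty)
```

When the population statistics passed to `pn_block` equal the statistics of the current batch, the two blocks produce the same output. They differ only through the randomness of the batch statistics, which is the source of the regularization the analysis makes explicit.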
