Normalized Convolutional Neural Network
Dongsuk Kim, Geonhee Lee, Myungjae Lee, Shin Uk Kang, Dongmin Kim
TL;DR
This work introduces Normalized Convolution (NC), a normalization built into the convolution itself: it standardizes the rows of the im2col matrix during the convolution operation, so that normalization adapts to each sliced input and follows the kernel structure. The authors argue theoretically that NC reduces the Lipschitz constants of the loss and its gradients, yielding a smoother optimization landscape, and support these claims with experiments on ImageNet, CIFAR, COCO, PASCAL VOC, and generative modeling tasks. Empirically, NC consistently outperforms Group Normalization (GN) in micro-batch settings and provides further gains when combined with Batch Normalization (BN) or GN, with notable improvements in training stability and speed. The work also presents a CUDA-accelerated implementation to address computational cost and discusses the relationship to Positional Normalization, positioning NC as a general-purpose normalization mechanism for convolutional networks.
Abstract
We introduce a Normalized Convolutional Neural Layer, a novel approach to normalization in convolutional networks. Unlike conventional methods, this layer normalizes the rows of the im2col matrix during convolution, making it inherently adaptive to sliced inputs and better aligned with kernel structures. This sets it apart from standard normalization techniques, but it also prevents direct integration into existing deep learning frameworks, which are optimized for traditional convolution operations. The method is universal in the sense that it applies to any deep learning task involving convolutional layers. Because it normalizes within the convolution process itself, it serves as a convolutional adaptation of Self-Normalizing Networks, maintaining their core principles without requiring additional normalization layers. Notably, in micro-batch training scenarios, it consistently outperforms other batch-independent normalization methods. This performance gain arises from standardizing the rows of the im2col matrix, which theoretically leads to a smoother loss gradient and improved training stability.
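The core operation described above, standardizing each row of the im2col matrix before the kernel is applied, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes that each im2col row holds one flattened receptive field (all channels of one sliding window), uses stride 1 with no padding, and omits any learnable scale or shift. The names `im2col`, `normalized_conv2d`, and `eps` are hypothetical and chosen only for illustration.

```python
import numpy as np


def im2col(x, kh, kw):
    """Unfold x of shape (C, H, W) into a matrix of flattened patches.

    Assumed layout: one row per output position, one column per element
    of the receptive field, giving shape (out_h * out_w, C * kh * kw).
    Stride 1 and no padding, to keep the sketch short.
    """
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    rows = [
        x[:, i:i + kh, j:j + kw].reshape(-1)
        for i in range(out_h)
        for j in range(out_w)
    ]
    return np.stack(rows), (out_h, out_w)


def normalized_conv2d(x, weight, eps=1e-5):
    """Convolution with row-wise standardization of the im2col matrix.

    x:      input of shape (C, H, W)
    weight: kernels of shape (O, C, kh, kw)
    Returns the output of shape (O, out_h, out_w) and the standardized
    im2col matrix (returned only so it can be inspected).
    """
    o, c, kh, kw = weight.shape
    cols, (out_h, out_w) = im2col(x, kh, kw)

    # Standardize every row (every sliced input patch) independently.
    mean = cols.mean(axis=1, keepdims=True)
    var = cols.var(axis=1, keepdims=True)
    cols_hat = (cols - mean) / np.sqrt(var + eps)

    # An ordinary convolution is now a single matrix product.
    out = cols_hat @ weight.reshape(o, -1).T  # shape (out_h * out_w, O)
    return out.T.reshape(o, out_h, out_w), cols_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 6, 6))
    weight = rng.normal(size=(4, 3, 3, 3))
    out, cols_hat = normalized_conv2d(x, weight)
    print(out.shape, cols_hat.shape)
```

Because the statistics are computed per patch rather than per batch or per channel group, the result for one sample does not depend on the other samples in the batch, which is consistent with the batch-independent behavior the abstract describes for micro-batch training.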
