LayerAct: Advanced Activation Mechanism for Robust Inference of CNNs
Kihyuk Yoon, Chiehyeon Lim
TL;DR
LayerAct shifts non-linearity from element-level activations to a layer-level activation scale computed from layer-normalized inputs, addressing two limitations of traditional activations: the trade-off between output saturation and a zero-like activation mean, and limited noise robustness. It introduces LA-SiLU and LA-HardSiLU, where $a_i = y_i s(n_i)$ and $n_i = (y_i - \mu_y)/\sqrt{\sigma_y^2 + \alpha}$; here $y_i$ is an element of the layer input $y$, $\mu_y$ and $\sigma_y^2$ are the mean and variance of $y$, $s$ is the activation scale function, and $\alpha$ is a small constant for numerical stability. This formulation enables a zero-like activation mean while reducing sensitivity to input shifts. Theoretical analysis shows LayerAct lowers the upper bound on the activation fluctuation $\| g(\hat{y}) - g(y) \|$ between a shifted input $\hat{y}$ and the original input $y$, relative to element-level activations, and empirical results on MNIST, CIFAR-10/100, and ImageNet demonstrate improved robustness to noise with competitive performance on clean data. The layer-level activation framework integrates with common normalization schemes and points to future work on bounded LayerAct variants and broader architectural deployment.
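The two formulas in the TL;DR can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the scale function $s$ is assumed to be the logistic sigmoid for LA-SiLU and a piecewise-linear hard sigmoid for LA-HardSiLU, the statistics are taken over a single flat list standing in for one sample's layer input, and the default `alpha=1e-5` as well as the names `layer_scale`, `layer_act`, and `hard_sigmoid` are hypothetical choices made for this example.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hard_sigmoid(x):
    """Piecewise-linear sigmoid, clip(x / 6 + 0.5, 0, 1).

    Assumption: this is one common definition; the source does not
    specify which variant LA-HardSiLU uses.
    """
    return min(1.0, max(0.0, x / 6.0 + 0.5))


def layer_scale(y, scale_fn=sigmoid, alpha=1e-5):
    """Return s(n_i) for each element, with n_i = (y_i - mu) / sqrt(var + alpha).

    `y` is a flat list of floats standing in for one sample's layer input
    (assumption); `alpha` is a small stability constant (assumed value).
    """
    mu = sum(y) / len(y)
    var = sum((v - mu) ** 2 for v in y) / len(y)
    denom = math.sqrt(var + alpha)
    return [scale_fn((v - mu) / denom) for v in y]


def layer_act(y, scale_fn=sigmoid, alpha=1e-5):
    """Layer-level activation a_i = y_i * s(n_i)."""
    scales = layer_scale(y, scale_fn, alpha)
    return [v * s for v, s in zip(y, scales)]


def silu(y):
    """Element-level SiLU, a_i = y_i * sigmoid(y_i), for comparison."""
    return [v * sigmoid(v) for v in y]


if __name__ == "__main__":
    y = [-1.0, 0.0, 1.0, 2.0]
    shifted = [v + 3.0 for v in y]  # add the same offset to every element
    print("LayerAct scale, original:", layer_scale(y))
    print("LayerAct scale, shifted: ", layer_scale(shifted))
    print("Element-level scale, original:", [sigmoid(v) for v in y])
    print("Element-level scale, shifted: ", [sigmoid(v) for v in shifted])
```

Because $n_i$ is computed after subtracting the layer mean, adding the same offset to every element leaves the scale $s(n_i)$ unchanged, whereas the element-level scale $\mathrm{sigmoid}(y_i)$ moves with the offset. This toy case covers only a uniform shift; it illustrates the mechanism and does not reproduce the paper's bound for general noise.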
Abstract
In this work, we propose a novel activation mechanism called LayerAct for CNNs. This approach is motivated by our theoretical and experimental analyses, which demonstrate that Layer Normalization (LN) can mitigate a limitation of existing activation functions regarding noise robustness. However, LN is known to be disadvantageous in CNNs due to its tendency to make activation outputs homogeneous. The proposed method is designed to be more robust than existing activation functions by reducing the upper bound on the influence of input shifts, without inheriting LN's limitation. We provide analyses and experiments showing that LayerAct functions exhibit superior robustness compared to element-level activation (ElementAct) functions. Experimental results on three clean and noisy benchmark datasets for image classification tasks indicate that LayerAct functions outperform other activation functions on noisy datasets while achieving superior performance on clean datasets in most cases.
