LayerAct: Advanced Activation Mechanism for Robust Inference of CNNs

Kihyuk Yoon; Chiehyeon Lim

TL;DR

LayerAct shifts non-linearity from element-level activations to a layer-level activation scale computed from layer-normalized inputs, addressing the saturation-mean trade-off and noise-robustness limitations of traditional activations. It introduces LA-SiLU and LA-HardSiLU, where $a_i = y_i s(n_i)$ and $n_i = (y_i - \mu_y)/\sqrt{\sigma_y^2 + \alpha}$, enabling a zero-like activation mean while reducing sensitivity to input shifts. Theoretical analysis shows LayerAct lowers the upper bound on activation fluctuation $\| g(\hat{y}) - g(y) \|$ relative to element-level activations, and empirical results on MNIST, CIFAR-10/100, and ImageNet demonstrate improved robustness to noise with competitive performance on clean data. This layer-level activation framework integrates with common normalization schemes and points to future work on bounded LayerAct variants and broader architectural deployment.
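The two formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. It assumes that the input `y` is the flat list of activation inputs of one layer for a single sample, that $s$ is the logistic sigmoid for LA-SiLU, and that $s$ for LA-HardSiLU is the piecewise-linear form $\mathrm{clip}(x/6 + 1/2, 0, 1)$. The function names and the default `alpha` are hypothetical.

```python
import math

def sigmoid(x):
    # Logistic sigmoid, assumed here as the scale function s for LA-SiLU.
    return 1.0 / (1.0 + math.exp(-x))

def hard_sigmoid(x):
    # Assumption: piecewise-linear hard sigmoid clip(x/6 + 1/2, 0, 1).
    return min(1.0, max(0.0, x / 6.0 + 0.5))

def layer_normalize(y, alpha=1e-5):
    # n_i = (y_i - mu_y) / sqrt(sigma_y^2 + alpha), statistics taken over the layer.
    mu = sum(y) / len(y)
    var = sum((v - mu) ** 2 for v in y) / len(y)
    return [(v - mu) / math.sqrt(var + alpha) for v in y]

def layer_act(y, scale, alpha=1e-5):
    # a_i = y_i * s(n_i): the scale uses the normalized input,
    # but it multiplies the raw (unnormalized) input y_i.
    n = layer_normalize(y, alpha)
    return [v * scale(ni) for v, ni in zip(y, n)]

def la_silu(y, alpha=1e-5):
    return layer_act(y, sigmoid, alpha)

def la_hard_silu(y, alpha=1e-5):
    return layer_act(y, hard_sigmoid, alpha)

y = [1.0, 2.0, 3.0]
print(la_silu(y))
print(la_hard_silu(y))
```

Because the scale $s(n_i)$ depends only on the normalized input, adding the same constant to every $y_i$ leaves the scale unchanged; only the multiplied value $y_i$ moves.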

Abstract

In this work, we propose a novel activation mechanism called LayerAct for CNNs. This approach is motivated by our theoretical and experimental analyses, which demonstrate that Layer Normalization (LN) can mitigate a limitation of existing activation functions regarding noise robustness. However, LN is known to be disadvantageous in CNNs due to its tendency to make activation outputs homogeneous. The proposed method is designed to be more robust than existing activation functions by reducing the upper bound of influence caused by input shifts without inheriting LN's limitation. We provide analyses and experiments showing that LayerAct functions exhibit superior robustness compared to ElementAct functions. Experimental results on three clean and noisy benchmark datasets for image classification tasks indicate that LayerAct functions outperform other activation functions in handling noisy datasets while achieving superior performance on clean datasets in most cases.

Paper Structure (27 sections, 19 equations, 12 figures, 12 tables)

Introduction
Background
Activation scale
Trade-off between saturation and zero-like mean activation
Large variance of noise-robustness across samples
Layer Normalization
Layer-level activation
LayerAct mechanism
Properties of LayerAct
Noise-robustness of LayerAct
Experiment
Experimental analysis on MNIST
Zero-like mean activation
Noise-robustness
Classification performance
...and 12 more sections

Figures (12)

Figure 1: The mechanisms of the element-level activation (left) and proposed layer-level activation (right).
Figure 2: Distribution of the activation output means of the elements in a trained network on MNIST at $1$ and $40$ epochs. The distributions did not change after $40$ epochs. The LayerAct functions maintain zero-like mean activation for all epochs.
Figure 3: Distribution of activation output fluctuation under noise drawn from different distributions. The activation fluctuation of the LayerAct functions has a lower mean and variance than that of the element-level activation functions in both cases.
Figure 4: Clean and noisy car images from the CIFAR-10 dataset. From left to right: a clean image, an image with Gaussian-distributed noise, an image with Poisson-distributed noise, and a Gaussian-blurred image.
Figure 5: LA-SiLU with different mean and variance values of the input. From left to right, the distribution of the activation input is: i) $\mu_{y}=0$, $\sigma_{y}=1$, ii) $\mu_{y}=0$, $\sigma_{y}=5$, iii) $\mu_{y}=-5$, $\sigma_{y}=1$, and iv) $\mu_{y}=5$, $\sigma_{y}=1$.
...and 7 more figures

Theorems & Definitions (5)

Definition 2.1: Saturation state of activation functions with activation scale functions
Definition 2.2: Activation fluctuation
Definition 2.3: Activation fluctuation of element-level activation functions
Definition 3.1: Activation scale function for LayerAct functions
Definition 3.2: Activation fluctuation of LayerAct functions
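
The activation fluctuation named in these definitions, $\| g(\hat{y}) - g(y) \|$, can be measured numerically. The sketch below makes several assumptions that the listing does not state: the norm is Euclidean, the noisy input is $\hat{y} = y + \epsilon$ with additive Gaussian noise, and `y` is the flat input of one layer for one sample. All function names, the noise level, and the default `alpha` are hypothetical. The printed values depend on the random sample and do not by themselves verify the paper's bound.

```python
import math
import random

def silu(y):
    # Element-level SiLU: each output depends only on its own input.
    return [v / (1.0 + math.exp(-v)) for v in y]

def la_silu(y, alpha=1e-5):
    # Layer-level SiLU: a_i = y_i * sigmoid(n_i), n_i layer-normalized.
    mu = sum(y) / len(y)
    var = sum((v - mu) ** 2 for v in y) / len(y)
    std = math.sqrt(var + alpha)
    return [v / (1.0 + math.exp(-(v - mu) / std)) for v in y]

def fluctuation(g, y, y_hat):
    # Euclidean norm of g(y_hat) - g(y); the choice of norm is an assumption.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g(y_hat), g(y))))

random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(64)]       # clean input
y_hat = [v + random.gauss(0.0, 0.1) for v in y]       # noisy input

print("SiLU   :", fluctuation(silu, y, y_hat))
print("LA-SiLU:", fluctuation(la_silu, y, y_hat))
```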
