LayerAct: Advanced Activation Mechanism for Robust Inference of CNNs

Kihyuk Yoon, Chiehyeon Lim

TL;DR

LayerAct shifts non-linearity from element-level activations to a layer-level activation scale computed from layer-normalized inputs, addressing the saturation-mean trade-off and noise-robustness limitations of traditional activations. It introduces LA-SiLU and LA-HardSiLU, where $a_i = y_i s(n_i)$ and $n_i = (y_i - \mu_y)/\sqrt{\sigma_y^2 + \alpha}$, enabling a zero-like activation mean while reducing sensitivity to input shifts. Theoretical analysis shows LayerAct lowers the upper bound on activation fluctuation $\| g(\hat{y}) - g(y) \|$ relative to element-level activations, and empirical results on MNIST, CIFAR-10/100, and ImageNet demonstrate improved robustness to noise with competitive performance on clean data. This layer-level activation framework integrates with common normalization schemes and points to future work on bounded LayerAct variants and broader architectural deployment.
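
As an illustration of the formula above, here is a minimal sketch in plain Python; it is not the authors' implementation. It assumes the scale function $s$ is the logistic sigmoid (the SiLU case), treats a flat list of floats as one sample's layer input, and uses a hypothetical default `alpha=1e-5`. The function names are illustrative.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(y):
    # Element-level SiLU: each output depends only on its own input y_i.
    return [v * sigmoid(v) for v in y]


def la_silu(y, alpha=1e-5):
    # Layer-level SiLU sketch: a_i = y_i * s(n_i), where
    # n_i = (y_i - mu_y) / sqrt(sigma_y^2 + alpha) is the layer-normalized input.
    # alpha=1e-5 is an assumed default, not a value taken from the paper.
    mu = sum(y) / len(y)
    var = sum((v - mu) ** 2 for v in y) / len(y)
    denom = math.sqrt(var + alpha)
    # The scale comes from the normalized input; it multiplies the raw input.
    return [v * sigmoid((v - mu) / denom) for v in y]
```

Because $n_i$ is computed from centered inputs, adding the same constant to every $y_i$ leaves the scale $s(n_i)$ unchanged in this sketch, whereas the element-level `silu` scale changes with each shifted input.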

Abstract

In this work, we propose a novel activation mechanism called LayerAct for CNNs. This approach is motivated by our theoretical and experimental analyses, which demonstrate that Layer Normalization (LN) can mitigate a limitation of existing activation functions regarding noise robustness. However, LN is known to be disadvantageous in CNNs due to its tendency to make activation outputs homogeneous. The proposed method is designed to be more robust than existing activation functions by reducing the upper bound of influence caused by input shifts without inheriting LN's limitation. We provide analyses and experiments showing that LayerAct functions exhibit superior robustness compared to ElementAct functions. Experimental results on three clean and noisy benchmark datasets for image classification tasks indicate that LayerAct functions outperform other activation functions in handling noisy datasets while achieving superior performance on clean datasets in most cases.

Paper Structure (27 sections, 19 equations, 12 figures, 12 tables)

This paper contains 27 sections, 19 equations, 12 figures, 12 tables.

Figures (12)

  • Figure 1: The mechanisms of the element-level activation (left) and proposed layer-level activation (right).
  • Figure 2: Distribution of the activation output means of the elements in a trained network on MNIST at $1$ and $40$ epochs. The distributions did not change after $40$ epochs. The LayerAct functions maintain zero-like mean activation for all epochs.
  • Figure 3: Distribution of activation output fluctuation due to noise, for different noise distributions. The activation fluctuations of the LayerAct functions have lower mean and variance than those of the element-level activation functions in both cases.
  • Figure 4: Clean and noisy car images from the CIFAR-10 dataset. From left to right: a clean image, an image with Gaussian-distributed noise, an image with Poisson-distributed noise, and a Gaussian-blurred image.
  • Figure 5: LA-SiLU with different mean and variance values of the input. From left to right, the distribution of the activation input is: i) $\mu_{y}=0$, $\sigma_{y}=1$, ii) $\mu_{y}=0$, $\sigma_{y}=5$, iii) $\mu_{y}=-5$, $\sigma_{y}=1$, and iv) $\mu_{y}=5$, $\sigma_{y}=1$.
  • ...and 7 more figures

Theorems & Definitions (5)

  • Definition 2.1: Saturation state of activation functions with activation scale functions
  • Definition 2.2: Activation fluctuation
  • Definition 2.3: Activation fluctuation of element-level activation functions
  • Definition 3.1: Activation scale function for LayerAct functions
  • Definition 3.2: Activation fluctuation of LayerAct functions
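
The activation fluctuation $\| g(\hat{y}) - g(y) \|$ named in these definitions can be measured empirically with a sketch like the following. It is only an illustration under stated assumptions: the norm is taken to be Euclidean, $\hat{y}$ is $y$ plus additive Gaussian noise, the scale function is the sigmoid, and `alpha=1e-5`, the sample size, and the noise level are hypothetical choices. It does not reproduce the paper's experiments or prove the bound.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(y):
    # Element-level activation: scale computed from each y_i alone.
    return [v * sigmoid(v) for v in y]


def la_silu(y, alpha=1e-5):
    # Layer-level activation: scale computed from the layer-normalized input.
    mu = sum(y) / len(y)
    var = sum((v - mu) ** 2 for v in y) / len(y)
    denom = math.sqrt(var + alpha)
    return [v * sigmoid((v - mu) / denom) for v in y]


def fluctuation(g, y, y_hat):
    # Euclidean norm of g(y_hat) - g(y); the choice of norm is an assumption.
    a, a_hat = g(y), g(y_hat)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a_hat, a)))


random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(64)]   # clean layer input
y_hat = [v + random.gauss(0.0, 0.1) for v in y]   # noisy layer input

print("SiLU   :", fluctuation(silu, y, y_hat))
print("LA-SiLU:", fluctuation(la_silu, y, y_hat))
```

A single random draw like this says little on its own; comparing the distribution of the two quantities over many inputs and noise samples is closer in spirit to what Figure 3 reports.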