Why Rectified Power Unit Networks Fail and How to Improve It: An Effective Field Theory Perspective

Taeyoung Kim; Myungjoo Kang

TL;DR

The paper uses effective field theory to analyze why deep Rectified Power Unit (RePU) networks fail during training, showing that RePU cannot satisfy the criticality condition of the representation group flow. It introduces the Modified Rectified Power Unit (MRePU), derives its susceptibilities, and proves that MRePU belongs to a distinct universality class with stable forward propagation while retaining differentiability and universal approximation. The authors provide rigorous approximation results for polynomials and differentiable functions, and validate MRePU on synthetic tasks, physics-informed neural networks (PINNs), and real-world vision tasks (MNIST, CIFAR-10), including integration with ResNet. The work also offers kernel-based guidelines and phase-diagram evidence for initializing MRePU networks, positioning MRePU as a practical alternative activation for deep networks.
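
To make the criticality claim concrete, the following is a worked sketch under stated assumptions, not the paper's own derivation. It assumes the standard infinite-width setup with $\sigma(z)=\max(0,z)^{p}$, preactivations $z\sim\mathcal{N}(0,K)$, weight and bias variance parameters $C_{W}$ and $C_{b}$, and the usual definitions of the parallel and perpendicular susceptibilities; the paper's notation may differ. It uses only the Gaussian moment $\mathbb{E}[z^{2m}]=(2m-1)!!\,K^{m}$, halved because the activation vanishes for $z<0$.

```latex
% Kernel recursion for RePU (infinite-width limit, assumed conventions)
K^{(\ell+1)} = C_b + C_W\,\mathbb{E}\!\left[\sigma(z)^2\right]
             = C_b + C_W\,\frac{(2p-1)!!}{2}\,\bigl(K^{(\ell)}\bigr)^{p}

% Parallel susceptibility: derivative of the recursion with respect to K
\chi_{\parallel}(K) = \frac{\partial K^{(\ell+1)}}{\partial K^{(\ell)}}
                    = C_W\,\frac{p\,(2p-1)!!}{2}\,K^{p-1}

% Perpendicular susceptibility: C_W * E[sigma'(z)^2]
\chi_{\perp}(K) = C_W\,\mathbb{E}\!\left[\sigma'(z)^2\right]
                = C_W\,\frac{p^{2}\,(2p-3)!!}{2}\,K^{p-1}

% Their ratio does not depend on K, C_W, or C_b
\frac{\chi_{\parallel}(K)}{\chi_{\perp}(K)} = \frac{2p-1}{p} > 1
\quad\text{for } p > 1
```

Under these assumptions the two susceptibilities cannot both equal $1$ for any choice of $K$, $C_{W}$, and $C_{b}$ when $p>1$, whereas for $p=1$ (ReLU) the ratio is exactly $1$. For $p=2$ the ratio is $3/2$.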

Abstract

The Rectified Power Unit (RePU) activation function, a differentiable generalization of the Rectified Linear Unit (ReLU), has shown promise in constructing neural networks due to its smoothness properties. However, deep RePU networks often suffer from critical issues such as vanishing or exploding values during training, rendering them unstable regardless of hyperparameter initialization. Leveraging the perspective of effective field theory, we identify the root causes of these failures and propose the Modified Rectified Power Unit (MRePU) activation function. MRePU addresses RePU's limitations while preserving its advantages, such as differentiability and universal approximation properties. Theoretical analysis demonstrates that MRePU satisfies criticality conditions necessary for stable training, placing it in a distinct universality class. Extensive experiments validate the effectiveness of MRePU, showing significant improvements in training stability and performance across various tasks, including polynomial regression, physics-informed neural networks (PINNs) and real-world vision tasks. Our findings highlight the potential of MRePU as a robust alternative for building deep neural networks.
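
The vanishing and exploding behavior described above can be illustrated at initialization with a minimal NumPy sketch. This is not the authors' code: the function names, the width of 2048, the depth of 8, and the initialization convention (weight variance $C_{W}/n$, bias variance $C_{b}$, kernel measured as the mean squared preactivation per layer) are assumptions made for illustration. The infinite-width recursion $K \mapsto C_{b} + C_{W}\,\tfrac{(2p-1)!!}{2}\,K^{p}$ used for comparison follows from Gaussian moments; for $p=2$, $C_{W}=1$, $C_{b}=0$ it has an unstable fixed point at $K^{*}=2/3$.

```python
import numpy as np


def repu(z, p=2):
    """RePU activation: max(0, z) ** p."""
    return np.maximum(z, 0.0) ** p


def infinite_width_kernels(k1, depth, c_w=1.0, c_b=0.0, p=2):
    """Iterate K -> c_b + c_w * (2p-1)!!/2 * K**p from the first-layer kernel k1."""
    double_factorial = float(np.prod(np.arange(2 * p - 1, 0, -2)))  # (2p-1)!!
    kernels = [float(k1)]
    for _ in range(depth - 1):
        kernels.append(c_b + c_w * double_factorial / 2.0 * kernels[-1] ** p)
    return np.array(kernels)


def empirical_kernels(x, depth, width, c_w=1.0, c_b=0.0, p=2, seed=0):
    """Per-layer kernels (mean squared preactivation) of one random RePU MLP."""
    rng = np.random.default_rng(seed)
    # First layer acts directly on the input; weight variance is c_w / fan_in.
    w = rng.normal(0.0, np.sqrt(c_w / x.size), size=(width, x.size))
    b = rng.normal(0.0, np.sqrt(c_b), size=width)
    z = w @ x + b
    kernels = [np.mean(z ** 2)]
    for _ in range(depth - 1):
        w = rng.normal(0.0, np.sqrt(c_w / width), size=(width, width))
        b = rng.normal(0.0, np.sqrt(c_b), size=width)
        z = w @ repu(z, p) + b
        kernels.append(np.mean(z ** 2))
    return np.array(kernels)


if __name__ == "__main__":
    # Inputs chosen (as an assumption) to start below and above K* = 2/3.
    for x in (np.array([1.0, 0.0]), np.array([2.0, 0.0])):
        emp = empirical_kernels(x, depth=8, width=2048)
        ref = infinite_width_kernels(emp[0], depth=8)
        print("log10 empirical kernel:     ", np.log10(emp).round(2))
        print("log10 infinite-width kernel:", np.log10(ref).round(2))
```

In this sketch the kernel starting below the fixed point collapses toward zero and the one starting above it blows up, both doubly exponentially in depth, so no choice of input scale keeps the signal stable across many layers. Depth is kept small here because the exploding case overflows double precision after only a few more layers.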

Paper Structure (20 sections, 49 equations, 19 figures, 6 tables)

Introduction
Activation Functions
Rectified Power Unit
Effective Field Theory of Neural Networks
Our Contribution
Preliminary
Overview of Neural Networks and Activation Functions
Effective Field Theory for Neural Networks
Failure of RePU Activation
Susceptibility Calculation
Experimental Validation
Modified Rectified Power Unit (MRePU)
Susceptibility Calculation
Numerics.
Experimental Results
...and 5 more sections

Figures (19)

Figure 1: Empirical Kernels at Initialization Across Layers for RePU Activation with $p=2$. Left: Data is fixed and randomness is in the weight parameters. Right: Weight parameters are fixed and data is random. Each line represents a sample.
Figure 2: The evolution of the mean of empirical kernels over an ensemble of 100 models for $x_{0} = (1,0)$ as training progresses for the RePU activation with $p=2$. The shaded areas represent the region between $\log_{10}(\text{mean}\pm 0.1 \times \text{standard deviation})$.
Figure 3: Mean outputs over an ensemble of 100 models versus target values on random test data. The shaded areas represent the region of 1 standard deviation. Top left: number of hidden layers $N_{h} = 1$. Top right: $N_{h} = 3$. Bottom: $N_{h} = 5$.
Figure 4: Parallel and perpendicular susceptibilities and their ratio for MRePU with $p=2$ (left) and $p=3$ (right), with $(C_{W},C_{b})=(1,0)$. The ratio tends to $1$ as $K\to0$.
Figure 5: Empirical Kernels at Initialization Across Layers for MRePU Activation with $p=2$. Left: Data is fixed and randomness is in the weight parameters. Right: Weight parameters are fixed and data is random. Each line represents a sample.
...and 14 more figures
