A Method on Searching Better Activation Functions

Haoyuan Sun; Zihao Wu; Bo Xia; Pu Chang; Zibin Dong; Yifu Yuan; Yongzhe Chang; Xueqian Wang

TL;DR

This work addresses the lack of a theoretical foundation for activation-function design by linking information entropy to the Bayesian error rate and deriving the form of the worst activation function under boundary conditions. It then introduces the Entropy-based Activation Function Optimization (EAFO) framework and derives Correction Regularized ReLU (CRReLU) from ReLU, combining a theoretically motivated correction term with a learnable parameter. Empirically, CRReLU improves on common activation functions with Vision Transformer variants on CIFAR and ImageNet, and shows advantages in LLM fine-tuning with DPO, indicating broad practical potential. The study provides a principled route to activation-function design and highlights future avenues: dynamic activation optimization and application to non-invertible functions.

Abstract

The success of artificial neural networks (ANNs) hinges greatly on the judicious selection of an activation function, which introduces non-linearity into the network and enables it to model sophisticated relationships in data. However, the search for activation functions has largely relied on empirical knowledge and lacked theoretical guidance, which has hindered the identification of more effective activation functions. In this work, we offer a solution to this issue. First, we theoretically demonstrate the existence of the worst activation function with boundary conditions (WAFBC) from the perspective of information entropy. Furthermore, inspired by the Taylor expansion form of the information entropy functional, we propose the Entropy-based Activation Function Optimization (EAFO) methodology. EAFO presents a novel perspective for designing static activation functions in deep neural networks and offers the potential of dynamically optimizing activations during iterative training. Using EAFO, we derive a novel activation function from ReLU, known as Correction Regularized ReLU (CRReLU). Experiments conducted with the vision transformer and its variants on the CIFAR-10, CIFAR-100 and ImageNet-1K datasets demonstrate the superiority of CRReLU over existing corrections of ReLU. In extensive empirical studies on the task of large language model (LLM) fine-tuning, CRReLU also exhibits superior performance compared to GELU, suggesting its broader potential for practical applications.
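To make the idea of a "corrected" ReLU with a learnable parameter concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the correction term has the form $\epsilon x e^{-x^2/2}$, which is recalled from the full paper and is not stated in this summary. The function names and the default value of `eps` are hypothetical, and in a real network `eps` would be a trainable parameter rather than a constant.

```python
import math


def relu(x: float) -> float:
    """Plain ReLU, used as the baseline."""
    return max(0.0, x)


def crrelu(x: float, eps: float = 0.01) -> float:
    """ReLU plus an assumed correction term eps * x * exp(-x^2 / 2).

    The correction form and the default eps are assumptions for
    illustration; eps stands in for the learnable parameter.
    """
    correction = eps * x * math.exp(-x * x / 2.0)
    return relu(x) + correction


if __name__ == "__main__":
    # Compare the two functions at a few sample points.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  crrelu={crrelu(x, eps=0.1):.4f}")
```

Under this assumed form, the correction vanishes at the origin and decays rapidly for large $|x|$, so the function matches ReLU far from zero while letting a small, non-zero signal through for moderately negative inputs. Setting `eps` to zero recovers ReLU exactly.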

Paper Structure (23 sections, 2 theorems, 30 equations, 3 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Motivation
Methodology
Problem Setup
Bayesian Error Rate and Information Entropy
Activation Function and Information Entropy
Worst Activation Function with Boundary Condition (WAFBC)
Entropy-based Activation Function Optimization (EAFO)
Correction Regularized ReLU (CRReLU): From ReLU to Better
Experiments
Task of Image Classification
Task of Large Language Model (LLM) Fine-tuning
Discussion
Proof of the Proposition on the Euler-Lagrange Equation
...and 8 more sections

Key Result

Proposition 1

If $\mathbb{G}$ is independent of $x$, i.e. $\mathbb{G}=\mathbb{G}(y,y')$, then, based on the Euler-Lagrange equation, we have:
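For reference, when the integrand has no explicit dependence on $x$, the Euler-Lagrange equation admits a classical first integral, the Beltrami identity. The display below writes that standard result in the proposition's symbols and is presumably the statement the proposition leads to. The constant name $C$ is an assumption introduced here.

$$
\frac{\partial \mathbb{G}}{\partial y} - \frac{d}{dx}\left(\frac{\partial \mathbb{G}}{\partial y'}\right) = 0
\quad\Longrightarrow\quad
\mathbb{G} - y'\,\frac{\partial \mathbb{G}}{\partial y'} = C
$$

The step follows from differentiating the left-hand side of the identity with respect to $x$, where the $y''$ terms cancel:

$$
\frac{d}{dx}\left(\mathbb{G} - y'\,\frac{\partial \mathbb{G}}{\partial y'}\right)
= y'\left[\frac{\partial \mathbb{G}}{\partial y} - \frac{d}{dx}\left(\frac{\partial \mathbb{G}}{\partial y'}\right)\right] = 0
$$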

Figures (3)

Figure 1: Comparison between Sigmoid and standard normal CDF
Figure 2: Comparison between Tanh and Standard Normal CDF multiplied by $e$ (has been transformed to achieve symmetry about origin)
Figure 3: CRReLU with different $\epsilon$ value

Theorems & Definitions (4)

Proposition 1
Proposition 2
proof
proof
