APALU: A Trainable, Adaptive Activation Function for Deep Learning Networks

Barathi Subramanian; Rathinaraja Jeyaraj; Rakhmonov Akhrorjon Akhmadjon Ugli

TL;DR

APALU introduces a trainable, adaptive activation function with a two-branch piecewise form governed by $a$ and $b$ that enables dynamic data fitting while preserving differentiability and monotonicity. The paper establishes strong theoretical properties—representation power, convergence behavior, vanishing gradient robustness, and universal approximation capabilities—and demonstrates empirical gains across image classification (MNIST/CIFAR-10), anomaly detection (MVTec AD), sign language recognition, and regression/stock forecasting tasks. Extensive experiments show APALU outperforms ReLU, LReLU, ELU, and GELU across diverse architectures and datasets, with task-specific tuning of $a$ and $b$. The work highlights APALU as a practical, robust activation option for broad deep learning applications, while acknowledging hyperparameter sensitivity and potential computational trade-offs as areas for further research.

Abstract

The activation function is a pivotal component of deep learning, facilitating the extraction of intricate data patterns. Classical activation functions such as ReLU and its variants are widely used, but their static nature and simplicity, although advantageous, often limit their effectiveness in specialized tasks. Trainable activation functions, in turn, sometimes struggle to adapt to the unique characteristics of the data. To address these limitations, we introduce a novel trainable activation function, the adaptive piecewise approximated activation linear unit (APALU), to enhance the learning performance of deep learning across a broad range of tasks. It presents a unique set of features that enable it to maintain stability and efficiency in the learning process while adapting to complex data representations. Experiments reveal significant improvements over widely used activation functions on different tasks. In image classification, APALU increases MobileNet and GoogleNet accuracy by 0.37% and 0.04%, respectively, on the CIFAR10 dataset. In anomaly detection, it improves the average area under the curve of One-Class Deep SVDD by 0.8% on the MNIST dataset, and of DifferNet and knowledge distillation by 1.81% and 1.11%, respectively, on the MVTec AD dataset. Notably, APALU achieves 100% accuracy on a sign language recognition task with a limited dataset. For regression tasks, APALU enhances the performance of deep neural networks and recurrent neural networks on different datasets. These improvements highlight the robustness and adaptability of APALU across diverse deep-learning applications.
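
To make the idea of a trainable two-branch activation concrete, here is a minimal Python sketch. This section does not state APALU's actual branch formulas, so the sketch substitutes a stand-in: a scaled linear branch for non-negative inputs and a scaled exponential branch for negative inputs. The names two_branch, param_grads, and fit are hypothetical, and the code only illustrates how $a$ and $b$ could be learned by gradient descent; it is not the authors' implementation.

```python
import math


def two_branch(x, a, b):
    """Stand-in two-branch activation (NOT the paper's exact formula).

    x >= 0: a * x               (scaled linear branch, assumed)
    x <  0: b * (exp(x) - 1)    (scaled, saturating exponential branch, assumed)
    """
    if x >= 0:
        return a * x
    return b * (math.exp(x) - 1.0)


def param_grads(x):
    """Partial derivatives (d/da, d/db) of two_branch at input x."""
    if x >= 0:
        return x, 0.0
    return 0.0, math.exp(x) - 1.0


def fit(xs, ys, a=1.0, b=1.0, lr=0.1, steps=500):
    """Fit a and b by plain gradient descent on mean squared error."""
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = two_branch(x, a, b) - y
            da, db = param_grads(x)
            grad_a += 2.0 * err * da / n
            grad_b += 2.0 * err * db / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Toy targets generated with the a and b values quoted in the Figure 1 caption.
xs = [i / 4 for i in range(-12, 13)]
ys = [two_branch(x, 0.55, 0.065) for x in xs]
a_fit, b_fit = fit(xs, ys)
print(round(a_fit, 3), round(b_fit, 3))
```

With $a > 0$ and $b > 0$ the stand-in is monotonically non-decreasing, and on this toy data the fitted parameters should approach the values used to generate the targets. In a real network, $a$ and $b$ would be updated by the optimizer alongside the weights rather than fitted in isolation.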

Paper Structure (18 sections, 4 equations, 4 figures, 7 tables)

Introduction
Related Works
Adaptive Piecewise Approximated Activation Linear Unit (APALU)
Representation capability of APALU
Convergence rate in learning
Vanishing gradient robustness
Approximation capability
Experiments
Image Classification
MNIST:
CIFAR10:
Parameter ($a$ and $b$) for MNIST and CIFAR10:
Anomaly Detection
Observations on DifferNet and KD network models:
Observations on PaDiM and PatchCore model:
...and 3 more sections

Figures (4)

Figure 1: APALU ($a$=0.55 and $b$=0.065), GELU, ReLU, ELU ($\alpha$ = 1), and LReLU (0.1)
Figure 2: Anomaly localization for PaDiM_baseline (right) and PaDiM_APALU (left) models on the MVTec AD dataset.
Figure 3: Training loss and accuracy with different activation functions on MOPGRU model.
Figure 4: Training loss and test accuracy for MOPGRU model with different activation functions for sign language recognition task.
