Activations Through Extensions: A Framework To Boost Performance Of Neural Networks

Chandramouli Kamanchi; Sumanta Mukherjee; Kameshwaran Sampath; Pankaj Dayama; Arindam Jati; Vijay Ekambaram; Dzung Phan

TL;DR

The paper addresses the rigidity of fixed activation functions by introducing an activation-extension framework that enlarges the hypothesis space of neural networks. It formalizes extensions as functions $F$ that agree with a base activation on a subset of their (larger) domain, proving that extensions can yield a better data fit while incurring minimal overhead; it then introduces learnable linear (LLA) and quadratic (QLA) activations as concrete instantiations. The authors provide theoretical properties, discuss time and space costs, and validate the approach with extensive experiments on synthetic benchmarks and real-world time-series datasets (ETT), showing that QLA often outperforms vanilla activations and LLA, with learned activations adapting to task structure. The framework offers a principled method to boost performance across domains, guiding activation design and highlighting practical considerations like initialization and library size for learnable activations.
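The following is a minimal sketch of what learnable linear and quadratic activations could look like. It assumes, without confirmation from this summary, that LLA is a learnable weighted sum over a small library of base activations and that QLA additionally adds learnable pairwise products of library members. The library contents and the names `LIBRARY`, `lla`, `qla`, `alpha`, and `beta` are hypothetical.

```python
import numpy as np

# Hypothetical activation library (assumption); the paper's library may differ.
LIBRARY = [
    lambda x: np.maximum(x, 0.0),        # ReLU
    np.tanh,                             # Tanh
    lambda x: 1.0 / (1.0 + np.exp(-x)),  # Sigmoid
    lambda x: x,                         # identity
]

def library_features(x):
    """Evaluate every library activation at x; result has shape (k, *x.shape)."""
    return np.stack([g(x) for g in LIBRARY])

def lla(x, alpha):
    """Assumed LLA form: sum_i alpha_i * g_i(x), with alpha of shape (k,)."""
    return np.tensordot(alpha, library_features(x), axes=1)

def qla(x, alpha, beta):
    """Assumed QLA form: the LLA terms plus sum_{i<=j} beta_ij * g_i(x) * g_j(x).
    Only the upper triangle of beta (diagonal included) is used."""
    g = library_features(x)
    out = np.tensordot(alpha, g, axes=1)
    k = len(LIBRARY)
    for i in range(k):
        for j in range(i, k):
            out = out + beta[i, j] * g[i] * g[j]
    return out

# Initializing alpha as one-hot on ReLU and beta as zeros recovers plain ReLU,
# so the learnable activation starts out agreeing with the base activation.
x = np.linspace(-3.0, 3.0, 7)
alpha0 = np.array([1.0, 0.0, 0.0, 0.0])
beta0 = np.zeros((4, 4))
assert np.allclose(lla(x, alpha0), np.maximum(x, 0.0))
assert np.allclose(qla(x, alpha0, beta0), np.maximum(x, 0.0))
```

The one-hot initialization is one plausible reading of why initialization matters for learnable activations: it places the learnable model exactly on the vanilla ReLU network before training starts.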

Abstract

Activation functions are non-linearities in neural networks that allow them to learn complex mappings between inputs and outputs. Typical choices of activation function are ReLU, Tanh, and Sigmoid, and the choice generally depends on the application domain. In this work, we propose a framework that unifies several works on activation functions and theoretically explains their performance benefits. We also propose novel techniques, derived from the framework, that obtain "extensions" (i.e., special generalizations of a given neural network) of neural networks through operations on activation functions. We show, theoretically and empirically, that "extensions" of neural networks have performance benefits over vanilla neural networks on standard test functions, at insignificant space and time complexity costs. We also show the benefits of neural network "extensions" in the time-series domain on real-world datasets.
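To make the idea of an "extension" concrete, here is a toy sketch under stated assumptions: a one-hidden-layer network whose hidden activation is replaced by a hypothetical learnable mix $\alpha_1 \mathrm{ReLU}(z) + \alpha_2 \tanh(z)$ shared across the layer. On the parameter subset $\alpha = (1, 0)$ the extended network coincides with the vanilla ReLU network, while the extra cost is only the two entries of $\alpha$. The functions `vanilla_mlp` and `extended_mlp` and the layer sizes are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def vanilla_mlp(x, W1, b1, W2, b2):
    """Vanilla one-hidden-layer network with a fixed ReLU activation."""
    return relu(x @ W1 + b1) @ W2 + b2

def extended_mlp(x, W1, b1, W2, b2, alpha):
    """Hypothetical extension: hidden activation alpha[0]*ReLU + alpha[1]*tanh.
    The only extra parameters are the two entries of alpha."""
    z = x @ W1 + b1
    return (alpha[0] * relu(z) + alpha[1] * np.tanh(z)) @ W2 + b2

# Random weights: 3 inputs, 16 hidden units, 1 output.
W1, b1 = rng.normal(size=(3, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 1)), rng.normal(size=1)
x = rng.normal(size=(5, 3))

# On the subset alpha = (1, 0) the extension reproduces the vanilla network.
same = extended_mlp(x, W1, b1, W2, b2, np.array([1.0, 0.0]))
print(np.allclose(same, vanilla_mlp(x, W1, b1, W2, b2)))  # True

# Parameter overhead in this toy setting.
base_params = W1.size + b1.size + W2.size + b2.size  # 48 + 16 + 16 + 1 = 81
extra_params = 2
print(base_params, extra_params)  # 81 2
```

Because every vanilla network is reachable inside the extended family, training the extended family can only enlarge the set of candidate fits; in this toy case the price is 2 extra parameters on top of 81.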

Paper Structure (11 sections, 6 theorems, 8 equations, 4 figures, 3 tables)

Introduction
Background
Feedforward Neural Networks
Some Elements of Statistical Learning Theory
Analysis
Properties
Learnable Activations
Time and Space Complexity
Experiments
Experiments on real-world time series datasets
Conclusion and Future Work

Key Result

Lemma 1

Assume that $F: \mathbb{D} \rightarrow \mathbb{R}$ is an extension of a function $f:\mathcal{D} \rightarrow \mathbb{R}$ then
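
The defining property of an extension, together with one elementary consequence of it, can be written as follows, assuming as usual that $\mathcal{D} \subseteq \mathbb{D}$. This is a hedged illustration and not necessarily the conclusion stated in Lemma 1:

$$
F(x) = f(x) \;\; \text{for all } x \in \mathcal{D}
\quad\Longrightarrow\quad
\inf_{x \in \mathbb{D}} F(x) \;\le\; \inf_{x \in \mathcal{D}} F(x) \;=\; \inf_{x \in \mathcal{D}} f(x).
$$

If $F$ and $f$ are read as training losses over an enlarged and an original parameter set (an interpretation assumed here), the inequality says that the best fit attainable by the extended model is never worse than that of the original model.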

Figures (4)

Figure 1: Visual comparison between ReLU and QLA for the Shubert function on the test dataset.
Figure 2: Depiction of level sets of MSE on the pentagon for the Shubert test function.
Figure 3: Actual vs. forecast for a typical test sample of the HULL feature of the ETTh1 dataset.
Figure 4: Plot of activations learned for the Shubert function in hidden layers $1$ and $2$ for the network under consideration.
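
Several figures use the Shubert test function. The sketch below generates a synthetic regression dataset from its common two-dimensional form, $f(x_1, x_2) = \prod_{d=1}^{2} \sum_{i=1}^{5} i \cos((i+1)x_d + i)$. The sampling domain $[-10, 10]^2$, the sample size, and the 80/20 split are assumptions, since the paper's exact setup is not given here.

```python
import numpy as np

def shubert(x):
    """Two-dimensional Shubert function for inputs x of shape (n, 2).
    f(x1, x2) = prod_d sum_{i=1}^{5} i * cos((i + 1) * x_d + i)."""
    i = np.arange(1, 6)                                 # i = 1..5, shape (5,)
    terms = i * np.cos((i + 1) * x[..., None] + i)      # shape (n, 2, 5)
    return terms.sum(axis=-1).prod(axis=-1)             # shape (n,)

rng = np.random.default_rng(0)
X = rng.uniform(-10.0, 10.0, size=(1000, 2))  # assumed domain [-10, 10]^2
y = shubert(X)

# Hypothetical 80/20 train/test split.
n_train = int(0.8 * len(X))
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]
```

The many local minima of this function make it a demanding target for a small network, which is one reason it is a common benchmark for comparing activations.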

Theorems & Definitions (17)

Definition 1
Example 1
Example 2
Example 3
Example 4
Lemma 1
proof
Lemma 2
proof
Lemma 3
...and 7 more
