Activations Through Extensions: A Framework To Boost Performance Of Neural Networks
Chandramouli Kamanchi, Sumanta Mukherjee, Kameshwaran Sampath, Pankaj Dayama, Arindam Jati, Vijay Ekambaram, Dzung Phan
TL;DR
The paper addresses the rigidity of fixed activation functions by introducing an activation-extension framework that enlarges the hypothesis space of neural networks. It formalizes extensions as functions $F$ that agree with a base activation on a subset, proving that extensions can yield better data fit while incurring minimal overhead; it then introduces learnable linear (LLA) and quadratic (QLA) activations as concrete instantiations. The authors provide theoretical properties, discuss time/space costs, and validate the approach with extensive experiments on synthetic benchmarks and real-world time-series datasets (ETT), showing that QLA often outperforms vanilla activations and LLA, with learned activations adapting to task structure. The framework offers a principled method to boost performance across domains, guiding activation design and highlighting practical considerations like initialization and library size for learnable activations.
Abstract
Activation functions are the non-linearities in neural networks that allow them to learn complex mappings between inputs and outputs. Typical choices include ReLU, Tanh, and Sigmoid, and the choice generally depends on the application domain. In this work, we propose a framework that unifies several works on activation functions and theoretically explains their performance benefits. We also propose novel techniques that originate from the framework and allow us to obtain "extensions" of neural networks (i.e., special generalizations of a given neural network) through operations on activation functions. We show theoretically and empirically, on standard test functions, that extensions of neural networks have performance benefits over vanilla neural networks at insignificant additional space and time cost. We also show the benefits of neural network extensions in the time-series domain on real-world datasets.
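To make the idea of an extension concrete, here is a minimal sketch of the learnable linear (LLA) and quadratic (QLA) activations named in the TL;DR. The parametrization is an assumption, since this page does not spell it out: LLA is taken to be a learnable weighted sum over a small library of base activations, and QLA adds learnable pairwise products. The library contents and the names `LIBRARY`, `lla`, `qla`, `a_init`, and `b_init` are hypothetical.

```python
import math

# Hypothetical activation library; the paper's actual library may differ.
def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

LIBRARY = [relu, math.tanh, sigmoid]

def lla(x, a):
    """Learnable linear activation (assumed form): sum_i a[i] * sigma_i(x)."""
    return sum(a_i * f(x) for a_i, f in zip(a, LIBRARY))

def qla(x, a, b):
    """Learnable quadratic activation (assumed form): the LLA term plus
    sum_{i<=j} b[i][j] * sigma_i(x) * sigma_j(x)."""
    vals = [f(x) for f in LIBRARY]
    linear = sum(a_i * v for a_i, v in zip(a, vals))
    quad = sum(
        b[i][j] * vals[i] * vals[j]
        for i in range(len(vals))
        for j in range(i, len(vals))
    )
    return linear + quad

# With one-hot linear weights and zero quadratic weights, both forms
# coincide with the base activation (ReLU here), so the vanilla network
# is one point in the enlarged hypothesis space.
a_init = [1.0, 0.0, 0.0]
b_init = [[0.0] * 3 for _ in range(3)]
```

Under these assumptions, starting training from `a_init` and `b_init` starts from the vanilla network, which is one way to read the TL;DR's remark that initialization matters for learnable activations.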
