Adaptive Activation Functions for Predictive Modeling with Sparse Experimental Data
Farhad Pourkamali-Anaraki, Tahamina Nasrin, Robert E. Jensen, Amy M. Peterson, Christopher J. Hansen
TL;DR
The paper tackles predictive modeling with sparse experimental data by studying adaptive activation functions in neural networks. It compares per-unit and shared trainable parameters for ELU, Softplus, and Swish in small, single-hidden-layer networks across three additive-manufacturing testbeds. Standard accuracy metrics are combined with conformal prediction to quantify predictive uncertainty, and the results show that fully adaptive activations with per-unit parameters (the configuration the paper labels M3) consistently outperform fixed and shared-parameter schemes. The findings indicate that letting each hidden unit learn its own nonlinearity can substantially improve both accuracy and confidence in data-scarce scientific problems, with implications for design optimization in engineering workflows. The authors also provide open-source code to support adoption of adaptive activation functions in low-data contexts.
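To make the uncertainty side of the summary concrete, here is a minimal sketch of split conformal prediction for a classifier. It is the common textbook recipe, not necessarily the exact procedure used in the paper. The function names, the miscoverage level `alpha`, and the toy calibration probabilities are all assumptions made for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Score threshold from a held-out calibration set (hypothetical helper).

    cal_probs:  (n, n_classes) predicted class probabilities
    cal_labels: (n,) integer true labels
    alpha:      target miscoverage level, e.g. 0.1 for 90% coverage
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the probability given to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample-corrected quantile.
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: keep every class.
        return 1.0
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, qhat):
    """Boolean mask of shape (m, n_classes); True marks classes in the set."""
    return test_probs >= 1.0 - qhat

# Toy calibration data (made up): two classes, true label always class 0.
p_true = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.9, 0.8, 0.7, 0.6, 0.5])
cal_probs = np.column_stack([p_true, 1.0 - p_true])
cal_labels = np.zeros(10, dtype=int)

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
test_probs = np.array([[0.9, 0.1], [0.5, 0.5]])
sets = prediction_sets(test_probs, qhat)
set_sizes = sets.sum(axis=1)  # smaller sets indicate a more confident model
```

Under this recipe, a model that assigns high probability to the correct class on calibration data gets a small threshold and therefore small prediction sets, which is one way "confidence" can be compared across activation schemes.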
Abstract
A pivotal aspect of neural network design is the choice of activation functions, which introduce the nonlinear structure needed to capture intricate input-output patterns. While the effectiveness of adaptive or trainable activation functions has been studied in domains with ample data, such as image classification, significant gaps persist in understanding their influence on classification accuracy and predictive uncertainty when data are limited. This research addresses these gaps by investigating two types of adaptive activation functions: one shares its trainable parameters across all units in a hidden layer, and the other gives each hidden unit its own trainable parameters. Both are examined in three testbeds derived from additive manufacturing problems containing fewer than one hundred training instances. Our investigation reveals that adaptive activation functions, such as Exponential Linear Unit (ELU) and Softplus, with individual trainable parameters, result in accurate and confident prediction models that outperform fixed-shape activation functions and the less flexible method of using identical trainable activation functions in a hidden layer. Therefore, this work presents an elegant way of facilitating the design of adaptive neural networks in scientific and engineering problems.
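The distinction between fixed, shared, and individual activation parameters can be illustrated with ELU in a single hidden layer. The following is a minimal NumPy sketch of the forward pass only, not the authors' released code. The layer sizes, the random data, and the example values of `alpha` are assumptions chosen for illustration.

```python
import numpy as np

def elu(z, alpha):
    """ELU: z where z > 0, alpha * (exp(z) - 1) elsewhere.

    alpha may be a scalar or an array that broadcasts against z.
    np.minimum keeps exp() from overflowing on the unused positive branch.
    """
    return np.where(z > 0, z, alpha * (np.exp(np.minimum(z, 0.0)) - 1.0))

def elu_grad_alpha(z):
    """d ELU / d alpha, the quantity a gradient step on alpha would use."""
    return np.where(z > 0, 0.0, np.exp(np.minimum(z, 0.0)) - 1.0)

rng = np.random.default_rng(0)
n_samples, n_features, n_hidden = 5, 3, 4      # illustrative sizes
X = rng.normal(size=(n_samples, n_features))
W = rng.normal(size=(n_features, n_hidden))
b = np.zeros(n_hidden)
Z = X @ W + b                                   # pre-activations, shape (5, 4)

# Fixed shape: alpha is a constant and is never trained.
alpha_fixed = 1.0
# Shared: one trainable scalar used by every unit in the hidden layer.
alpha_shared = np.array([0.5])
# Individual: one trainable value per hidden unit.
alpha_per_unit = np.array([0.1, 0.5, 1.0, 2.0])

H_fixed = elu(Z, alpha_fixed)
H_shared = elu(Z, alpha_shared)
H_per_unit = elu(Z, alpha_per_unit)
```

In the shared case the layer gains one extra trainable parameter, while the individual case gains one per hidden unit (four here), so each unit's negative branch can saturate at a different level. For positive pre-activations all three variants return the same values.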
