Adaptive Activation Functions for Predictive Modeling with Sparse Experimental Data

Farhad Pourkamali-Anaraki, Tahamina Nasrin, Robert E. Jensen, Amy M. Peterson, Christopher J. Hansen

TL;DR

The paper tackles predictive modeling with sparse experimental data by studying adaptive activation functions in neural networks, focusing on per-unit versus shared parameters for ELU, Softplus, and Swish within small, single-hidden-layer networks across three additive-manufacturing testbeds. It combines standard accuracy metrics with conformal prediction to quantify predictive uncertainty, demonstrating that fully adaptive activations with per-unit parameters (M3) consistently outperform fixed and shared-parameter schemes. The findings highlight that allowing each hidden unit to learn its own nonlinearity can substantially improve both accuracy and confidence in data-scarce scientific problems, with implications for design optimization in engineering workflows. The authors also provide open-source code to facilitate broader adoption of adaptive activation functions in low-data contexts.

Abstract

A pivotal aspect in the design of neural networks lies in selecting activation functions, crucial for introducing nonlinear structures that capture intricate input-output patterns. While the effectiveness of adaptive or trainable activation functions has been studied in domains with ample data, like image classification problems, significant gaps persist in understanding their influence on classification accuracy and predictive uncertainty in settings characterized by limited data availability. This research aims to address these gaps by investigating the use of two types of adaptive activation functions. These functions incorporate shared and individual trainable parameters per hidden layer and are examined in three testbeds derived from additive manufacturing problems containing fewer than one hundred training instances. Our investigation reveals that adaptive activation functions, such as Exponential Linear Unit (ELU) and Softplus, with individual trainable parameters, result in accurate and confident prediction models that outperform fixed-shape activation functions and the less flexible method of using identical trainable activation functions in a hidden layer. Therefore, this work presents an elegant way of facilitating the design of adaptive neural networks in scientific and engineering problems.
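As a rough illustration of the three schemes compared here, the NumPy sketch below evaluates one hidden layer with a fixed parameter (M1), a single shared parameter (M2), and one parameter per hidden unit (M3). The formulas for ELU, Softplus, and Swish are the commonly used $\alpha$-parameterized forms and are an assumption, since this page does not state the paper's exact definitions. The helper `hidden_layer`, the array shapes, and the $\alpha$ values are hypothetical, and no training is performed; in a real model, $\alpha$ would be learned together with the weights.

```python
import numpy as np

# Assumed, commonly used alpha-parameterized forms (not confirmed by this page).
def elu(x, alpha=1.0):
    # x for x > 0, alpha * (exp(x) - 1) otherwise
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def softplus(x, alpha=1.0):
    # (1 / alpha) * log(1 + exp(alpha * x)), evaluated stably via logaddexp
    return np.logaddexp(0.0, alpha * x) / alpha

def swish(x, alpha=1.0):
    # x * sigmoid(alpha * x)
    return x / (1.0 + np.exp(-alpha * x))

def hidden_layer(X, W, b, act, alpha):
    """Hypothetical helper: X is (n, d), W is (d, N_h), b is (N_h,).

    alpha is a scalar for M1/M2 or an array of shape (N_h,) for M3;
    NumPy broadcasting applies alpha[j] to hidden unit j.
    """
    Z = X @ W + b
    return act(Z, np.asarray(alpha, dtype=float))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 toy samples, 3 input features
W = rng.normal(size=(3, 2))   # N_h = 2 hidden units
b = np.zeros(2)

H_m1 = hidden_layer(X, W, b, elu, 1.0)          # M1: fixed alpha = 1
H_m2 = hidden_layer(X, W, b, elu, 0.5)          # M2: one alpha shared by the layer
H_m3 = hidden_layer(X, W, b, elu, [0.5, 2.0])   # M3: one alpha per hidden unit
```

M2 is the special case of M3 in which every unit receives the same value, so M3 adds only $N_h - 1$ extra parameters over M2.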

Paper Structure (9 sections, 7 equations, 7 figures)

Figures (7)

  • Figure 1: Visualizing the impact of the "trainable" parameter $\alpha$ on modifying the structure of three widely-used activation functions: Exponential Linear Unit (ELU), Softplus, and Swish. The default value for $\alpha$ in fixed activation functions is commonly set to $1$.
  • Figure 2: Demonstrating the spectrum of flexibility within the neural network models under examination. M1 denotes the conventional fixed activation functions, while M2 permits a single trainable parameter shared across the hidden layer. In contrast, M3 offers the most flexibility by assigning an individual trainable parameter to each unit in the hidden layer. To implement M3, we employ the Keras Functional API, connecting each hidden unit to the input layer and then concatenating their outputs to form a unified hidden layer. While we fix the number of hidden units at $N_h=2$, the number of units in the input and output layers is determined by the characteristics of the labeled data specific to each additive manufacturing problem that we consider in this paper.
  • Figure 3: Employing the filament selection problem as a benchmark, we assess the performance of M1, M2, and M3 using three evaluation metrics. Classification accuracy denotes the fraction of correct predictions on the test data set, while empirical coverage and uncertainty score are derived from prediction sets within the conformal prediction framework using $\delta=0.1$. We note that M3 demonstrates superior performance compared to M1 and M2. Notably, the worst-case classification accuracy score produced by M3 is comparable to the median score attained by both M1 and M2.
  • Figure 4: Employing the printer selection problem as a benchmark, we assess the performance of M1, M2, and M3 using three evaluation metrics. Using the trainable ELU activation function with individual parameters in M3 yields the highest classification accuracy and empirical coverage scores. According to the information from the third row, all prediction sets consist of a single class, except for the fixed Softplus activation function in M1.
  • Figure 5: Employing the printability prediction problem as a benchmark, we assess the performance of M1, M2, and M3 using three evaluation metrics. Using the trainable ELU and Softplus activation functions with individual parameters in M3 yields the highest classification accuracy and empirical coverage scores. However, the minimum classification accuracy score for Swish in M3 is $0.37$, which falls below the accuracy expected of a random classifier.
  • ...and 2 more figures
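
The prediction sets and empirical coverage mentioned in the captions above can be sketched with split conformal prediction, shown here as a minimal NumPy example. It assumes the common score $1 - \hat{p}(y \mid x)$ on a held-out calibration set; the paper's exact procedure and its definition of the uncertainty score are not given on this page, so `average_set_size` is only a hypothetical stand-in for a set-size-based measure. All function names and the toy data are illustrative.

```python
import numpy as np

def conformal_quantile(cal_probs, cal_labels, delta=0.1):
    """Threshold from calibration scores 1 - p(true class); target coverage 1 - delta."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1.0 - delta)))  # rank with finite-sample correction
    if k > n:
        return np.inf  # too few calibration points: every class is included
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, qhat):
    """Boolean mask (n_test, n_classes): True where the class is in the set."""
    return (1.0 - test_probs) <= qhat

def empirical_coverage(sets, labels):
    """Fraction of test points whose true label lies in its prediction set."""
    return sets[np.arange(len(labels)), labels].mean()

def average_set_size(sets):
    """Mean number of classes per set (assumed proxy for an uncertainty score)."""
    return sets.sum(axis=1).mean()

# Toy data only: random class probabilities and labels, not results from the paper.
rng = np.random.default_rng(0)
logits = rng.normal(size=(30, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, 3, size=30)

qhat = conformal_quantile(probs[:20], labels[:20], delta=0.1)
sets = prediction_sets(probs[20:], qhat)
print(empirical_coverage(sets, labels[20:]), average_set_size(sets))
```

A confident and well-calibrated classifier yields coverage near $1 - \delta$ with sets close to a single class, which is the behavior the captions attribute to M3.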