Enhancing Sequential Model Performance with Squared Sigmoid TanH (SST) Activation Under Data Constraints
Barathi Subramanian, Rathinaraja Jeyaraj, Anand Paul
TL;DR
The paper addresses two difficulties that sequential models face when trained on small datasets: modeling sparse temporal patterns and vanishing gradients. It proposes SST, which squares the outputs of the Sigmoid and TanH activations to amplify strong signals and preserve gradient flow, improving memory and representation across time. Across diverse tasks (sign language recognition, gait classification, human activity recognition, and gold-price forecasting), SST-enhanced GRUs and LSTMs achieve higher test accuracy, improved AUC, and lower MSE than baselines, demonstrating broad applicability under data constraints. The approach is a general, activation-level modification that improves learning efficiency and predictive performance in real-world temporal applications.
Abstract
Activation functions enable neural networks to learn complex representations by introducing non-linearities. While feedforward models commonly use rectified linear units, sequential models such as recurrent neural networks (RNNs), long short-term memory networks (LSTMs), and gated recurrent units (GRUs) still rely on Sigmoid and TanH activation functions. However, these classical activation functions often struggle to model sparse patterns and to capture temporal dependencies effectively when trained on small sequential datasets. To address this limitation, we propose the squared Sigmoid TanH (SST) activation, specifically tailored to enhance the learning capability of sequential models under data constraints. SST applies mathematical squaring to amplify differences between strong and weak activations as signals propagate over time, facilitating improved gradient flow and information filtering. We evaluate SST-powered LSTMs and GRUs on diverse applications where data is limited, including sign language recognition, regression, and time-series classification tasks. Our experiments demonstrate that SST models consistently outperform RNN-based models with baseline activations, exhibiting improved test accuracy.
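
The abstract describes SST only as squaring the Sigmoid and TanH outputs. The following is a minimal sketch of that idea, assuming plain element-wise squaring. The function names `squared_sigmoid`, `squared_tanh`, and `contrast` are illustrative and do not come from the paper, and the paper's exact formulation (for example, how the sign of TanH is treated) may differ.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def squared_sigmoid(x: float) -> float:
    """Assumed SST gate activation: the sigmoid output, squared."""
    return sigmoid(x) ** 2


def squared_tanh(x: float) -> float:
    """Assumed SST candidate activation: the tanh output, squared.

    Note: plain squaring discards the sign of tanh(x); this is an
    assumption of the sketch, not a statement about the paper.
    """
    return math.tanh(x) ** 2


def contrast(activation, strong: float, weak: float) -> float:
    """Ratio of the activation at a strong input to that at a weak input."""
    return activation(strong) / activation(weak)


if __name__ == "__main__":
    # Compare how far apart a strong (+2) and a weak (-2) pre-activation
    # end up before and after squaring.
    print("sigmoid contrast:        ", contrast(sigmoid, 2.0, -2.0))
    print("squared sigmoid contrast:", contrast(squared_sigmoid, 2.0, -2.0))
```

For pre-activations of 2 and -2, the ratio of the two sigmoid outputs is e^2 (about 7.4); after squaring it becomes e^4 (about 54.6). This is one way to read the abstract's claim that squaring amplifies the difference between strong and weak activations.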
