DSReLU: A Novel Dynamic Slope Function for Superior Model Training
Archisman Chakraborti, Bidyut B Chaudhuri
TL;DR
The paper introduces DSReLU, a dynamic-slope activation function whose slope evolves during training to address vanishing gradients and overfitting in vision models. The activation f(x;t) uses a time-dependent slope s(t) that transitions from a steep regime to a gentler one via a logistic function, with the aim of preserving fast learning early in training and stabilizing optimization later. Empirical results on Mini-ImageNet, CIFAR-100, and MIT-BIH with a ResNet-34 backbone show DSReLU achieving higher validation accuracy and F1-scores than ReLU, Mish, and other baselines, with competitive training times. The work presents DSReLU as a promising activation design for improved learning dynamics and generalization, and outlines future work on broader architectures and parameter exploration.
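As a rough illustration of the mechanism summarized above, the following Python sketch shows one way a logistic slope schedule could be attached to a ReLU-like function. It is not the paper's exact definition, which this summary does not state. The names `slope` and `dsrelu`, the constants `s_start`, `s_end`, `k`, and `t_mid`, the normalization of t to the range [0, 1] (for example, current epoch divided by total epochs), and the choice to scale only the positive part are all assumptions made for illustration.

```python
import math


def slope(t, s_start=2.0, s_end=1.0, k=10.0, t_mid=0.5):
    """Hypothetical logistic schedule for s(t).

    t is assumed to be training progress in [0, 1]. The slope starts
    near s_start (steep) and decays toward s_end (gentler); k sets how
    sharp the transition is and t_mid where it is centred.
    """
    gate = 1.0 / (1.0 + math.exp(-k * (t - t_mid)))
    return s_start + (s_end - s_start) * gate


def dsrelu(x, t, **schedule):
    """ReLU-like activation whose positive slope is s(t) (assumed form)."""
    return slope(t, **schedule) * max(0.0, x)


if __name__ == "__main__":
    # Print how the slope and a sample activation change over training.
    for step in range(0, 11, 2):
        t = step / 10
        print(f"t={t:.1f}  s(t)={slope(t):.3f}  f(1.0; t)={dsrelu(1.0, t):.3f}")
```

With these illustrative defaults, the slope decreases monotonically as t grows, so gradients through active units are larger early in training and smaller later, which matches the steep-to-gentle behaviour described in the summary.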
Abstract
This study introduces a novel activation function, characterized by a dynamic slope that adjusts throughout the training process, aimed at enhancing adaptability and performance in deep neural networks for computer vision tasks. The rationale behind this approach is to overcome limitations associated with traditional activation functions, such as ReLU, by providing a more flexible mechanism that can adapt to different stages of the learning process. Evaluated on the Mini-ImageNet, CIFAR-100, and MIT-BIH datasets, our method demonstrated improvements in classification metrics and generalization capabilities. These results suggest that our dynamic slope activation function could offer a new tool for improving the performance of deep learning models in various image recognition tasks.
