DSReLU: A Novel Dynamic Slope Function for Superior Model Training

Archisman Chakraborti, Bidyut B Chaudhuri

TL;DR

The paper introduces DSReLU, a dynamic-slope activation that evolves with training to address vanishing gradients and overfitting in vision models. By defining f(x;t) with a time-dependent slope s(t) that transitions from a steep to a gentler regime via a logistic function, the approach aims to preserve learning speed early on while stabilizing later. Empirical results on Mini-ImageNet, CIFAR-100, and MIT-BIH using a ResNet-34 backbone show DSReLU achieving higher validation accuracy and F1-scores than ReLU, Mish, and other baselines, with competitive training times. The work suggests DSReLU as a promising activation design for improved learning dynamics and generalization, and outlines future work on broader architectures and parameter exploration.
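The summary above does not give the exact form of f(x;t) or s(t), so the following is only a minimal sketch of the general idea: a rectifier whose slope follows a logistic schedule from a steeper value to a gentler one as training progresses. Everything specific here is an assumption made for illustration, not the authors' definition: the names `logistic_slope` and `dynamic_slope_relu`, the choice to scale the positive side of the rectifier, the endpoint slopes `s_start` and `s_end`, the steepness `k`, the midpoint `t_mid`, and the convention that `t` is training progress normalized to [0, 1].

```python
import math


def logistic_slope(t, s_start=2.0, s_end=1.0, k=10.0, t_mid=0.5):
    """Hypothetical slope schedule s(t).

    Moves smoothly from near s_start (early training, t close to 0)
    to near s_end (late training, t close to 1). All default values
    are illustrative assumptions, not values taken from the paper.
    """
    gate = 1.0 / (1.0 + math.exp(-k * (t - t_mid)))  # logistic in (0, 1)
    return s_start + (s_end - s_start) * gate


def dynamic_slope_relu(x, t, **schedule):
    """Rectifier whose positive-side slope is s(t) (assumed form).

    Negative inputs map to 0, as in ReLU; positive inputs are scaled
    by the current slope.
    """
    return logistic_slope(t, **schedule) * max(0.0, x)


if __name__ == "__main__":
    # Print the slope and the activation of x = 2.0 at a few stages of training.
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"t={t:.2f}  s(t)={logistic_slope(t):.4f}  f(2.0; t)={dynamic_slope_relu(2.0, t):.4f}")
```

In this sketch a larger `k` makes the transition between the two regimes sharper, while a smaller `k` spreads it over more of training. In a real training loop, `t` would typically be computed as the current epoch (or step) divided by the total.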

Abstract

This study introduces a novel activation function, characterized by a dynamic slope that adjusts throughout the training process, aimed at enhancing adaptability and performance in deep neural networks for computer vision tasks. The rationale behind this approach is to overcome limitations associated with traditional activation functions, such as ReLU, by providing a more flexible mechanism that can adapt to different stages of the learning process. Evaluated on the Mini-ImageNet, CIFAR-100, and MIT-BIH datasets, our method demonstrated improvements in classification metrics and generalization capabilities. These results suggest that our dynamic slope activation function could offer a new tool for improving the performance of deep learning models in various image recognition tasks.

Paper Structure

This paper contains 20 sections, 5 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Slope change of DSReLU with parameter $k$
  • Figure 2: Performance of different activation functions on the Mini-ImageNet Dataset
  • Figure 3: Performance of different activation functions on the CIFAR-100 Dataset
  • Figure 4: Performance of different activation functions on the MIT-BIH Dataset
  • Figure 5: Mean training time of the ResNet-34 model with different activation functions across datasets