
Fractional Concepts in Neural Networks: Enhancing Activation Functions

Zahra Alijani, Vojtech Molek

TL;DR

The paper addresses limitations of fixed activation functions by introducing trainable fractional order derivatives (FDO) into activations. It develops the fractional variants FGELU, FMish, FSig, and FALU using the Grünwald-Letnikov derivative and analyzes how the hyperparameters $N$ and $h$ (with $h=\frac{1}{\max(1,N-1)}$) affect computation and memory. In seeded experiments on ResNet architectures with CIFAR-10 and EfficientNet-B0 with ImageNet-1K (and related datasets), FSig shows consistent improvements in several setups, while FGELU, FMish, and FALU show mixed or limited gains and potential training challenges. The study reports roughly linear growth in time and memory with the number of terms and the fractional order, which points to a need for efficiency optimizations and more robust training dynamics. Code for reproducibility is publicly available.
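To make the roles of $N$ and $h$ concrete, the following is a minimal sketch of a truncated Grünwald-Letnikov approximation applied to the sigmoid. It is not the authors' implementation: the function names, the choice to sum over $n = 0, \dots, N-1$, and the recurrence used for the signed binomial weights are assumptions made for illustration. Only the step-size rule $h=\frac{1}{\max(1,N-1)}$ is taken from the summary above.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def step_size(n_terms: int) -> float:
    """Step size h tied to the number of terms N: h = 1 / max(1, N - 1)."""
    return 1.0 / max(1, n_terms - 1)


def gl_weights(alpha: float, n_terms: int) -> list[float]:
    """Signed generalized binomial weights (-1)^n * C(alpha, n), n = 0..N-1.

    Uses the recurrence w_0 = 1, w_n = w_{n-1} * (1 - (alpha + 1) / n),
    which avoids evaluating gamma functions directly.
    """
    weights = [1.0]
    for n in range(1, n_terms):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / n))
    return weights


def fractional_sigmoid(x: float, alpha: float, n_terms: int = 3) -> float:
    """Truncated Grünwald-Letnikov derivative of order alpha of the sigmoid.

    Assumed form: h^(-alpha) * sum_n w_n * sigmoid(x - n * h).
    """
    h = step_size(n_terms)
    weights = gl_weights(alpha, n_terms)
    total = sum(w * sigmoid(x - n * h) for n, w in enumerate(weights))
    return total / h**alpha
```

In this sketch, `alpha = 0` recovers the plain sigmoid and `alpha = 1` reduces to a backward finite difference with step `h`. Each additional term adds one more shifted evaluation of the base function, which is consistent with the roughly linear cost growth described above. Under the stated rule, `N = 2` pairs with `h = 1` and `N = 3` pairs with `h = 0.5`. A trainable version would treat `alpha` as a parameter in an automatic-differentiation framework, which this scalar sketch does not attempt.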

Abstract

Designing effective neural networks requires tuning architectural elements. This study integrates fractional calculus into neural networks by introducing fractional order derivatives (FDO) as tunable parameters in activation functions, allowing diverse activation functions by adjusting the FDO. We evaluate these fractional activation functions on various datasets and network architectures, comparing their performance with traditional and new activation functions. Our experiments assess their impact on accuracy, time complexity, computational overhead, and memory usage. Results suggest fractional activation functions, particularly fractional Sigmoid, offer benefits in some scenarios. Challenges related to consistency and efficiency remain. Practical implications and limitations are discussed.

Paper Structure

This paper contains 12 sections, 16 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Fractional activation functions visualized. The functions, in row-major order, are fractional Mish, fractional GELU, fractional sigmoid, and FALU (Zamora et al., 2022). Graph lines represent the original functions (0.0 line) and their fractional derivatives.
  • Figure 2: FALU and its fractional derivatives with fixed $\beta=1$. The left graph shows FALU with our fix; note the smooth transition between the 0.9 and 1.1 derivatives. The right graph shows the derivatives under the original FALU formulation; note that the 1.1 derivative is approximately the order-2 derivative.
  • Figure 3: Fractional sigmoid with matched and mismatched $(N, h)$ pairs. Left: $N=2$ and $h=0.5$. Right: $N=3$ and $h=0.5$.
  • Figure 4: Test accuracies of the fractional sigmoid, Mish, and GELU in the first, second, and third rows, respectively. The results in the left column are obtained by training ResNet-20 on 50% of the CIFAR-10 training set. The results in the right column are obtained by training EfficientNet-B0 on 10% of the ImageNet-1K training set.
  • Figure 5: Distribution of FDO at the beginning and the end of training.
  • ...and 3 more figures