Fractional Concepts in Neural Networks: Enhancing Activation Functions
Zahra Alijani, Vojtech Molek
TL;DR
The paper addresses limitations of fixed activation functions by introducing trainable fractional order derivatives (FDO) into activations. It develops fractional variants FGELU, FMish, FSig, and FALU using the Grünwald-Letnikov derivative and analyzes how the hyperparameters $N$ and $h$ (with $h=\frac{1}{\max(1,N-1)}$) affect computation and memory. In extensive, seeded experiments on ResNet architectures with CIFAR-10 and EfficientNet-B0 with ImageNet-1K (and related datasets), FSig shows consistent improvements in several setups, while FGELU, FMish, and FALU exhibit mixed or limited gains and potential training challenges. The study reports roughly linear growth in time and memory with the number of terms and fractional order, emphasizing the need for efficiency optimizations and robust training dynamics; code for reproducibility is publicly available.
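To make the roles of $N$ and $h$ concrete, the following is a minimal NumPy sketch of a truncated Grünwald-Letnikov derivative applied to a sigmoid. It assumes the standard textbook form $D^{\alpha} f(x) \approx h^{-\alpha} \sum_{n=0}^{N-1} (-1)^n \binom{\alpha}{n} f(x - nh)$, with $N$ read as the number of summation terms and $h$ as the step size; the summary above does not spell out the authors' exact formulation, so this may differ from their released code. The names `gl_weights` and `fractional_activation` are hypothetical, and the order `alpha` is a plain float here rather than a trainable parameter.

```python
import numpy as np


def gl_weights(alpha, n_terms):
    """Coefficients (-1)^n * binom(alpha, n) for n = 0 .. n_terms - 1."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for n in range(1, n_terms):
        # Standard recurrence; avoids evaluating gamma functions directly.
        w[n] = w[n - 1] * (1.0 - (alpha + 1.0) / n)
    return w


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fractional_activation(f, x, alpha, n_terms):
    """Truncated Grünwald-Letnikov derivative of order alpha of f at x.

    Assumption: h = 1 / max(1, n_terms - 1), as stated in the summary.
    """
    x = np.asarray(x, dtype=float)
    h = 1.0 / max(1, n_terms - 1)
    w = gl_weights(alpha, n_terms)
    shifts = h * np.arange(n_terms)
    # Evaluates f on n_terms shifted copies of the input, so time and
    # memory grow roughly linearly with n_terms.
    shifted = f(x[..., None] - shifts)  # shape (..., n_terms)
    return (shifted @ w) / h**alpha


x = np.linspace(-3.0, 3.0, 7)
same_as_sigmoid = fractional_activation(sigmoid, x, alpha=0.0, n_terms=5)
near_derivative = fractional_activation(sigmoid, 0.0, alpha=1.0, n_terms=101)
half_order = fractional_activation(sigmoid, x, alpha=0.5, n_terms=5)
```

In this sketch, `alpha=0.0` reproduces the plain sigmoid and `alpha=1.0` reduces to a backward finite difference that approaches the ordinary derivative as `n_terms` grows; intermediate orders interpolate between the two. Making the order trainable would require an autodiff framework and a differentiable way of computing the weights, which is outside the scope of this example.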
Abstract
Designing effective neural networks requires tuning architectural elements. This study integrates fractional calculus into neural networks by introducing fractional order derivatives (FDO) as tunable parameters in activation functions, allowing diverse activation functions by adjusting the FDO. We evaluate these fractional activation functions on various datasets and network architectures, comparing their performance with traditional and new activation functions. Our experiments assess their impact on accuracy, time complexity, computational overhead, and memory usage. Results suggest fractional activation functions, particularly fractional Sigmoid, offer benefits in some scenarios. Challenges related to consistency and efficiency remain. Practical implications and limitations are discussed.
