
fKAN: Fractional Kolmogorov-Arnold Networks with trainable Jacobi basis functions

Alireza Afzal Aghaei

TL;DR

This work addresses the need for faster, more accurate, and more interpretable function approximation in neural networks by integrating fractional Jacobi basis functions into Kolmogorov-Arnold Networks (KANs). It introduces the fractional Jacobi neural block (fJNB) with trainable parameters $\alpha$, $\beta$, and $\gamma$, constrained through ELU and sigmoid mappings so that $\alpha, \beta > -1$ (as Jacobi orthogonality requires) and the fractional degree $\gamma$ stays in a stable range. Through extensive experiments on synthetic regression, MNIST classification, Fashion-MNIST denoising, IMDB sentiment analysis, and physics-informed ODE/PDE problems, fKAN with fractional Jacobi activations consistently improves training speed and performance relative to traditional activations and standard KANs. The work delivers a flexible, tunable activation framework with potential impact on deep learning and physics-informed modeling. It notes a higher time cost and interpretability considerations, and suggests local fractional bases such as fractional B-splines for future work.
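The TL;DR describes the fJNB only in words. The Python sketch below illustrates the parameter constraints under stated assumptions: it maps unconstrained raw values through a unit-scale ELU (range $(-1, \infty)$) to obtain $\alpha, \beta > -1$ and through a sigmoid to obtain $\gamma \in (0, 1)$, and it assumes the common fractional-order construction $P_n^{(\alpha,\beta)}(2x^{\gamma} - 1)$ on $[0, 1]$. The function names are hypothetical, and the paper's exact block (eq:JNB) may differ, for example in how it handles inputs of either sign.

```python
import math


def elu(z: float) -> float:
    """Unit-scale ELU: z for z > 0, exp(z) - 1 otherwise; mathematically its range is (-1, inf)."""
    return z if z > 0 else math.expm1(z)


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid with range (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def jacobi(n: int, alpha: float, beta: float, x: float) -> float:
    """P_n^{(alpha, beta)}(x) via the standard three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, ((alpha + beta + 2) * x + (alpha - beta)) / 2
    for k in range(2, n + 1):
        s = 2 * k + alpha + beta
        a = (s - 1) * (s * (s - 2) * x + alpha**2 - beta**2)
        b = 2 * (k + alpha - 1) * (k + beta - 1) * s
        c = 2 * k * (k + alpha + beta) * (s - 2)
        p_prev, p = p, (a * p - b * p_prev) / c
    return p


def fractional_jacobi_activation(x, n, raw_alpha, raw_beta, raw_gamma):
    """Hypothetical fJNB-style activation for x in [0, 1].

    Assumption: the fractional Jacobi function is P_n^{(alpha,beta)}(2 x^gamma - 1);
    this is a common fractional-order construction, not necessarily the paper's eq:JNB.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("this sketch only handles inputs in [0, 1]")
    alpha = elu(raw_alpha)      # alpha > -1 for moderate raw values
    beta = elu(raw_beta)        # beta > -1 for moderate raw values
    gamma = sigmoid(raw_gamma)  # fractional degree in (0, 1)
    return jacobi(n, alpha, beta, 2 * x**gamma - 1)


if __name__ == "__main__":
    # Raw values of 0 give alpha = beta = 0 (Legendre case) and gamma = 0.5.
    for x in (0.0, 0.25, 0.5, 1.0):
        print(x, fractional_jacobi_activation(x, n=3, raw_alpha=0.0, raw_beta=0.0, raw_gamma=0.0))
```

In a trainable network the raw values would be learnable parameters; the reparameterization lets the optimizer work on an unconstrained scale while the polynomial only ever sees admissible $\alpha$, $\beta$, and $\gamma$.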

Abstract

Recent advancements in neural network design have given rise to Kolmogorov-Arnold Networks (KANs), which enhance speed, interpretability, and precision. This paper presents the Fractional Kolmogorov-Arnold Network (fKAN), a novel neural network architecture that combines the distinctive attributes of KANs with a trainable, adaptive fractional-orthogonal Jacobi function as its basis function. By leveraging the unique mathematical properties of fractional Jacobi functions, including simple derivative formulas, non-polynomial behavior, and activity for both positive and negative inputs, this approach enables efficient learning and enhanced accuracy. The proposed architecture is evaluated across a range of tasks in deep learning and physics-informed deep learning. Accuracy is tested on synthetic regression data, image classification, image denoising, and sentiment analysis. Additionally, performance is measured on various differential equations, including ordinary, partial, and fractional delay differential equations. The results demonstrate that integrating fractional Jacobi functions into KANs significantly improves training speed and performance across diverse fields and applications.
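The abstract credits fractional Jacobi functions with simple derivative formulas, which matter in physics-informed training where derivatives of the network output enter the loss. A standard identity is $\frac{d}{dx}P_n^{(\alpha,\beta)}(x) = \frac{n+\alpha+\beta+1}{2}P_{n-1}^{(\alpha+1,\beta+1)}(x)$. The Python sketch below applies it, adds the chain-rule factor $2\gamma x^{\gamma-1}$ for the substitution $\xi = 2x^{\gamma} - 1$ (an assumed fractional construction, not necessarily the paper's eq:JNB), and compares both against central finite differences. All function names are illustrative.

```python
import math


def jacobi(n, alpha, beta, x):
    """P_n^{(alpha, beta)}(x) via the standard three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, ((alpha + beta + 2) * x + (alpha - beta)) / 2
    for k in range(2, n + 1):
        s = 2 * k + alpha + beta
        a = (s - 1) * (s * (s - 2) * x + alpha**2 - beta**2)
        b = 2 * (k + alpha - 1) * (k + beta - 1) * s
        c = 2 * k * (k + alpha + beta) * (s - 2)
        p_prev, p = p, (a * p - b * p_prev) / c
    return p


def jacobi_derivative(n, alpha, beta, x):
    """d/dx P_n^{(alpha,beta)}(x) = (n + alpha + beta + 1) / 2 * P_{n-1}^{(alpha+1, beta+1)}(x)."""
    if n == 0:
        return 0.0
    return 0.5 * (n + alpha + beta + 1) * jacobi(n - 1, alpha + 1, beta + 1, x)


def fractional_jacobi_derivative(n, alpha, beta, gamma, x):
    """Chain rule for the assumed map x -> P_n^{(alpha,beta)}(2 x^gamma - 1), x in (0, 1]."""
    xi = 2 * x**gamma - 1
    dxi_dx = 2 * gamma * x ** (gamma - 1)  # singular at x = 0 when gamma < 1
    return jacobi_derivative(n, alpha, beta, xi) * dxi_dx


def central_difference(f, x, h=1e-6):
    """Second-order finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


if __name__ == "__main__":
    n, alpha, beta, gamma = 4, 0.3, -0.4, 0.7
    exact = fractional_jacobi_derivative(n, alpha, beta, gamma, 0.5)
    approx = central_difference(lambda t: jacobi(n, alpha, beta, 2 * t**gamma - 1), 0.5)
    print(exact, approx)  # the two values should agree to several digits
```

Because the derivative of a Jacobi polynomial is again a (shifted-parameter) Jacobi polynomial, derivatives can be evaluated with the same routine as the forward pass instead of through a separate symbolic expression.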

Paper Structure

This paper contains 17 sections, 6 theorems, 42 equations, 11 figures, and 5 tables.

Key Result

Theorem

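The Jacobi polynomials $P_n^{(\alpha,\beta)}$ with $\alpha, \beta > -1$ form a set of orthogonal functions on the interval $[-1,1]$:
$$\langle P_n^{(\alpha,\beta)}, P_m^{(\alpha,\beta)} \rangle = \|P_n^{(\alpha,\beta)}\|^2 \, \delta_{nm},$$
where $\langle f,g\rangle = \int_\Omega f(\xi)g(\xi)w(\xi) \, \text{d}\xi$ is the inner product operator over the domain $\Omega \subseteq \mathbb{R}$ (here $\Omega = [-1,1]$ with weight $w(\xi) = (1-\xi)^{\alpha}(1+\xi)^{\beta}$) and $\delta_{nm}$ is the Kronecker delta (shen2011spectral).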
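The theorem can be checked numerically. This Python sketch approximates the inner product with the Jacobi weight $w(\xi) = (1-\xi)^{\alpha}(1+\xi)^{\beta}$ by composite Simpson's rule, and compares the diagonal entries with the standard closed-form norm $\|P_n^{(\alpha,\beta)}\|^2 = \frac{2^{\alpha+\beta+1}}{2n+\alpha+\beta+1}\frac{\Gamma(n+\alpha+1)\Gamma(n+\beta+1)}{\Gamma(n+\alpha+\beta+1)\,n!}$. It assumes $\alpha, \beta \ge 0$ so the weight is bounded at the endpoints; Simpson's rule is a convenience choice rather than the quadrature used in the paper, and the helper names are illustrative.

```python
import math


def jacobi(n, alpha, beta, x):
    """P_n^{(alpha, beta)}(x) via the standard three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, ((alpha + beta + 2) * x + (alpha - beta)) / 2
    for k in range(2, n + 1):
        s = 2 * k + alpha + beta
        a = (s - 1) * (s * (s - 2) * x + alpha**2 - beta**2)
        b = 2 * (k + alpha - 1) * (k + beta - 1) * s
        c = 2 * k * (k + alpha + beta) * (s - 2)
        p_prev, p = p, (a * p - b * p_prev) / c
    return p


def weighted_inner_product(f, g, alpha, beta, intervals=2000):
    """Composite Simpson estimate of int_{-1}^{1} f g (1 - xi)^alpha (1 + xi)^beta dxi.

    Assumes alpha, beta >= 0; for negative exponents the weight is singular at the
    endpoints and a Gauss-Jacobi rule would be the appropriate tool.
    """
    if intervals % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    total = 0.0
    for i in range(intervals + 1):
        xi = -1.0 + 2.0 * i / intervals  # endpoints are exactly -1.0 and 1.0
        coeff = 1 if i in (0, intervals) else (4 if i % 2 else 2)
        total += coeff * f(xi) * g(xi) * (1 - xi) ** alpha * (1 + xi) ** beta
    return total * (2.0 / intervals) / 3


def jacobi_norm_sq(n, alpha, beta):
    """Closed form of ||P_n^{(alpha,beta)}||^2 under the Jacobi weight."""
    return (2 ** (alpha + beta + 1) / (2 * n + alpha + beta + 1)
            * math.gamma(n + alpha + 1) * math.gamma(n + beta + 1)
            / (math.gamma(n + alpha + beta + 1) * math.factorial(n)))


if __name__ == "__main__":
    alpha, beta = 2, 1
    for n in range(4):
        row = [weighted_inner_product(lambda t, n=n: jacobi(n, alpha, beta, t),
                                      lambda t, m=m: jacobi(m, alpha, beta, t),
                                      alpha, beta) for m in range(4)]
        # Off-diagonal entries should be near zero; the diagonal should match the norm.
        print([round(v, 6) for v in row], round(jacobi_norm_sq(n, alpha, beta), 6))
```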

Figures (11)

  • Figure 1: This figure depicts the hierarchical classification of the generalized hypergeometric function. It shows that Chebyshev and Gegenbauer polynomials are particular cases within the more general class of Jacobi polynomials, which itself is a subclass of the hypergeometric function.
  • Figure 2: The architecture of the fractional Jacobi neural block (fJNB) defined in Equation (eq:JNB), featuring trainable parameters $\alpha$, $\beta$, and $\gamma$. This block serves as an adaptive activation function, enabling the network to learn optimal values for these parameters during training. The fJNB offers flexibility, ease of calculation, and adaptability, making it suitable for various applications in neural network architectures.
  • Figure 3: Results of a one-dimensional function regression task with different activation functions. The proposed activation function achieves higher accuracy than well-known activation functions and than the standard KAN.
  • Figure 4: The architecture of the proposed method for MNIST classification.
  • Figure 5: Loss and accuracy of the proposed method compared with well-known activation functions on MNIST classification.
  • ...and 6 more figures
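Figure 1 places Chebyshev and Gegenbauer polynomials inside the Jacobi family, which itself sits inside the hypergeometric functions. The Python sketch below checks these inclusions numerically with standard identities: the terminating series $P_n^{(\alpha,\beta)}(x) = \frac{(\alpha+1)_n}{n!}\,{}_2F_1\!\left(-n, n+\alpha+\beta+1; \alpha+1; \tfrac{1-x}{2}\right)$, $T_n(x) = \frac{n!}{(1/2)_n} P_n^{(-1/2,-1/2)}(x)$, and $C_n^{(\lambda)}(x) = \frac{(2\lambda)_n}{(\lambda+1/2)_n} P_n^{(\lambda-1/2,\lambda-1/2)}(x)$. These use the conventional normalizations, which may differ from the paper's; the helper names are illustrative.

```python
import math


def pochhammer(a, k):
    """Rising factorial (a)_k = a (a + 1) ... (a + k - 1)."""
    out = 1.0
    for j in range(k):
        out *= a + j
    return out


def jacobi_hypergeometric(n, alpha, beta, x):
    """P_n^{(alpha,beta)}(x) from its terminating 2F1 series (valid for alpha > -1)."""
    z = (1 - x) / 2
    series = sum(
        pochhammer(-n, k) * pochhammer(n + alpha + beta + 1, k)
        / (pochhammer(alpha + 1, k) * math.factorial(k)) * z**k
        for k in range(n + 1)  # (-n)_k = 0 for k > n, so the series stops at k = n
    )
    return pochhammer(alpha + 1, n) / math.factorial(n) * series


def chebyshev_from_jacobi(n, x):
    """T_n(x) = n! / (1/2)_n * P_n^{(-1/2, -1/2)}(x)."""
    return math.factorial(n) / pochhammer(0.5, n) * jacobi_hypergeometric(n, -0.5, -0.5, x)


def gegenbauer_from_jacobi(n, lam, x):
    """C_n^{(lam)}(x) = (2 lam)_n / (lam + 1/2)_n * P_n^{(lam - 1/2, lam - 1/2)}(x)."""
    return (pochhammer(2 * lam, n) / pochhammer(lam + 0.5, n)
            * jacobi_hypergeometric(n, lam - 0.5, lam - 0.5, x))


def chebyshev_t(n, x):
    """Independent reference: T_n(x) = cos(n arccos x) for |x| <= 1."""
    return math.cos(n * math.acos(x))


def gegenbauer(n, lam, x):
    """Independent reference: Gegenbauer three-term recurrence."""
    c_prev, c = 1.0, 2 * lam * x
    if n == 0:
        return c_prev
    for k in range(2, n + 1):
        c_prev, c = c, (2 * x * (k + lam - 1) * c - (k + 2 * lam - 2) * c_prev) / k
    return c


if __name__ == "__main__":
    for x in (-0.9, -0.3, 0.3, 0.9):
        print(x, chebyshev_from_jacobi(4, x), chebyshev_t(4, x),
              gegenbauer_from_jacobi(3, 1.5, x), gegenbauer(3, 1.5, x))
```

Each `*_from_jacobi` value should match its independent reference up to rounding, which is what the hierarchy in Figure 1 predicts.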

Theorems & Definitions (7)

  • Theorem
  • Theorem
  • Corollary
  • Theorem
  • Proof
  • Theorem
  • Theorem