Kolmogorov-Arnold Fourier Networks

Jusheng Zhang, Yijia Fan, Kaitong Cai, Keze Wang

TL;DR

KAN offers strong theoretical expressiveness but suffers from parameter explosion and poor high-frequency capture in high dimensions. KAF mitigates these issues by replacing B-spline bases with learnable Random Fourier Features and introducing a GELU-Fourier hybrid activation with adaptive spectral weighting, achieving parameter efficiency and improved spectral representation. The approach demonstrates superior or competitive performance across vision, NLP, audio, and PDE solving, often with fewer parameters and reasonable compute, while maintaining interpretability aspects inspired by Kolmogorov-Arnold theory. These results suggest KAF as a practical, scalable alternative to traditional KAN and MLP-based architectures in high-dimensional learning tasks.

Abstract

Although Kolmogorov-Arnold based interpretable networks (KAN) have strong theoretical expressiveness, they face significant parameter explosion and high-frequency feature capture challenges in high-dimensional tasks. To address this issue, we propose the Kolmogorov-Arnold-Fourier Network (KAF), which effectively integrates trainable Random Fourier Features (RFF) and a novel hybrid GELU-Fourier activation mechanism to balance parameter efficiency and spectral representation capabilities. Our key technical contributions include: (1) merging KAN's dual-matrix structure through matrix association properties to substantially reduce parameters; (2) introducing learnable RFF initialization strategies to eliminate spectral distortion in high-dimensional approximation tasks; (3) implementing an adaptive hybrid activation function that progressively enhances frequency representation during the training process. Comprehensive experiments demonstrate the superiority of our KAF across various domains including vision, NLP, audio processing, and differential equation-solving tasks, effectively combining theoretical interpretability with practical utility and computational efficiency.
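Contribution (1) relies on the associativity of matrix multiplication: two consecutive linear maps can be folded into a single matrix without changing the output. The abstract does not give the actual matrices or their shapes, so the following is only a minimal NumPy sketch under assumed, hypothetical dimensions (`d_feat`, `d_mid`, `d_out`) and a hypothetical helper `merge_linear_maps`. It is not the authors' implementation.

```python
import numpy as np


def merge_linear_maps(A, B):
    """Fold two consecutive linear maps into one, using (F @ A) @ B == F @ (A @ B)."""
    return A @ B


rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only; they are not taken from the paper.
batch, d_feat, d_mid, d_out = 8, 64, 256, 32

F = rng.standard_normal((batch, d_feat))   # stand-in for basis-function features
A = rng.standard_normal((d_feat, d_mid))   # first matrix of the dual-matrix form
B = rng.standard_normal((d_mid, d_out))    # second matrix of the dual-matrix form

two_step = (F @ A) @ B                     # original two-matrix computation
W = merge_linear_maps(A, B)                # single merged matrix
one_step = F @ W                           # same result with one matmul

params_two = A.size + B.size               # parameters stored by the two-matrix form
params_merged = W.size                     # parameters stored after merging
max_abs_diff = float(np.max(np.abs(two_step - one_step)))

print(params_two, params_merged, max_abs_diff)
```

In this toy setting the merged form stores `d_feat * d_out` values instead of `d_feat * d_mid + d_mid * d_out`. The saving depends on the intermediate width being large relative to the input and output widths, which is an assumption of this sketch.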

Paper Structure

This paper contains 56 sections, 44 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Comparison of two models: a standard MLP with GELU activation (left) and a KAF with GELU activation (right). The MLP applies a projection matrix followed by GELU, while the KAF adds Random Fourier Features (RFF) and scale parameters, offering more flexibility in feature transformations.
  • Figure 2: Performance of different models (KAN, MLP, GPKAN, FAN, KAF) across several datasets (MNIST, EMNIST, FMNIST, KMNIST, CIFAR-10, CIFAR-100, SVHN). KAF generally achieves higher accuracy with fewer parameters.
  • Figure 3: Performance of various models (KAN, GPKAN, MLP, FAN, KAF) across NLP, audio, and ML datasets. KAF consistently outperforms the other models, achieving higher accuracy with fewer parameters, especially on datasets such as Bean, Rice, and AG News. Its efficiency and accuracy make it a strong choice across a wide range of tasks.
  • Figure 4: Test RMSE versus number of parameters for different models (KAN, GPKAN, MLP, FAN, KAF) on various function approximation tasks. KAF consistently achieves lower RMSE across all tasks, outperforming models such as MLP with fewer parameters. Its strong performance in approximating complex functions highlights its efficiency and accuracy.
  • Figure 5: Comparison of different models (MLP, KAN, KAF, FAN, GPKAN) on solving the Poisson, 1D wave, heat, and Burgers equations. KAF consistently delivers strong performance across all tasks, demonstrating its efficiency and effectiveness in solving complex PDEs.
  • ...and 4 more figures
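
The Figure 1 caption describes KAF as adding Random Fourier Features and scale parameters to a GELU branch. This summary does not state the exact formula, so the sketch below assumes one plausible form, `a * GELU(x) + b * (RFF(x) @ V)`, with hypothetical names (`HybridGeluFourier`, `W`, `phase`, `V`, `a`, `b`) and a hypothetical initialisation. It covers the forward pass only; in a trained model every array here would be a learnable parameter.

```python
import numpy as np


def gelu(x):
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class HybridGeluFourier:
    """Illustrative hybrid activation: a * GELU(x) + b * (RFF(x) @ V).

    Assumed form, not taken from the paper.
    """

    def __init__(self, dim, num_freqs=8, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.num_freqs = num_freqs
        # Frequencies and phases of the random Fourier features (would be trainable).
        self.W = rng.normal(0.0, sigma, size=(dim, num_freqs))
        self.phase = rng.uniform(0.0, 2.0 * np.pi, size=num_freqs)
        # Projection from the 2 * num_freqs Fourier features back to `dim`.
        self.V = rng.normal(0.0, 1.0 / np.sqrt(2 * num_freqs), size=(2 * num_freqs, dim))
        # Per-channel scales: the GELU branch starts at 1 and the Fourier branch
        # starts small (assumed), so the spectral term can grow during training.
        self.a = np.ones(dim)
        self.b = np.full(dim, 1e-2)

    def rff(self, x):
        """Map x of shape (batch, dim) to unit-norm features [cos(z), sin(z)] / sqrt(m)."""
        z = x @ self.W + self.phase
        feats = np.concatenate([np.cos(z), np.sin(z)], axis=-1)
        return feats / np.sqrt(self.num_freqs)

    def __call__(self, x):
        return self.a * gelu(x) + self.b * (self.rff(x) @ self.V)


x = np.random.default_rng(1).standard_normal((4, 16))
act = HybridGeluFourier(dim=16)
y = act(x)
print(y.shape)
```

With the Fourier scale `b` set to zero the block reduces to plain GELU, which matches the MLP side of Figure 1. A larger `b` gives the Fourier branch more weight.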