A Practitioner's Guide to Kolmogorov-Arnold Networks

Amir Noorizadegan, Sifan Wang, Leevan Ling, Juan P. Dominguez-Morales

TL;DR

Kolmogorov-Arnold Networks (KANs) offer a modular, edge-function-based alternative to traditional MLPs, inspired by the Kolmogorov superposition theorem and instantiated through learnable univariate basis functions. The paper surveys the historical development, the formal equivalence to MLPs, and a wide spectrum of basis families (B-splines, Chebyshev, Jacobi, Gaussian/RBF, Fourier, wavelets, Sinc, and others), detailing their computational trade-offs, stability considerations, and applicability to regression, PDE solving, and operator learning. It also catalogs accuracy, efficiency, regularization, and convergence results, and provides a practical Choose-Your-KAN guide, benchmarks, and a roadmap of current gaps. Collectively, the work positions KANs as a versatile framework that can outperform vanilla MLPs in structured settings while demanding principled design choices and careful numerical conditioning.

Abstract

The so-called Kolmogorov-Arnold Networks (KANs), whose design is merely inspired, rather than dictated, by the Kolmogorov superposition theorem, have emerged as a promising alternative to traditional Multilayer Perceptrons (MLPs). This review provides a systematic and comprehensive overview of the rapidly expanding KAN landscape. By collecting and categorizing a large set of open-source implementations, we map the vibrant ecosystem supporting modern KAN development. We organize the review around four core themes: (i) presenting a precise history of Kolmogorov's superposition theory toward neural-network formulations; (ii) establishing the formal equivalence between KANs and MLPs; (iii) analyzing the critical role of basis functions; and (iv) organizing recent advancements in accuracy, efficiency, regularization, and convergence. Finally, we provide a practical Choose-Your-KAN guide to assist practitioners in selecting appropriate architectures, and we close by identifying current research gaps and future directions. The associated GitHub repository (https://github.com/AmirNoori68/kan-review) complements this paper and serves as a structured reference for ongoing KAN research.

Paper Structure

This paper contains 47 sections, 97 equations, 19 figures, 5 tables.

Figures (19)

  • Figure 1: Comparison of B-spline bases and synthesized univariate maps with/without grid extension. (a,b) Cubic ($k{=}3$) bases on non-extended vs. extended grids. (c,d) Learned maps $\varphi(x)$ using identical random coefficients $c_n$. Gray regions mark padded boundary intervals.
  • Figure 2: (a) Standard Chebyshev basis functions without per-layer $\tanh$ normalization. (b) Same basis with $\tanh$ normalization, showing compressed input range and moderated edge slopes. (c) Deep Chebyshev KAN map ($K{=}8$) comparing $\sum c_k T_k(\tanh x)$ (blue) vs. $\sum c_k T_k(x)$ (orange); the normalized version exhibits smoother behavior and smaller endpoint gradients, indicating improved conditioning.
  • Figure 3: (a) Step-by-step construction of the normalized ReLU-based local basis function from its constituent $\operatorname{ReLU}(e_i - x)$ and $\operatorname{ReLU}(x - s_i)$ terms. (b) Complete set of normalized ReLU-KAN basis functions $R_i(x)$ with supports $[s_i, e_i]$ [Qiu24].
  • Figure 4: Symbolic basis on $[s_{i^\star},e_{i^\star}]$: (a) $v_{2,i^\star}(x)$ with first and second derivatives; (b) $v_{4,i^\star}(x)$ with first and second derivatives. Observation: $v_{m,i}$ is globally $C^{m-1}$ (with the $m$-th derivative discontinuous at $s_{i^\star},e_{i^\star}$); hence $v_{4,i^\star}$ ($C^3$) offers smoother higher-order derivatives than $v_{2,i^\star}$ ($C^1$) [KAN_pde_So24].
  • Figure 5: Synthesized activation $\varphi(x)$ from ReLU-KAN bases using shared random coefficients. Dashed: squared-ReLU basis ($m=2$), Solid: higher-order ReLU basis ($m=4$). Higher order yields sharper, smoother lobes while preserving the same coefficient structure.
  • ...and 14 more figures
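
The constructions named in the captions of Figures 2 to 5 can be illustrated with a short NumPy sketch. It builds an order-$m$ ReLU bump from $\operatorname{ReLU}(e_i - x)$ and $\operatorname{ReLU}(x - s_i)$, and evaluates a Chebyshev expansion $\sum c_k T_k(\cdot)$ with and without $\tanh$ input normalization. The function names `relu_basis` and `cheb_map` are hypothetical, and the scaling factor $(2/(e_i - s_i))^{2m}$, chosen so that each bump peaks at 1, is an assumption; the captions only say the basis is "normalized", so the paper's exact constant may differ.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def relu(x):
    """Elementwise ReLU."""
    return np.maximum(x, 0.0)


def relu_basis(x, s, e, m=2):
    """Order-m ReLU bump supported on [s, e].

    The product ReLU(e - x) * ReLU(x - s) is zero outside [s, e] and peaks
    at the midpoint with value ((e - s) / 2) ** 2. Raising it to the power m
    and multiplying by (2 / (e - s)) ** (2 * m) gives a peak of 1
    (assumed normalization, see the lead-in).
    """
    scale = (2.0 / (e - s)) ** (2 * m)
    return scale * (relu(e - x) * relu(x - s)) ** m


def cheb_map(x, coeffs, normalize=True):
    """Evaluate sum_k coeffs[k] * T_k(z), with z = tanh(x) if normalize else x."""
    z = np.tanh(x) if normalize else x
    return C.chebval(z, coeffs)


if __name__ == "__main__":
    x = np.linspace(-0.5, 1.5, 9)
    # m = 2 (squared-ReLU) vs. m = 4 (higher-order) bump on [0, 1]
    print(relu_basis(x, 0.0, 1.0, m=2))
    print(relu_basis(x, 0.0, 1.0, m=4))

    # Chebyshev map with K = 8 and random coefficients
    rng = np.random.default_rng(0)
    c = rng.normal(size=9)
    wide = np.linspace(-3.0, 3.0, 7)
    print(cheb_map(wide, c, normalize=True))   # stays bounded by sum |c_k|
    print(cheb_map(wide, c, normalize=False))  # grows quickly outside [-1, 1]
```

Because $|T_k(z)| \le 1$ for $|z| \le 1$, the $\tanh$-normalized map is bounded by $\sum_k |c_k|$ for any real input, whereas the unnormalized map is only controlled on $[-1, 1]$. This is one simple way to see the conditioning benefit that Figure 2 attributes to normalization.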