Scientific Machine Learning with Kolmogorov-Arnold Networks

Salah A. Faroughi, Farinaz Mostajeran, Amin Hamed Mashhadzadeh, Shirko Faroughi

TL;DR

This review analyzes Kolmogorov-Arnold networks (KANs) as principled alternatives to traditional multilayer perceptrons in scientific machine learning. It synthesizes progress across data-driven, physics-informed, and deep-operator contexts, highlighting how KANs leverage the Kolmogorov-Arnold representation to decompose high-dimensional mappings into univariate components, often improving interpretability, convergence, and spectral behavior. Key contributions include architectural innovations (two-layer and deep KANs), basis-function design (splines, Chebyshev, wavelets, RBFs), and comparative analyses showing favorable accuracy and efficiency versus MLPs and PINNs, as well as advances in DeepOKAN for operator learning. The work also identifies challenges (computational cost, hyperparameter sensitivity, framework support) and charts directions toward stronger theory, geometry-aware modeling, and industrial-scale validation, underscoring the potential of KAN-based models to deliver robust, mesh-independent, and physically consistent SciML solutions.
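
For readers unfamiliar with the decomposition mentioned above, the classical Kolmogorov-Arnold representation theorem can be stated as follows. This is the standard textbook form for a continuous function on the unit cube, written here as a reminder rather than quoted from the paper; the symbols $f$, $\Phi_q$, and $\varphi_{q,p}$ are notation chosen for this note. The second line is the usual layer-wise generalization that KANs build on, written with the $\boldsymbol{\xi}$ and $\boldsymbol{\Phi}_l$ symbols used in the figure captions; the per-layer indices are an assumption about notation.

```latex
% Kolmogorov-Arnold representation: any continuous f on [0,1]^n can be
% written using only univariate continuous functions and addition.
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)

% A KAN stacks L layers, each a matrix of learnable univariate functions:
% (\boldsymbol{\Phi}_l \boldsymbol{x})_j = \sum_i \varphi_{l,j,i}(x_i)
y_{\text{KAN}}(\boldsymbol{\xi}) \;=\; \left( \boldsymbol{\Phi}_L \circ \boldsymbol{\Phi}_{L-1} \circ \cdots \circ \boldsymbol{\Phi}_1 \right)(\boldsymbol{\xi})
```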

Abstract

The field of scientific machine learning, which originally utilized multilayer perceptrons (MLPs), is increasingly adopting Kolmogorov-Arnold Networks (KANs) for data encoding. This shift is driven by the limitations of MLPs, including poor interpretability, fixed activation functions, and difficulty capturing localized or high-frequency features. KANs address these issues with enhanced interpretability and flexibility, enabling more efficient modeling of complex nonlinear interactions and effectively overcoming the constraints associated with conventional MLP architectures. This review categorizes recent progress in KAN-based models across three distinct perspectives: (i) data-driven learning, (ii) physics-informed modeling, and (iii) deep-operator learning. Each perspective is examined through the lens of architectural design, training strategies, application efficacy, and comparative evaluation against MLP-based counterparts. By benchmarking KANs against MLPs, we highlight consistent improvements in accuracy, convergence, and spectral representation, clarifying KANs' advantages in capturing complex dynamics while learning more effectively. In addition to reviewing recent literature, this work also presents several comparative evaluations that clarify central characteristics of KAN modeling and hint at their potential implications for real-world applications. Finally, this review identifies critical challenges and open research questions in KAN development, particularly regarding computational efficiency, theoretical guarantees, hyperparameter tuning, and algorithm complexity. We also outline future research directions aimed at improving the robustness, scalability, and physical consistency of KAN-based frameworks.

Paper Structure

This paper contains 33 sections, 24 equations, 18 figures, 8 tables.

Figures (18)

  • Figure 1: A schematic illustration highlighting the pivotal contributions in SciML that are established using MLPs or KANs. For MLP-based architectures, the following are listed: deep neural network (DNN) ivakhnenko1967cybernetics, physics-guided neural network (PgNN) andersen2002artificial, physics-informed neural network (PiNN) raissi2019physics, DeepONet lu2021learning, and Fourier neural operator Li2020FNO. For KAN-based architectures, the following are listed: Kolmogorov–Arnold representation theorem, Kolmogorov–Arnold networks, KART kolmogorov1957representations; Arnold1958; sprecher2002space; koppen2002training, RBF-KAN li2024kolmogorov, ChebKAN ss2024chebyshev, KAN liu2024kan, Wav-KAN bozorgasl2405wav, DeepOKAN abueidda2025deepokan, PIKAN shukla2024comprehensive; wang2024kolmogorov, FBKAN howard2024finite, KAN 2.0 liu2024kan2, Scaled-cPIKAN mostajeran2025scaled, and KKAN toscano2025kkans.
  • Figure 2: A conceptual illustration of how MLPs and KANs work. MLPs use a fully interconnected structure that mixes all input dimensions in each layer; in contrast, KANs process each dimension with a univariate function, then combine the results via summation. This dimension-wise approach can offer a more direct path to representing high-dimensional functions. Here, $\boldsymbol{\xi}$ represents the input vector, $\boldsymbol{W}_{(l)}$ and $\boldsymbol{b}_{(l)}$ refer to the weights and biases of the $l$-th layer, respectively, $\sigma$ is a fixed activation function that can take different forms, e.g., ReLU, sigmoid, sin, or tanh faroughi2023physics, and $\boldsymbol{\Phi}_l$ is a KAN layer defined as a matrix of one-dimensional functions.
  • Figure 3: Schematic architecture of a data-driven KAN model. The network receives input features and directly approximates the target output by learning a parameterized mapping $y_{_\text{KAN}}(\boldsymbol{\xi}; \boldsymbol{\theta})$ from data, without using physical governing equations. It employs a modified feed-forward architecture with learnable activation functions. The parameters $\boldsymbol{\theta}$ are optimized by minimizing a supervised loss, typically the mean squared error, using gradient-based methods, enabling the network to capture the underlying structure in the observed data.
  • Figure 4: (a) KAN with learnable B-spline activations for time series forecasting. The forecast is based on real GEO satellite traffic data (hourly resolution, 168 h context / 24 h prediction) provided by the 5G-STARDUST project, highlighting KAN's applicability to AI-driven satellite resource management. This panel is adapted from vaca2024kolmogorov. (b) Prediction of plastic shear strain $\gamma^p$ and mean effective stress $p$ using parallel and serial KAN architectures under an undrained loading path with $\xi = -4$, $p_{\text{in}} = 375$ kPa, and $e_{\text{in}} = 0.64$. The results are reproduced based on the method presented in mostajeran2024epi, along with the ground truth obtained from numerical integration. (c) Application of COEFF-KAN to electrolyte property prediction. Top: Parity plot comparing predicted and ground-truth values of logarithmic Coulombic Efficiency (LCE) across test samples, illustrating strong agreement and model generalization. Bottom: RMSE performance of KAN versus MLP across varying network depths and widths. This panel is reproduced from li2024coeff. (d) Channel-wise activation maps for explainability evaluation in tumor segmentation. Integrating KAN layers enhances spatial alignment between model attention and ground truth, improving plausibility in localization tasks. This panel is adapted from li2025u.
  • Figure 5: Performance comparison of cKAN and MLP architectures for data-driven modeling of sand elasto-plasticity. (a) Network architectures (serial and parallel). The models take as input a vector of stress $\boldsymbol{\sigma}$, strain $\boldsymbol{\varepsilon}$, plastic strain $\boldsymbol{\varepsilon}^{\text{p}}$, void ratio $e$, and the incremental strain $\Delta \boldsymbol{\varepsilon}$, and the output captures the corresponding changes in material response, including changes in void ratio $\Delta e$, incremental stress $\Delta \boldsymbol{\sigma}$, and incremental plastic strain $\Delta \boldsymbol{\varepsilon}^{\text{p}}$. (b) Prediction accuracy under a triaxial loading path defined by $\xi = -1000$, $p_{\text{in}} = 375$ kPa, and $e_{\text{in}} = 0.64$. The absolute errors in predicting the mean effective stress ($p$) and deviatoric stress ($q$) are assessed by comparing model outputs to reference results obtained from high-accuracy numerical integration. (c) Training convergence trends for serial and parallel architectures. The cKAN sub-networks use relatively compact architectures, with 2 layers of 20 neurons (degree 3) for the void ratio, and 3 layers of 20 neurons (degree 4) for both stress and plastic strain outputs. In contrast, the MLP counterparts employ deeper and wider networks, with 2 layers of 45 neurons for the void ratio and 5 layers of 35 neurons for stress and plastic strain, without polynomial basis functions. All panels present new analysis based on mostajeran2024epi.
  • ...and 13 more figures
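
The contrast drawn in the Figure 2 caption (an MLP layer mixes all inputs and applies a fixed activation, while a KAN layer applies a learnable univariate function to each input and sums) and the supervised training described in the Figure 3 caption can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a single KAN layer with a Chebyshev basis, inputs already scaled into $[-1, 1]$, plain gradient descent on the mean squared error, and a synthetic additive target. The function names (`chebyshev_basis`, `mlp_layer`, `kan_layer`, `fit_kan_layer`) and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

def chebyshev_basis(x, degree):
    """Return T_0..T_degree evaluated at x (assumed in [-1, 1]); shape x.shape + (degree + 1,)."""
    basis = [np.ones_like(x), x]
    for _ in range(2, degree + 1):
        basis.append(2.0 * x * basis[-1] - basis[-2])  # T_k = 2 x T_{k-1} - T_{k-2}
    return np.stack(basis[: degree + 1], axis=-1)

def mlp_layer(xi, W, b):
    """MLP layer: mix all input dimensions linearly, then apply a fixed activation (tanh here)."""
    return np.tanh(xi @ W + b)

def kan_layer(xi, coeffs):
    """KAN layer: out_j = sum_i phi_{j,i}(xi_i), each phi a learnable univariate function.

    coeffs has shape (n_in, n_out, degree + 1) and holds the Chebyshev
    coefficients of every phi_{j,i} (an illustrative parameterization).
    """
    degree = coeffs.shape[-1] - 1
    B = chebyshev_basis(xi, degree)              # (batch, n_in, degree + 1)
    return np.einsum("bik,iok->bo", B, coeffs)   # sum over inputs i and basis index k

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def fit_kan_layer(xi, y, degree=7, lr=0.05, steps=3000):
    """Minimize the MSE over the coefficients with plain gradient descent."""
    n_in, n_out = xi.shape[1], y.shape[1]
    coeffs = np.zeros((n_in, n_out, degree + 1))
    B = chebyshev_basis(xi, degree)
    for _ in range(steps):
        resid = np.einsum("bik,iok->bo", B, coeffs) - y
        grad = 2.0 * np.einsum("bik,bo->iok", B, resid) / resid.size
        coeffs -= lr * grad
    return coeffs

# Synthetic additive target (an assumption for this demo, not data from the paper).
rng = np.random.default_rng(0)
xi = rng.uniform(-1.0, 1.0, size=(512, 2))
y = (np.sin(np.pi * xi[:, 0]) + xi[:, 1] ** 2)[:, None]

coeffs = fit_kan_layer(xi, y)
initial_loss = mse(np.zeros_like(y), y)          # loss of the all-zero initial model
final_loss = mse(kan_layer(xi, coeffs), y)
print(f"MSE before training: {initial_loss:.3e}, after: {final_loss:.3e}")
```

Because the target here is a sum of univariate terms, one KAN layer suffices; a deep KAN would compose several such layers, and the models reviewed in the paper use their own basis functions, input normalizations, and optimizers.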