A comprehensive and FAIR comparison between MLP and KAN representations for differential equations and operator networks

Khemraj Shukla; Juan Diego Toscano; Zhicheng Wang; Zongren Zou; George Em Karniadakis

TL;DR

The paper compares MLP-based PINNs and DeepONets with KAN-based representations (PIKAN, cPIKAN, and DeepOKAN) for solving forward and inverse differential equations and for learning operators. It shows that vanilla KANs with B-splines can be slow and less accurate, while low-order polynomial KANs (notably Chebyshev-based variants) achieve competitive accuracy, with cPIKAN offering favorable parameter efficiency and robustness under certain conditions. The study also demonstrates that residual-based attention and entropy-viscosity stabilization can substantially improve performance on PDE problems, and it analyzes training dynamics through information bottleneck theory, identifying fitting, diffusion, and total-diffusion stages. Overall, the work provides a FAIR benchmarking framework, highlights when KAN-based approaches can match or exceed traditional PINN and DeepONet performance, and points to open directions in stability, scalability, and uncertainty quantification in SciML. The results are relevant to choosing a representation model for physics-informed learning and operator regression across a range of PDEs and high-dimensional problems.

Abstract

Kolmogorov-Arnold Networks (KANs) were recently introduced as an alternative representation model to the MLP. Herein, we employ KANs to construct physics-informed machine learning models (PIKANs) and deep operator models (DeepOKANs) for solving differential equations for forward and inverse problems. In particular, we compare them with physics-informed neural networks (PINNs) and deep operator networks (DeepONets), which are based on the standard MLP representation. We find that although the original KANs based on the B-spline parameterization lack accuracy and efficiency, modified versions based on low-order orthogonal polynomials have performance comparable to PINNs and DeepONets. However, they still lack robustness, as they may diverge for different random seeds or for higher-order orthogonal polynomials. We visualize their corresponding loss landscapes and analyze their learning dynamics using information bottleneck theory. Our study follows the FAIR principles so that other researchers can use our benchmarks to further advance this emerging topic.
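To make the "low-order orthogonal polynomials" idea concrete, here is a minimal NumPy sketch of a single Chebyshev-based KAN layer. It is an illustration under stated assumptions, not the paper's implementation: the function names `chebyshev_basis` and `chebyshev_kan_layer`, the `tanh` squashing of inputs into $[-1, 1]$, and the coefficient layout `(d_in, d_out, degree + 1)` are all choices made here for the example.

```python
import numpy as np


def chebyshev_basis(x, degree):
    """Evaluate T_0..T_degree at x (expected in [-1, 1]).

    Returns an array of shape x.shape + (degree + 1,).
    """
    # Three-term recurrence: T_0 = 1, T_1 = x, T_k = 2 x T_{k-1} - T_{k-2}.
    terms = [np.ones_like(x), x]
    for _ in range(2, degree + 1):
        terms.append(2.0 * x * terms[-1] - terms[-2])
    return np.stack(terms[: degree + 1], axis=-1)


def chebyshev_kan_layer(x, coeffs):
    """One KAN-style layer with a learnable polynomial on every edge.

    x:      (batch, d_in) inputs
    coeffs: (d_in, d_out, degree + 1) Chebyshev coefficients (hypothetical layout)
    """
    degree = coeffs.shape[-1] - 1
    # Assumption: squash inputs with tanh so the basis stays on [-1, 1].
    basis = chebyshev_basis(np.tanh(x), degree)  # (batch, d_in, degree + 1)
    # Sum the edge functions over input index i and polynomial order k.
    return np.einsum("bik,iok->bo", basis, coeffs)


rng = np.random.default_rng(0)
coeffs = rng.normal(scale=0.1, size=(2, 3, 4))  # d_in=2, d_out=3, degree=3
x = rng.normal(size=(5, 2))
y = chebyshev_kan_layer(x, coeffs)
print(y.shape)
```

In this layout the layer holds `d_in * d_out * (degree + 1)` trainable coefficients, so raising the polynomial degree increases the parameter count linearly.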

Paper Structure (33 sections, 41 equations, 17 figures, 10 tables)

Introduction
Problem Formulation and representation models
Physics-informed neural networks (PINNs)
Residual-Based Attention
Neural operators (NOs)
Representation Models
Multilayer Perceptron (MLP)
Kolmogorov-Arnold networks (KANs)
Vanilla KAN (PIKAN)
Radial Basis Function (RBF) KANs
Wavelet KANs
Jacobi KANs
Computational experiments
Approximation of a discontinuous and oscillatory function
Structure-preserving dynamical system: Hamiltonian neural network (HNN) vs Hamiltonian Chebyshev-KAN (HcKAN)
...and 18 more sections

Figures (17)

Figure 1: An illustration of MLP and KAN for (a) differential equations and (b) operator networks. We choose DeepONet (Lu et al., 2021) as the representation model for operator learning. Here, the activation function for the MLP in PINNs and DeepONets is chosen as the hyperbolic tangent for demonstration only.
Figure 2: Expressivity of Chebyshev-KAN when approximating the function \ref{oscillatory_function}. Subfigure (a) compares the reference and approximated functions; the Chebyshev-KAN approximation exhibits a large error of 7.43%. Subfigure (b) depicts the trajectory of the training loss. Training becomes unstable after the 2000$^\text{th}$ iteration, leading to NaN loss values, which are represented using a very high value of order six. Subfigure (c) compares the spectra of the reference and approximated functions, highlighting Chebyshev-KAN's failure to capture the high frequencies, which results in the significant error.
Figure 3: A comparison between the reference and approximated versions of the function \ref{oscillatory_function} using (a) KAN-I, (b) KAN-II, (c) modified Chebyshev-KAN, and (d) MLP-based architectures.
Figure 4: A comparison between the spectra of the reference and approximated functions obtained using (a) KAN-I, (b) KAN-II, (c) modified Chebyshev-KAN, and (d) MLP architectures. The Fourier spectra of the approximated function obtained from all four architectures are in very good agreement with the reference spectrum. The long oscillatory tail reflects the discontinuity present in the function \ref{oscillatory_function}.
Figure 5: Loss functions for function approximation
...and 12 more figures
