Neural Tangent Kernel Analysis to Probe Convergence in Physics-informed Neural Solvers: PIKANs vs. PINNs

Salah A. Faroughi, Farinaz Mostajeran

TL;DR

The paper analyzes the training dynamics of Chebyshev-based physics-informed Kolmogorov-Arnold networks (cPIKANs) through Neural Tangent Kernel (NTK) theory. It derives the NTK for cKANs, extends it to cPIKANs, and examines spectral properties, kernel drift, and the effect of domain scaling and subdomain decomposition on convergence. Across the diffusion, Helmholtz, Allen-Cahn, and forced-vibration (Euler-Bernoulli beam) equations, cPIKANs show well-conditioned NTKs, reduced spectral bias, and faster, more stable learning than PINNs, with domain decomposition further improving performance. These findings provide theoretical insight and practical guidance for designing robust, efficient physics-informed solvers for multi-scale PDEs.

Abstract

Physics-informed Kolmogorov-Arnold Networks (PIKANs), and in particular their Chebyshev-based variants (cPIKANs), have recently emerged as promising models for solving partial differential equations (PDEs). However, their training dynamics and convergence behavior remain largely unexplored both theoretically and numerically. In this work, we aim to advance the theoretical understanding of cPIKANs by analyzing them using Neural Tangent Kernel (NTK) theory. Our objective is to discern the evolution of kernel structure throughout gradient-based training and its subsequent impact on learning efficiency. We first derive the NTK of standard cKANs in a supervised setting, and then extend the analysis to the physics-informed context. We analyze the spectral properties of NTK matrices, specifically their eigenvalue distributions and spectral bias, for four representative PDEs: the steady-state Helmholtz equation, transient diffusion and Allen-Cahn equations, and forced vibrations governed by the Euler-Bernoulli beam equation. We also conduct an investigation into the impact of various optimization strategies, e.g., first-order, second-order, and hybrid approaches, on the evolution of the NTK and the resulting learning dynamics. Results indicate a tractable behavior for NTK in the context of cPIKANs, which exposes learning dynamics that standard physics-informed neural networks (PINNs) cannot capture. Spectral trends also reveal when domain decomposition improves training, directly linking kernel behavior to convergence rates under different setups. To the best of our knowledge, this is the first systematic NTK study of cPIKANs, providing theoretical insight that clarifies and predicts their empirical performance.
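For readers new to NTK theory, the link between kernel spectrum and convergence rate mentioned in the abstract can be illustrated with the standard linearized argument. This is a sketch under stated assumptions (squared-error loss, gradient flow, and an NTK matrix $\boldsymbol{K}$ that stays approximately constant during training); the symbols $\boldsymbol{r}$, $\boldsymbol{Q}$, and $\boldsymbol{\Lambda}$ are introduced here for illustration and may not match the paper's notation. Writing $\boldsymbol{r}(t)$ for the vector of training residuals and $\boldsymbol{K} = \boldsymbol{Q} \boldsymbol{\Lambda} \boldsymbol{Q}^\top$ for the eigendecomposition of the kernel:

$$\frac{d \boldsymbol{r}(t)}{dt} = -\boldsymbol{K}\, \boldsymbol{r}(t) \quad \Longrightarrow \quad \boldsymbol{Q}^\top \boldsymbol{r}(t) = e^{-\boldsymbol{\Lambda} t}\, \boldsymbol{Q}^\top \boldsymbol{r}(0).$$

Under these assumptions, the residual component along the $i$-th eigenvector decays like $e^{-\lambda_i t}$, so directions with small eigenvalues are learned slowly. A widely spread spectrum therefore corresponds to spectral bias, while a better-conditioned kernel implies more uniform convergence across components.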

Paper Structure

This paper contains 18 sections, 3 theorems, 76 equations, 12 figures, 4 tables.

Key Result

Theorem 1

Let $f(\boldsymbol{x}; \boldsymbol{\theta})$ denote the output of a cKAN with one hidden layer of width $N$, where all coefficients are initialized as independent standard normal random variables. The expected NTK between two inputs $\boldsymbol{x}$ and $\boldsymbol{x}'$ is given by, where $T_n(\cdot)$ denotes the Chebyshev polynomial of degree $n$, and $\tilde{x}_i = \tanh(x_i)$ represents the t
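To make the object in Theorem 1 concrete, the following is a minimal sketch of the empirical NTK of a one-hidden-layer cKAN at a single random initialization. It is an illustration under assumptions, not the paper's implementation: the layer form $\sum_i \sum_n c_{i,n} T_n(\tanh(\cdot))$, the absence of any width normalization, and the function names `cheb`, `init_params`, `forward_and_grad`, and `empirical_ntk` are all hypothetical choices made here. Averaging the resulting matrix over many random draws would give a Monte Carlo estimate of the expected NTK that the theorem describes.

```python
import math
import random


def cheb(t, degree):
    """Return ([T_0..T_D](t), [T_0'..T_D'](t)) via the three-term recurrence."""
    T = [1.0, t]
    dT = [0.0, 1.0]
    for n in range(1, degree):
        T.append(2.0 * t * T[n] - T[n - 1])
        dT.append(2.0 * T[n] + 2.0 * t * dT[n] - dT[n - 1])
    return T[:degree + 1], dT[:degree + 1]


def init_params(d_in, width, degree, rng):
    """Standard normal coefficients: a[j][i][n] (hidden layer), b[j][n] (output)."""
    a = [[[rng.gauss(0.0, 1.0) for _ in range(degree + 1)]
          for _ in range(d_in)] for _ in range(width)]
    b = [[rng.gauss(0.0, 1.0) for _ in range(degree + 1)]
         for _ in range(width)]
    return a, b


def forward_and_grad(x, a, b, degree):
    """Return f(x) and the flat gradient of f w.r.t. all coefficients.

    Assumed model (no width normalization):
      h_j = sum_i sum_n a[j][i][n] * T_n(tanh(x_i))
      f   = sum_j sum_n b[j][n]    * T_n(tanh(h_j))
    """
    basis_in = [cheb(math.tanh(xi), degree)[0] for xi in x]
    f = 0.0
    grad_a, grad_b = [], []
    for a_j, b_j in zip(a, b):
        h = sum(c * T for a_ji, T_i in zip(a_j, basis_in)
                for c, T in zip(a_ji, T_i))
        t = math.tanh(h)
        T_out, dT_out = cheb(t, degree)
        f += sum(c * T for c, T in zip(b_j, T_out))
        # Chain rule through the outer basis and tanh: d tanh(h)/dh = 1 - t^2.
        df_dh = sum(c * dT for c, dT in zip(b_j, dT_out)) * (1.0 - t * t)
        grad_a.extend(df_dh * T for T_i in basis_in for T in T_i)
        grad_b.extend(T_out)  # df/db[j][n] = T_n(tanh(h_j))
    return f, grad_a + grad_b


def empirical_ntk(xs, a, b, degree):
    """Gram matrix K[p][q] = <grad f(x_p), grad f(x_q)> at the given parameters."""
    grads = [forward_and_grad(x, a, b, degree)[1] for x in xs]
    return [[sum(u * v for u, v in zip(g1, g2)) for g2 in grads]
            for g1 in grads]


# Usage: 2 inputs, hidden width 4, Chebyshev degree 3, three sample points.
rng = random.Random(0)
a, b = init_params(d_in=2, width=4, degree=3, rng=rng)
points = [[0.1, -0.3], [0.5, 0.2], [-0.7, 0.9]]
K = empirical_ntk(points, a, b, degree=3)
```

The gradient is written out analytically rather than with an autodiff library so that the sketch depends only on the standard library; in practice a framework such as JAX or PyTorch would compute the same Jacobian automatically.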

Figures (12)

  • Figure 1: Comparison of the predicted solutions for the diffusion equation experiment, demonstrating that cPIKAN yields significantly higher accuracy, with a maximum absolute error of only $7.52 \times 10^{-3}$, compared to PINN-I and PINN-II, whose errors exceed $2.5 \times 10^{-2}$. cPIKAN also exhibits faster convergence, lower final relative $\mathcal{L}^2$ error, and more stable training dynamics. These findings highlight the superior performance of cPIKAN in both predictive accuracy and optimization efficiency for solving the diffusion problem.
  • Figure 2: Evolution of the NTK eigenvalue spectra during training for the diffusion equation experiment. The NTK spectrum in cPIKAN gradually converges during training, indicating stable learning dynamics and effective optimization. In contrast, the spectra for PINN-I and PINN-II remain dispersed, suggesting unstable behavior and poor information flow. These differences highlight the improved convergence and robustness of the cPIKAN model.
  • Figure 3: Comparison of the predicted solutions for the Helmholtz equation experiment, showing that cPIKAN produces highly accurate predictions with minimal absolute error, closely matching the ground truth. PINN captures the overall structure but significantly misestimates the amplitude, while bPIKAN fails to approximate the solution correctly. The training curves confirm these findings: cPIKAN achieves the lowest relative $\mathcal{L}^2$ error and loss with stable convergence, whereas PINN converges more slowly with higher error, and bPIKAN fails to converge effectively.
  • Figure 4: Evolution of the NTK eigenvalue spectra during training for the Helmholtz equation experiment. The spectra for cPIKAN remain stable and gradually converge throughout training, reflecting well-conditioned dynamics and supporting its strong predictive performance. PINN shows delayed spectral stabilization, consistent with its slower convergence and moderate accuracy. In contrast, bPIKAN exhibits disordered and non-converging spectra, indicating unstable learning and poor approximation capability.
  • Figure 5: Comparison of predicted solutions for the Helmholtz equation experiment using different optimization strategies. Combining ADAM with LBFGS yields the most accurate and stable solution, achieving the lowest absolute and relative errors. LBFGS alone also performs well, while ADAM alone results in higher errors, especially near the domain center. The training curves confirm these observations: LBFGS converges fastest and most smoothly, followed by ADAM+LBFGS, whereas ADAM shows slower convergence and unstable loss behavior. These results highlight the importance of optimizer choice for achieving reliable and accurate training in cPIKAN.
  • ...and 7 more figures

Theorems & Definitions (5)

  • Theorem 1
  • Remark 1
  • Theorem 2
  • Remark 2
  • Lemma 1