Training Deep Physics-Informed Kolmogorov-Arnold Networks
Spyros Rigas, Fotios Anagnostopoulos, Michalis Papachristou, Georgios Alexandridis
TL;DR
This work tackles training instability in deep Chebyshev-based Kolmogorov–Arnold Networks (KANs) for physics-informed PDE learning. It introduces a basis-agnostic, Glorot-like initialization that preserves activation and gradient variances, proposes Residual-Gated Adaptive KANs (RGA KANs) to stabilize deep training, and analyzes the resulting training dynamics through Information Bottleneck theory. Extensive experiments on nine forward PDE benchmarks, together with ablations, show that RGA KANs consistently outperform parameter-matched Chebyshev-based physics-informed KANs (cPIKANs) and PirateNets, often by orders of magnitude, and avoid divergence in regimes where the other models fail. The findings establish a robust combination of initialization and architecture and provide a practical, depth-scalable framework for physics-informed machine learning (PIML) PDE solvers, with possible extensions to other bases and to operator learning.
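The summary above does not give the initialization formula, so the following is only one plausible reading of a "basis-agnostic, Glorot-like" rule, not the authors' scheme. It assumes a cPIKAN-style layer y_j = sum_i sum_k c[j][i][k] T_k(tanh x_i), zero-mean i.i.d. coefficients, and standard-normal inputs. Under those assumptions the forward second moment scales with fan_in * s_fwd, and the backward one with fan_out * s_bwd, where s_fwd and s_bwd are the summed second moments of the basis functions and of their derivatives. The sketch estimates both by Monte Carlo and sets Var(c) = 2 / (fan_in * s_fwd + fan_out * s_bwd), by analogy with Glorot's 2 / (n_in + n_out). All helper names (basis_moments, glorot_like_std, and so on) are hypothetical.

```python
# Hypothetical sketch: one reading of a Glorot-like init for Chebyshev KAN
# layers. This is NOT the paper's formula; inputs are ASSUMED to be N(0, 1).
import math
import random


def chebyshev_basis(u, degree):
    """[T_0(u), ..., T_degree(u)] via T_k = 2u T_{k-1} - T_{k-2}."""
    t = [1.0, u]
    for _ in range(2, degree + 1):
        t.append(2.0 * u * t[-1] - t[-2])
    return t[: degree + 1]


def chebyshev_basis_grad(x, degree):
    """d/dx T_k(tanh x) = k U_{k-1}(tanh x) * (1 - tanh^2 x), k = 0..degree."""
    u = math.tanh(x)
    us = [1.0, 2.0 * u]  # Chebyshev polynomials of the second kind U_0, U_1, ...
    for _ in range(2, degree):
        us.append(2.0 * u * us[-1] - us[-2])
    dtanh = 1.0 - u * u
    return [0.0] + [k * us[k - 1] * dtanh for k in range(1, degree + 1)]


def basis_moments(phi, dphi, n_samples=20000, seed=0):
    """Monte Carlo estimates of s_fwd = sum_k E[phi_k(x)^2] and
    s_bwd = sum_k E[phi_k'(x)^2], ASSUMING x ~ N(0, 1).
    Only phi and dphi depend on the basis, so any basis can be plugged in."""
    rng = random.Random(seed)
    s_fwd = s_bwd = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        s_fwd += sum(v * v for v in phi(x))
        s_bwd += sum(g * g for g in dphi(x))
    return s_fwd / n_samples, s_bwd / n_samples


def glorot_like_std(fan_in, fan_out, s_fwd, s_bwd):
    """Coefficient std balancing the forward factor fan_in * s_fwd against
    the backward factor fan_out * s_bwd (cf. Glorot's 2 / (n_in + n_out))."""
    return math.sqrt(2.0 / (fan_in * s_fwd + fan_out * s_bwd))


def init_coeffs(fan_in, fan_out, degree, std, seed=1):
    """c[j][i][k] ~ N(0, std^2): output j, input i, basis index k."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, std) for _ in range(degree + 1)]
             for _ in range(fan_in)] for _ in range(fan_out)]


def kan_layer(xs, coeffs, degree):
    """Chebyshev KAN layer: y_j = sum_i sum_k c[j][i][k] * T_k(tanh(x_i))."""
    feats = [chebyshev_basis(math.tanh(x), degree) for x in xs]
    return [sum(c * f for c_i, f_i in zip(c_j, feats) for c, f in zip(c_i, f_i))
            for c_j in coeffs]


def output_mean_square(coeffs, degree, n_batch=300, seed=2):
    """Average of y_j^2 over random N(0, 1) inputs and over all outputs."""
    rng = random.Random(seed)
    fan_in = len(coeffs[0])
    total = 0.0
    for _ in range(n_batch):
        ys = kan_layer([rng.gauss(0.0, 1.0) for _ in range(fan_in)], coeffs, degree)
        total += sum(y * y for y in ys) / len(ys)
    return total / n_batch


if __name__ == "__main__":
    deg, n_in, n_out = 5, 16, 16
    s_f, s_b = basis_moments(lambda x: chebyshev_basis(math.tanh(x), deg),
                             lambda x: chebyshev_basis_grad(x, deg))
    std = glorot_like_std(n_in, n_out, s_f, s_b)
    coeffs = init_coeffs(n_in, n_out, deg, std)
    print(f"s_fwd={s_f:.3f}  s_bwd={s_b:.3f}  std={std:.4f}")
    print("output mean square:", output_mean_square(coeffs, deg))
```

Because only phi and dphi touch the basis, swapping Chebyshev polynomials for another family leaves the rest unchanged. Setting Var(c) = 1 / (fan_in * s_fwd) instead (a forward-only, LeCun-like variant) is a quick way to check that the estimated moments keep the output second moment near 1.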
Abstract
Since their introduction, Kolmogorov-Arnold Networks (KANs) have been successfully applied across several domains, with physics-informed machine learning (PIML) emerging as one of the areas where they have thrived. In the PIML setting, Chebyshev-based physics-informed KANs (cPIKANs) have become the standard due to their computational efficiency. However, like their multilayer perceptron-based counterparts, cPIKANs face significant challenges when scaled to depth, leading to training instabilities that limit their applicability to several PDE problems. To address this, we propose a basis-agnostic, Glorot-like initialization scheme that preserves activation variance and yields substantial improvements in stability and accuracy over the default initialization of cPIKANs. Inspired by the PirateNet architecture, we further introduce Residual-Gated Adaptive KANs (RGA KANs), designed to mitigate divergence in deep cPIKANs where initialization alone is not sufficient. Through empirical tests and information bottleneck analysis, we show that RGA KANs successfully traverse all training phases, unlike baseline cPIKANs, which stagnate in the diffusion phase in specific PDE settings. Evaluations on nine standard forward PDE benchmarks under a fixed training pipeline with adaptive components demonstrate that RGA KANs consistently outperform parameter-matched cPIKANs and PirateNets, often by several orders of magnitude, while remaining stable in settings where the others diverge.
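PirateNets, which the abstract names as the inspiration for RGA KANs, use an adaptive residual connection of the form x_out = alpha * f(x) + (1 - alpha) * x with a trainable scalar alpha initialized to 0. Every block therefore starts as the identity map, so a deep network initially behaves like a shallow one. The sketch below wraps a Chebyshev KAN layer in such a gate to show that property. It is an illustration under stated assumptions, not the RGA KAN architecture: the paper's additional gating and adaptive components are not specified here, the class name ResidualGatedKANBlock is hypothetical, and training of alpha is omitted.

```python
# Hypothetical PirateNet-style residual gate around a Chebyshev KAN layer.
# This is NOT the RGA KAN architecture; it only shows the identity-at-init idea.
import math
import random


def chebyshev_features(xs, degree):
    """T_0..T_degree of tanh(x_i) for every input, via the three-term recurrence."""
    feats = []
    for x in xs:
        u = math.tanh(x)
        t = [1.0, u]
        for _ in range(2, degree + 1):
            t.append(2.0 * u * t[-1] - t[-2])
        feats.append(t[: degree + 1])
    return feats


def kan_layer(xs, coeffs, degree):
    """y_j = sum_i sum_k c[j][i][k] * T_k(tanh(x_i))."""
    feats = chebyshev_features(xs, degree)
    return [sum(c * f for c_i, f_i in zip(c_j, feats) for c, f in zip(c_i, f_i))
            for c_j in coeffs]


class ResidualGatedKANBlock:
    """Hypothetical width-preserving block: alpha * KAN(x) + (1 - alpha) * x."""

    def __init__(self, width, degree, std, seed=0):
        rng = random.Random(seed)
        self.degree = degree
        self.coeffs = [[[rng.gauss(0.0, std) for _ in range(degree + 1)]
                        for _ in range(width)] for _ in range(width)]
        # Trainable scalar gate in a real model; it starts at 0 so the block
        # is exactly the identity at initialization (PirateNet-style).
        self.alpha = 0.0

    def __call__(self, xs):
        hs = kan_layer(xs, self.coeffs, self.degree)
        return [self.alpha * h + (1.0 - self.alpha) * x for h, x in zip(hs, xs)]


if __name__ == "__main__":
    xs = [0.1 * i - 0.4 for i in range(8)]
    ys = xs
    for s in range(20):
        ys = ResidualGatedKANBlock(8, 5, 0.2, seed=s)(ys)
    print(ys == xs)  # True: a 20-block stack is the identity at initialization
```

As alpha moves away from 0 during training, each block mixes its KAN update into the residual stream gradually, which is one way depth can be added without destabilizing the early phase of training.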
