Training Deep Physics-Informed Kolmogorov-Arnold Networks

Spyros Rigas, Fotios Anagnostopoulos, Michalis Papachristou, Georgios Alexandridis

TL;DR

This work tackles training instability in deep Chebyshev-based Kolmogorov-Arnold Networks (KANs) for physics-informed PDE learning. It introduces a basis-agnostic, Glorot-like initialization that preserves activation and gradient variances, and it proposes Residual-Gated Adaptive KANs (RGA KANs) to stabilize deep training, with the training dynamics analyzed through Information Bottleneck theory. Experiments on nine forward PDE benchmarks, with ablations, show that RGA KANs consistently outperform parameter-matched cPIKANs and PirateNets, often by orders of magnitude, and avoid divergence in regimes where the others fail. The findings establish a robust initialization-architecture combination and provide a practical, depth-scalable framework for PIML PDE solvers, with avenues for future extension to other bases and operator-learning contexts.

Abstract

Since their introduction, Kolmogorov-Arnold Networks (KANs) have been successfully applied across several domains, with physics-informed machine learning (PIML) emerging as one of the areas where they have thrived. In the PIML setting, Chebyshev-based physics-informed KANs (cPIKANs) have become the standard due to their computational efficiency. However, like their multilayer perceptron-based counterparts, cPIKANs face significant challenges when scaled to depth, leading to training instabilities that limit their applicability to several PDE problems. To address this, we propose a basis-agnostic, Glorot-like initialization scheme that preserves activation variance and yields substantial improvements in stability and accuracy over the default initialization of cPIKANs. Inspired by the PirateNet architecture, we further introduce Residual-Gated Adaptive KANs (RGA KANs), designed to mitigate divergence in deep cPIKANs where initialization alone is not sufficient. Through empirical tests and information bottleneck analysis, we show that RGA KANs successfully traverse all training phases, unlike baseline cPIKANs, which stagnate in the diffusion phase in specific PDE settings. Evaluations on nine standard forward PDE benchmarks under a fixed training pipeline with adaptive components demonstrate that RGA KANs consistently outperform parameter-matched cPIKANs and PirateNets, often by several orders of magnitude, while remaining stable in settings where the others diverge.
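
To make the idea of a variance-preserving initialization concrete, the following is a minimal sketch, not the paper's actual scheme. It assumes a Chebyshev layer of the form $y = \sum_i \sum_k c_{ik} T_k(\tanh x_i)$ with standard-normal inputs and zero-mean Gaussian coefficients, and it handles only the forward (activation) variance. The function names, the tanh input squashing, and the rule $\sigma^2 = 1 / (n_{\text{in}} \sum_k \mathbb{E}[T_k^2])$ are illustrative assumptions.

```python
import math
import random


def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] using T_k = 2x*T_{k-1} - T_{k-2}."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]


def basis_second_moment(degree, n_samples=20000, seed=0):
    """Monte Carlo estimate of sum_k E[T_k(tanh(x))^2] for x ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = math.tanh(rng.gauss(0.0, 1.0))  # assumed input squashing to [-1, 1]
        total += sum(t * t for t in chebyshev_basis(z, degree))
    return total / n_samples


def coefficient_std(n_in, degree):
    """Coefficient std giving unit output variance under the assumed rule."""
    return 1.0 / math.sqrt(n_in * basis_second_moment(degree))


def output_second_moment(n_in, degree, n_trials=4000, seed=1):
    """Empirical E[y^2] of one output unit, resampling inputs and coefficients.

    Because the coefficients are zero-mean, E[y] = 0 and E[y^2] is the variance.
    """
    rng = random.Random(seed)
    std = coefficient_std(n_in, degree)
    total = 0.0
    for _trial in range(n_trials):
        y = 0.0
        for _unit in range(n_in):
            z = math.tanh(rng.gauss(0.0, 1.0))
            y += sum(rng.gauss(0.0, std) * t for t in chebyshev_basis(z, degree))
        total += y * y
    return total / n_trials


if __name__ == "__main__":
    # Should print a value close to 1 if the scaling preserves variance.
    print(output_second_moment(n_in=8, degree=4))
```

Estimating the basis moments by sampling rather than in closed form means the same recipe carries over to any other basis, which is one plausible reading of "basis-agnostic". A full Glorot-style treatment would also balance the gradient variance in the backward pass, which this sketch omits.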

Paper Structure

This paper contains 57 sections, 112 equations, 21 figures, 30 tables.

Figures (21)

  • Figure 1: Relative comparison of proposed and default initialization across the five benchmark functions. Each subplot corresponds to one function, with the color scale indicating the percentage improvement of the proposed initialization over the default in terms of the final $L^2$ error. Black cells denote configurations where the default initialization attains lower error.
  • Figure 2: Loss throughout training for two representative architectures (top row: three 4-dimensional hidden layers; bottom row: five 16-dimensional hidden layers) across the five benchmark functions. Each subplot shows the mean training loss over five independent runs (solid lines) together with the SEM (shaded area). The final column, corresponding to $f_5$, is shown without logarithmic scaling on the $y$-axis, since the loss did not exhibit significant improvement during training.
  • Figure 3: Comparison of default and proposed initialization schemes on Burgers' (top row) and Helmholtz (bottom row) equations. Left column: heatmaps of relative improvement in final $L^2$ error. Middle and right columns: training-loss curves per initialization scheme for representative architectures of depth 3 and width 4 (middle) and depth 5 and width 16 (right). Shaded regions denote the SEM across five runs.
  • Figure 4: Relative $L^2$ error across increasing network depths for Burgers' (top row) and Allen-Cahn (bottom row) equations, under both default and proposed initialization schemes. Each column corresponds to a different network width (8, 16, 32). Solid lines show mean values over five random seeds, while shaded areas represent the SEM.
  • Figure 5: Schematic of the proposed RGA KAN architecture. Periodic boundary conditions, when present, are enforced directly through the BC Embedding layer. The embedded inputs are then passed through a sine-based KAN layer, whose outputs are split into three branches: two feeding Chebyshev-based KAN layers and one entering the first RGA block. Within each RGA block, the three signals are combined through gating operators and routed through adaptive skip connections, which dynamically modulate the effective network depth during training. Multiple RGA blocks can be stacked sequentially. The final output is produced by a physics-informed KAN layer, which incorporates prior information from the initial condition(s) when available.
  • ...and 16 more figures
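
The gating and adaptive skip connection described in the Figure 5 caption can be illustrated with a small sketch. It assumes a PirateNet-style formulation, in which a hidden signal `h` blends two branch signals `u` and `v` element-wise and a scalar `alpha` blends the block output with the block input. The names `gate`, `adaptive_skip`, `rga_like_block`, and `inner_layer` are hypothetical, and the actual RGA block may differ in its details.

```python
import math


def gate(h, u, v):
    """Element-wise gating: blend the branch signals u and v using h."""
    return [hi * ui + (1.0 - hi) * vi for hi, ui, vi in zip(h, u, v)]


def adaptive_skip(x, f, alpha):
    """Blend block input x with block output f; alpha = 0 is the identity map."""
    return [alpha * fi + (1.0 - alpha) * xi for xi, fi in zip(x, f)]


def rga_like_block(x, u, v, inner_layer, alpha):
    """One gated residual block (assumed form, not the paper's exact design).

    inner_layer is a stand-in for a KAN layer: any callable mapping a list of
    floats to a list of the same length.
    """
    h = inner_layer(x)
    z = gate(h, u, v)
    return adaptive_skip(x, z, alpha)


if __name__ == "__main__":
    x, u, v = [0.5, -1.0], [1.0, 2.0], [3.0, 4.0]
    squash = lambda values: [math.tanh(t) for t in values]  # placeholder layer
    # With alpha = 0 the block passes its input through unchanged.
    print(rga_like_block(x, u, v, squash, alpha=0.0))
    # A trained alpha > 0 lets the gated signal contribute.
    print(rga_like_block(x, u, v, squash, alpha=0.5))
```

In this assumed form, initializing `alpha` at zero makes every block start as the identity, so a deep stack initially behaves like a shallow network and gains effective depth only as `alpha` grows during training.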