
Leveraging KANs for Expedient Training of Multichannel MLPs via Preconditioning and Geometric Refinement

Jonas A. Actor, Graham Harper, Ben Southworth, Eric C. Cyr

TL;DR

This paper addresses how to accelerate the training of multichannel MLPs by leveraging Kolmogorov-Arnold Networks (KANs) and their spline-based activations. It develops a precise link between KANs and multichannel MLPs through a spline basis, showing that a change of basis acts as a preconditioner that improves optimization dynamics and affects the neural tangent kernel spectrum. The authors introduce multilevel geometric refinement and free-knot splines to rapidly propagate training across finer representations without disrupting progress, and demonstrate substantial accuracy and training-speed gains on regression benchmarks and a physics-informed neural network. The work provides a principled framework for applying spline-based activations to accelerate training in scientific machine learning, with implications for conditioning, spectral bias, and adaptive discretization.
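
The multilevel refinement idea relies on nested spline spaces: a spline on a coarse knot grid can be rewritten exactly on a finer grid, so training at the finer level starts from the coarse solution rather than from scratch. A minimal sketch, assuming degree-1 (piecewise-linear) splines on a 1D grid refined by midpoint insertion; the helper names `eval_linear_spline` and `refine_uniform` are hypothetical, and this is not the authors' refinement operator.

```python
import numpy as np

def eval_linear_spline(knots, values, x):
    """Evaluate the piecewise-linear spline with nodal `values` at `knots`."""
    return np.interp(x, knots, values)

def refine_uniform(knots, values):
    """Insert a knot at each interval midpoint without changing the function.

    On each interval the spline is linear, so its midpoint value is the
    average of the two endpoint values; the fine representation is exact.
    """
    fine_knots = np.empty(2 * len(knots) - 1)
    fine_values = np.empty(2 * len(values) - 1)
    fine_knots[0::2], fine_values[0::2] = knots, values
    fine_knots[1::2] = 0.5 * (knots[:-1] + knots[1:])
    fine_values[1::2] = 0.5 * (values[:-1] + values[1:])
    return fine_knots, fine_values

rng = np.random.default_rng(0)
knots = np.linspace(-1.0, 1.0, 5)      # coarse grid: 4 intervals
values = rng.standard_normal(5)         # stand-in for trained coarse coefficients
fine_knots, fine_values = refine_uniform(knots, values)
xs = np.linspace(-1.0, 1.0, 1001)
gap = np.abs(eval_linear_spline(knots, values, xs)
             - eval_linear_spline(fine_knots, fine_values, xs)).max()
print(len(fine_knots), gap)
```

The printed gap is at round-off level, which is the property that lets a refinement step leave the loss unchanged at the moment of refinement.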

Abstract

Multilayer perceptrons (MLPs) are a workhorse machine learning architecture, used in a variety of modern deep learning frameworks. However, Kolmogorov-Arnold Networks (KANs) have recently become increasingly popular due to their success on a range of problems, particularly scientific machine learning tasks. In this paper, we exploit the relationship between KANs and multichannel MLPs to gain structural insight into how to train MLPs faster. We demonstrate that the KAN basis (1) provides geometrically localized support and (2) acts as a preconditioned descent in the ReLU basis, together resulting in expedited training and improved accuracy. Our results show the equivalence between free-knot spline KAN architectures and a class of MLPs that are refined geometrically along the channel dimension of each weight tensor. We exploit this structural equivalence to define a hierarchical refinement scheme that dramatically accelerates training of the multichannel MLP architecture. We show that further accuracy improvements can be obtained by allowing the 1D locations of the spline knots to be trained simultaneously with the weights. These advances are demonstrated on a range of benchmark examples for regression and scientific machine learning.
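
The preconditioning claim can be illustrated in the simplest setting, degree-1 splines on a uniform 1D grid, where the hat (B-spline) basis and a ReLU basis span the same piecewise-linear space. If $\theta_{\mathrm{ReLU}} = C\,\theta_{\mathrm{hat}}$, then by the chain rule a gradient step on $\theta_{\mathrm{hat}}$ moves the ReLU coefficients by $-\eta\, C C^\top \nabla_{\theta_{\mathrm{ReLU}}} L$, i.e. a preconditioned descent in the ReLU basis. The sketch below (hypothetical helper names; a least-squares loss on sampled points is assumed) computes $C$ and compares the conditioning of the sampled Gram matrices, which governs gradient descent on a least-squares fit. It illustrates the mechanism only and is not the paper's analysis.

```python
import numpy as np

def relu_and_hat_design(n, m=2001):
    """Sample two bases of the same piecewise-linear space on [0, 1] with n uniform intervals.

    Returns (x, A_relu, A_hat); columns are basis functions evaluated at m points.
    """
    t = np.linspace(0.0, 1.0, n + 1)          # knots t_0, ..., t_n
    x = np.linspace(0.0, 1.0, m)              # sample points
    # ReLU basis: a constant plus relu(x - t_i) for i = 0..n-1 (n + 1 functions)
    A_relu = np.column_stack([np.ones_like(x)] + [np.maximum(x - ti, 0.0) for ti in t[:-1]])
    # Hat (degree-1 B-spline) basis: nodal interpolant of each unit vector (n + 1 functions)
    A_hat = np.column_stack([np.interp(x, t, e) for e in np.eye(n + 1)])
    return x, A_relu, A_hat

def gram_condition(A):
    """Condition number of the sampled Gram (mass) matrix A^T A / m."""
    G = A.T @ A / A.shape[0]
    return np.linalg.cond(G)

n = 16
x, A_relu, A_hat = relu_and_hat_design(n)
# Change-of-basis matrix C with A_relu @ C == A_hat (exact up to round-off)
C, *_ = np.linalg.lstsq(A_relu, A_hat, rcond=None)
print("max change-of-basis residual:", np.abs(A_relu @ C - A_hat).max())
print("cond(Gram), ReLU basis:", gram_condition(A_relu))
print("cond(Gram), hat basis: ", gram_condition(A_hat))
```

The ReLU-basis Gram matrix is expected to be far worse conditioned than the hat-basis one, since neighboring functions $\mathrm{ReLU}(x - t_i)$ overlap almost entirely while hat functions have local support.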

Paper Structure

This paper contains 19 sections, 10 theorems, 59 equations, 3 figures, 3 tables.

Key Result

Lemma 2.3

Define the functions $b^{[1]}_i(x) = \dots$. Following deboor1978practical, recursively define $\dots$. Then, $B_S = \{ b^{[r-1]}_i(x) \}_{i=1-r:n-1}$ is a basis for $S_r(T)$.
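
Lemma 2.3 builds its basis through de Boor's recursion. As a point of reference, here is a minimal sketch of the standard Cox-de Boor recursion for B-splines of degree $p$ (order $r = p + 1$) on a clamped knot vector; the function name, the half-open degree-0 indicator convention, and the clamped knot layout are assumptions, and the lemma's own $b^{[1]}_i$ and index convention may differ. With $n$ uniform intervals the recursion yields $n + p = n + r - 1$ functions, the same count as the index range $i = 1-r, \dots, n-1$ in the lemma.

```python
import numpy as np

def bspline_basis(knots, degree, x):
    """Evaluate all B-splines of the given degree on `knots` at points `x`.

    Returns an array of shape (len(x), len(knots) - degree - 1).
    """
    knots = np.asarray(knots, dtype=float)
    x = np.asarray(x, dtype=float)
    # Degree 0: indicator of the half-open interval [t_i, t_{i+1})
    B = np.stack([(knots[i] <= x) & (x < knots[i + 1])
                  for i in range(len(knots) - 1)], axis=1).astype(float)
    # Cox-de Boor: raise the degree one step at a time
    for p in range(1, degree + 1):
        nb = len(knots) - p - 1
        B_next = np.zeros((len(x), nb))
        for i in range(nb):
            left = knots[i + p] - knots[i]
            right = knots[i + p + 1] - knots[i + 1]
            if left > 0:   # skip 0/0 terms from repeated knots
                B_next[:, i] += (x - knots[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (knots[i + p + 1] - x) / right * B[:, i + 1]
        B = B_next
    return B

n, degree = 16, 3
interior = np.linspace(0.0, 1.0, n + 1)
# Clamped knot vector: repeat each endpoint `degree` extra times
knots = np.concatenate([np.zeros(degree), interior, np.ones(degree)])
x = np.linspace(0.0, 1.0, 200, endpoint=False)
B = bspline_basis(knots, degree, x)
print(B.shape)                          # (200, 19)
print(np.allclose(B.sum(axis=1), 1.0))  # True
```

The row sums equal one on $[0, 1)$ (partition of unity), and each column is nonzero on at most $p + 1$ intervals, which is the localized support the abstract refers to.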

Figures (3)

  • Figure 1: Example of the learned separating hyperplanes for three different architecture choices: (left) a standard MLP layer, where the nonlinearity is applied after the affine transformation, (center) a reordered MLP layer, where the nonlinearity is applied before the linear transformation, and (right) a standard KAN layer, where the nonlinearity is applied before the linear transformation.
  • Figure 2: Comparison of ReLU MLP hyperplanes against the KAN spline grid corresponding to two hidden neurons on the nonsmooth function regression example (top) and XOR problem (bottom).
  • Figure 3: Selected convergence histories for the nonsmooth and XOR regression examples, with approximately the same amount of work for all models.
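
Figure 1 contrasts layers that apply the nonlinearity after versus before the linear transformation. A minimal sketch of the "before" ordering, assuming each scalar input is expanded into $K$ channels $\mathrm{ReLU}(x_i - t_k)$ with shifts $t_k$ shared across inputs and then mixed by a weight tensor $W$ of shape (out, in, $K$); the function name `multichannel_relu_layer` and the shared-shift choice are assumptions for illustration. Grouping the sum by edge shows the KAN reading: $y_j = b_j + \sum_i f_{ji}(x_i)$ with $f_{ji}(s) = \sum_k W_{jik}\,\mathrm{ReLU}(s - t_k)$, a piecewise-linear function with kinks at the $t_k$.

```python
import numpy as np

def multichannel_relu_layer(x, knots, W, b):
    """Hypothetical 'reordered' layer: nonlinearity first, then a linear map.

    x:     (batch, d_in) inputs
    knots: (K,) shift locations t_k shared by every input coordinate (an assumption)
    W:     (d_out, d_in, K) weights, one per (output, input, channel)
    b:     (d_out,) bias
    Returns (batch, d_out) with y[n, j] = b[j] + sum_{i,k} W[j,i,k] * relu(x[n,i] - t_k).
    """
    # Expand each scalar input into K ReLU channels: shape (batch, d_in, K)
    feats = np.maximum(x[:, :, None] - knots[None, None, :], 0.0)
    # Contract over input coordinate i and channel k
    return np.einsum("nik,jik->nj", feats, W) + b

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(7, 3))
t = np.linspace(-1.0, 1.0, 5)
W = rng.standard_normal((2, 3, 5))
b = rng.standard_normal(2)
y = multichannel_relu_layer(x, t, W, b)
print(y.shape)  # (7, 2)
```

In this sketch, refining along the channel dimension amounts to adding shifts $t_k$ together with matching slices of $W$, and treating the $t_k$ as trainable parameters gives a free-knot variant.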

Theorems & Definitions (19)

  • Definition 2.1
  • Remark 2.2
  • Lemma 2.3: chui1988multivariate
  • Lemma 2.4: chui1988multivariate
  • Proposition 3.1
  • Proof
  • Theorem 3.2
  • Proof
  • Lemma 4.1
  • Remark 4.2
  • ...and 9 more