
Multilevel Training for Kolmogorov Arnold Networks

Ben S. Southworth, Jonas A. Actor, Graham Harper, Eric C. Cyr

TL;DR

This paper establishes an equivalence between KANs with spline basis functions and multichannel MLPs with power ReLU activations through a linear change of basis, and develops multilevel algorithms that can dramatically improve training performance.

Abstract

Algorithmic speedup of training common neural architectures is made difficult by the lack of structure guaranteed by the function compositions inherent to such networks. In contrast to multilayer perceptrons (MLPs), Kolmogorov-Arnold networks (KANs) provide more structure by expanding learned activations in a specified basis. This paper exploits this structure to develop practical algorithms and theoretical insights, yielding training speedup via multilevel training for KANs. To do so, we first establish an equivalence between KANs with spline basis functions and multichannel MLPs with power ReLU activations through a linear change of basis. We then analyze how this change of basis affects the geometry of gradient-based optimization with respect to spline knots. The KAN change of basis motivates a multilevel training approach, where we train a sequence of KANs naturally defined through a uniform refinement of spline knots with analytic geometric interpolation operators between models. The interpolation scheme enables a "properly nested hierarchy" of architectures, ensuring that interpolation to a fine model preserves the progress made on coarse models, while the compact support of spline basis functions ensures complementary optimization on subsequent levels. Numerical experiments demonstrate that our multilevel training approach can achieve orders of magnitude improvement in accuracy over conventional methods to train comparable KANs or MLPs, particularly for physics-informed neural networks. Finally, this work demonstrates how principled design of neural networks can lead to exploitable structure, and in this case, multilevel algorithms that can dramatically improve training performance.
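To make the "properly nested hierarchy" idea concrete, here is a minimal NumPy sketch of one refinement step for a single 1D spline. It is an illustration under stated assumptions, not the paper's operator: it uses the standard two-scale relation for uniform B-splines, keeps only basis functions fully supported on the knot vector, and the names `bspline`, `spline_eval`, `prolongation`, and `refinement_error` are hypothetical. Halving the knot spacing and mapping the coarse coefficients through `prolongation` should reproduce the coarse spline exactly on the fine knots.

```python
from math import comb

import numpy as np


def bspline(x, knots, i, r):
    """Degree-r B-spline b_i on `knots`, via the Cox-de Boor recursion."""
    if r == 0:
        return np.where((knots[i] <= x) & (x < knots[i + 1]), 1.0, 0.0)
    left = (x - knots[i]) / (knots[i + r] - knots[i])
    right = (knots[i + r + 1] - x) / (knots[i + r + 1] - knots[i + 1])
    return left * bspline(x, knots, i, r - 1) + right * bspline(x, knots, i + 1, r - 1)


def spline_eval(x, knots, coeffs, r):
    """Evaluate sum_i coeffs[i] * b_i(x)."""
    return sum(c * bspline(x, knots, i, r) for i, c in enumerate(coeffs))


def prolongation(n_coarse_basis, r):
    """Coarse-to-fine coefficient map for uniform knots with halved spacing.

    Assumption: two-scale relation
        b_i^coarse = 2^{-r} * sum_{j=0}^{r+1} C(r+1, j) * b_{2i+j}^fine.
    """
    n_fine_basis = 2 * n_coarse_basis + r
    P = np.zeros((n_fine_basis, n_coarse_basis))
    for i in range(n_coarse_basis):
        for j in range(r + 2):
            P[2 * i + j, i] = comb(r + 1, j) / 2**r
    return P


def refinement_error(r, n_intervals=8, n_samples=401):
    """Max pointwise difference between a coarse spline and its refined copy."""
    coarse_knots = np.linspace(0.0, 1.0, n_intervals + 1)
    fine_knots = np.linspace(0.0, 1.0, 2 * n_intervals + 1)
    rng = np.random.default_rng(0)
    coarse_coeffs = rng.standard_normal(n_intervals - r)  # hypothetical "trained" values
    fine_coeffs = prolongation(len(coarse_coeffs), r) @ coarse_coeffs
    x = np.linspace(0.0, 1.0, n_samples)
    coarse = spline_eval(x, coarse_knots, coarse_coeffs, r)
    fine = spline_eval(x, fine_knots, fine_coeffs, r)
    return np.max(np.abs(coarse - fine))


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(f"r={r}: refinement error = {refinement_error(r):.2e}")
```

Because the refined spline matches the coarse one up to rounding, any loss evaluated on the fine model starts from the value reached on the coarse model, which is the property the abstract attributes to the interpolation scheme.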

Paper Structure

This paper contains 26 sections, 13 theorems, 51 equations, 9 figures, 3 tables, 1 algorithm.

Key Result

Lemma 1

Both $B_S^{[r]} = \{ b^{[r]}_i \}_{i=1-r}^{n-1}$ and $B_R^{[r]} = \{ \psi_i^{[r]} \}_{i=1-r}^{n-1}$ are bases for $S_r(T)$.
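The change of basis behind this lemma can be checked numerically in the simplest setting. The sketch below assumes uniformly spaced knots and order $r \ge 1$, and uses the classical identity that writes a uniform B-spline as a signed binomial sum of $r+2$ shifted ReLU$^r$ functions. The function names are hypothetical, and the indexing is simplified relative to the lemma's $i = 1-r, \dots, n-1$ range: only splines fully supported on the knot vector are compared.

```python
from math import comb, factorial

import numpy as np


def cox_de_boor(x, knots, i, r):
    """Degree-r B-spline b_i on `knots`, via the Cox-de Boor recursion."""
    if r == 0:
        return np.where((knots[i] <= x) & (x < knots[i + 1]), 1.0, 0.0)
    left = (x - knots[i]) / (knots[i + r] - knots[i])
    right = (knots[i + r + 1] - x) / (knots[i + r + 1] - knots[i + 1])
    return left * cox_de_boor(x, knots, i, r - 1) + right * cox_de_boor(x, knots, i + 1, r - 1)


def relu_power_bspline(x, knots, i, r):
    """The same B-spline as a linear combination of ReLU^r functions.

    Assumes uniform knot spacing h and r >= 1:
        b_i(x) = 1 / (r! h^r) * sum_{j=0}^{r+1} (-1)^j C(r+1, j) ReLU(x - t_{i+j})^r
    """
    h = knots[1] - knots[0]
    total = np.zeros_like(x)
    for j in range(r + 2):
        total += (-1) ** j * comb(r + 1, j) * np.maximum(x - knots[i + j], 0.0) ** r
    return total / (factorial(r) * h**r)


def max_basis_mismatch(r, n_intervals=10, n_samples=501):
    """Largest pointwise gap between the two evaluations over all basis functions."""
    knots = np.linspace(0.0, 1.0, n_intervals + 1)
    x = np.linspace(0.0, 1.0, n_samples)
    n_basis = len(knots) - r - 1
    return max(
        np.max(np.abs(cox_de_boor(x, knots, i, r) - relu_power_bspline(x, knots, i, r)))
        for i in range(n_basis)
    )


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(f"r={r}: max mismatch = {max_basis_mismatch(r):.2e}")
```

Stacking the binomial coefficients row by row gives a banded matrix that maps ReLU$^r$ features to B-spline values, which is one way to read the "ReLU$^r$ activation and then the change-of-basis matrix" evaluation described in the Figure 3 caption.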

Figures (9)

  • Figure 1: Eigenvalues of $(A^{[r]})^TA^{[r]}$ ordered smallest to largest (left) and the first (center) and last (right) five corresponding eigenvectors for $n=50$ spline knots and example orders $r\in\{1,3\}$.
  • Figure 2: Comparison of eigenvalue and eigenvector properties across spline orders.
  • Figure 3: Speedup of evaluating a layer, by applying the ReLU$^r$ activation and then the change-of-basis matrix, compared to computing the Cox-de Boor recursive formula for spline functions. Error bars show 1 standard deviation, computed over 10 instances.
  • Figure 4: Selected convergence histories for regression under approximately the same amount of work for all models. Vertical lines indicate refinements for multilevel models.
  • Figure 5: For the 2D Poisson problem in Sec. \ref{sec:results:pinns:poisson}: (left) the volume, boundary, and interface components of the PINN loss, and (right) multilevel KAN and MLP architectures and parameter counts. The neural network $u_N$ has an assumed dependence on parameters $\theta$.
  • ...and 4 more figures

Theorems & Definitions (23)

  • Lemma 1
  • Lemma 2
  • proof
  • Corollary 1: Uniform knots
  • proof
  • Lemma 3: Equivalence of KANs and multichannel MLPs
  • Lemma 4
  • proof
  • Lemma 5
  • Remark 1
  • ...and 13 more