MatrixKAN: Parallelized Kolmogorov-Arnold Network

Cale Coffman, Lizhong Chen

TL;DR

MatrixKAN parallelizes the B-spline activations in Kolmogorov-Arnold Networks (KANs) through a matrix representation, removing the Cox-de Boor recursion bottleneck. Spline outputs are expressed as matrix products of a precomputable basis matrix ${\boldsymbol{\Psi}}^k$ and the corresponding power bases, so sequential recursion is replaced by parallelizable tensor operations. Theoretical analysis shows forward-pass complexity moving from $O(N^2 L (k^2 + kG))$ with effective time $O(Lk)$ in KAN to $O(N^2 L (k^2 + G))$ with effective time $O(L)$ in MatrixKAN, so the speedup grows with spline degree $k$. Experiments on Feynman-equation modeling and Hellokan datasets show identical modeling performance for MatrixKAN and KAN with speedups of up to ~40x, and larger gains on bigger datasets or higher spline degrees, which makes more expressive high-degree B-splines practical in scalable neural modeling.
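To make the idea of replacing recursion with a matrix product concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a uniform knot grid and cubic splines ($k = 3$), and uses the standard uniform cubic B-spline basis matrix as a stand-in for ${\boldsymbol{\Psi}}^k$. The names `PSI_3`, `cox_de_boor`, and `matrix_basis` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Standard basis matrix for uniform cubic B-splines (degree 3).
# Rows correspond to powers 1, u, u^2, u^3; columns to the four
# basis functions that are nonzero on a single knot interval.
# Assumption: this plays the role of the precomputable matrix Psi^k.
PSI_3 = np.array([
    [ 1.0,  4.0,  1.0, 0.0],
    [-3.0,  0.0,  3.0, 0.0],
    [ 3.0, -6.0,  3.0, 0.0],
    [-1.0,  3.0, -3.0, 1.0],
]) / 6.0


def cox_de_boor(x, knots, i, k):
    """Evaluate B_{i,k}(x) with the Cox-de Boor recursion (k sequential levels)."""
    if k == 0:
        return np.where((knots[i] <= x) & (x < knots[i + 1]), 1.0, 0.0)
    left = (x - knots[i]) / (knots[i + k] - knots[i])
    right = (knots[i + k + 1] - x) / (knots[i + k + 1] - knots[i + 1])
    return (left * cox_de_boor(x, knots, i, k - 1)
            + right * cox_de_boor(x, knots, i + 1, k - 1))


def matrix_basis(x, knots):
    """Evaluate the four nonzero cubic basis functions with one matrix product.

    Returns (j, values), where knots[j] <= x < knots[j + 1] and
    values[:, m] is B_{j-3+m, 3}(x). Assumes uniformly spaced knots.
    """
    h = knots[1] - knots[0]
    j = np.floor((x - knots[0]) / h).astype(int)  # interval index per input
    u = (x - knots[j]) / h                        # local coordinate in [0, 1)
    powers = u[:, None] ** np.arange(4)           # power basis [1, u, u^2, u^3]
    return j, powers @ PSI_3                      # no recursion over degree


if __name__ == "__main__":
    knots = np.arange(-3.0, 8.0)          # uniform grid, spacing 1
    x = np.linspace(0.0, 3.99, 8)
    j, vals = matrix_basis(x, knots)
    ref = np.stack([cox_de_boor(x, knots, i, 3) for i in range(7)], axis=1)
    rows = np.arange(len(x))
    picked = np.stack([ref[rows, j - 3 + m] for m in range(4)], axis=1)
    print(np.allclose(vals, picked))      # both routes should agree
```

The recursive route needs one dependent pass per degree level, while the matrix route computes all inputs in a single batched product, which is the property the effective-time comparison above relies on.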

Abstract

Kolmogorov-Arnold Networks (KAN) are a new class of neural network architecture representing a promising alternative to the Multilayer Perceptron (MLP), demonstrating improved expressiveness and interpretability. However, KANs suffer from slow training and inference speeds relative to MLPs due in part to the recursive nature of the underlying B-spline calculations. This issue is particularly apparent with respect to KANs utilizing high-degree B-splines, as the number of required non-parallelizable recursions is proportional to B-spline degree. We solve this issue by proposing MatrixKAN, a novel optimization that parallelizes B-spline calculations with matrix representation and operations, thus significantly improving effective computation time for models utilizing high-degree B-splines. In this paper, we demonstrate the superior scaling of MatrixKAN's computation time relative to B-spline degree. Further, our experiments demonstrate speedups of approximately 40x relative to KAN, with significant additional speedup potential for larger datasets or higher spline degrees.

Paper Structure

This paper contains 25 sections, 14 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Diagram of a cubic B-spline curve, including applicable knots (turquoise) and control points (red).
  • Figure 2: Diagram of cubic B-spline basis function curves (various colors) and knots (turquoise).
  • Figure 3: A plot comparing MatrixKAN and KAN models of shape [2,2,1] and [2,5,1] across increasing grid sizes.
  • Figure 4: A plot comparing MatrixKAN and KAN models of shape [2,2,1] and [2,5,1] across increasing spline degrees.
  • Figure 5: A plot comparing loss level of MatrixKAN and KAN models trained to model Feynman Equation I.6.20b: $f(\Theta, \sigma)=\exp\left(-\frac{\Theta^2}{2\sigma^2}\right)/\sqrt{2\pi\sigma^2}$.
  • ...and 2 more figures