MatrixKAN: Parallelized Kolmogorov-Arnold Network
Cale Coffman, Lizhong Chen
TL;DR
MatrixKAN parallelizes the B-spline activations in Kolmogorov-Arnold Networks (KANs) through a matrix representation, removing the Cox-De Boor recursion bottleneck. Spline outputs are expressed as matrix products of the power bases with a precomputable basis matrix ${\boldsymbol{\Psi}}^k$, so the sequential recursion is replaced by parallelizable tensor operations. The theoretical analysis shows forward-pass complexity moving from $O(N^2 L (k^2 + kG))$ with effective time $O(Lk)$ in KAN to $O(N^2 L (k^2 + G))$ with effective time $O(L)$ in MatrixKAN, so the speedup grows with spline degree $k$. Experiments on Feynman-equation modeling and Hellokan datasets show identical modeling performance for MatrixKAN and KAN, with speedups of up to ~40x and larger gains on bigger datasets or higher spline degrees. This makes more expressive high-degree B-splines practical in scalable neural modeling.
Abstract
Kolmogorov-Arnold Networks (KANs) are a new class of neural network architecture representing a promising alternative to the Multilayer Perceptron (MLP), demonstrating improved expressiveness and interpretability. However, KANs suffer from slow training and inference speeds relative to MLPs, due in part to the recursive nature of the underlying B-spline calculations. This issue is particularly apparent for KANs utilizing high-degree B-splines, as the number of required non-parallelizable recursions is proportional to the B-spline degree. We solve this issue by proposing MatrixKAN, a novel optimization that parallelizes B-spline calculations with matrix representation and operations, thus significantly improving effective computation time for models utilizing high-degree B-splines. In this paper, we demonstrate the superior scaling of MatrixKAN's computation time relative to B-spline degree. Further, our experiments demonstrate speedups of approximately 40x relative to KAN, with significant additional speedup potential for larger datasets or higher spline degrees.
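To make the matrix representation concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a uniform knot grid, builds a degree-$k$ basis matrix with the standard recursion for uniform B-splines (which we take to play the role of ${\boldsymbol{\Psi}}^k$), and checks that one batched matrix product reproduces the values from the Cox-De Boor recursion. The function names `basis_matrix`, `matrix_form`, and `cox_de_boor` are hypothetical.

```python
import numpy as np

def basis_matrix(k):
    """(k+1) x (k+1) basis matrix of a uniform degree-k B-spline.
    Assumption: uniform knots. Computed once, independent of the inputs."""
    M = np.array([[1.0]])
    for d in range(1, k + 1):
        idx = np.arange(d)
        A = np.zeros((d, d + 1))
        B = np.zeros((d, d + 1))
        A[idx, idx] = idx + 1
        A[idx, idx + 1] = d - 1 - idx
        B[idx, idx] = -1.0
        B[idx, idx + 1] = 1.0
        top = np.vstack([M, np.zeros((1, d))])   # previous matrix with a zero row appended
        bot = np.vstack([np.zeros((1, d)), M])   # previous matrix with a zero row prepended
        M = (top @ A + bot @ B) / d
    return M

def matrix_form(x, lo, h, G, k):
    """The k+1 nonzero basis values per input from a single matmul."""
    t = (x - lo) / h
    i = np.clip(np.floor(t).astype(int), 0, G - 1)  # grid interval containing x
    u = t - i                                       # local coordinate in [0, 1)
    powers = u[:, None] ** np.arange(k + 1)         # power basis [1, u, ..., u^k]
    return i, powers @ basis_matrix(k)

def cox_de_boor(x, knots, k):
    """Reference: all degree-k basis values via k sequential recursion steps."""
    xc = x[:, None]
    B = ((knots[:-1] <= xc) & (xc < knots[1:])).astype(float)
    for d in range(1, k + 1):
        left = (xc - knots[:-(d + 1)]) / (knots[d:-1] - knots[:-(d + 1)]) * B[:, :-1]
        right = (knots[d + 1:] - xc) / (knots[d + 1:] - knots[1:-d]) * B[:, 1:]
        B = left + right
    return B

# Illustrative settings (assumed, not taken from the paper's experiments).
k, G, lo, hi = 3, 5, -1.0, 1.0
h = (hi - lo) / G
knots = lo + h * np.arange(-k, G + k + 1)   # grid extended by k knots on each side
x = np.random.default_rng(0).uniform(lo, hi, size=1000)

i, local = matrix_form(x, lo, h, G, k)
full = cox_de_boor(x, knots, k)
cols = i[:, None] + np.arange(k + 1)        # indices of the nonzero basis functions
match = np.allclose(local, np.take_along_axis(full, cols, axis=1))
print("matrix form matches recursion:", match)
```

In this sketch the loop over degree appears only in `basis_matrix`, which does not depend on the inputs and can be precomputed. The per-input work in `matrix_form` is a single matrix product, whereas `cox_de_boor` needs `k` dependent steps over the data.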
