On the Rate of Convergence of Kolmogorov-Arnold Network Regression Estimators

Wei Liu, Eleni Chatzi, Zhilu Lai

TL;DR

The paper analyzes Kolmogorov-Arnold Networks (KANs) that use univariate B-spline components to model multivariate functions via additive or hybrid additive-multiplicative structures. It proves that spline-based KAN estimators achieve the minimax convergence rate $O(n^{-\frac{2r}{2r+1}})$ for functions in Sobolev spaces $W^r([0,1]^d)$ when the number of knots scales as $k_n \asymp n^{1/(2r+1)}$. The results hold for both additive and hybrid KANs, establishing minimax optimality over the corresponding function classes, and are accompanied by remarks on identifiability. Simulations corroborate the theory: KANs converge at the predicted rate and outperform standard MLP baselines, which highlights the value of structure and splines for interpretable, efficient nonparametric regression.
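The link between the knot scaling and the rate can be seen from the usual bias-variance heuristic for spline sieves. The sketch below is a standard back-of-the-envelope argument, not the paper's proof; the constants $C_1, C_2$ and the assumption that the spline order is at least $r$ are introduced here for illustration only.

```latex
% Heuristic risk of a univariate spline sieve with k knots and n samples:
% squared approximation error plus estimation variance.
\[
  R(k) \;\approx\; C_1\, k^{-2r} \;+\; C_2\, \frac{k}{n}.
\]
% Setting dR/dk = 0 balances the two terms:
\[
  2r\, C_1\, k^{-2r-1} \;=\; \frac{C_2}{n}
  \quad\Longrightarrow\quad
  k_n \;\asymp\; n^{1/(2r+1)}.
\]
% Substituting k_n back into either term gives the rate:
\[
  R(k_n) \;\asymp\; n^{-2r/(2r+1)}.
\]
% Example: r = 2 gives k_n \asymp n^{1/5} and R(k_n) \asymp n^{-4/5}.
```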

Abstract

Kolmogorov-Arnold Networks (KANs) offer a structured and interpretable framework for multivariate function approximation by composing univariate transformations through additive or multiplicative aggregation. This paper establishes theoretical convergence guarantees for KANs when the univariate components are represented by B-splines. We prove that both additive and hybrid additive-multiplicative KANs attain the minimax-optimal convergence rate $O(n^{-2r/(2r+1)})$ for functions in Sobolev spaces of smoothness $r$. We further derive guidelines for selecting the optimal number of knots in the B-splines. The theory is supported by simulation studies that confirm the predicted convergence rates. These results provide a theoretical foundation for using KANs in nonparametric regression and highlight their potential as a structured alternative to existing methods.
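The knot-selection guideline can be turned into a small helper. This is a minimal sketch assuming only the scaling $k_n \asymp n^{1/(2r+1)}$ quoted in this summary; the function name `num_knots` and the proportionality constant `c` are hypothetical, since the scaling fixes the growth rate but not the constant.

```python
import math


def num_knots(n: int, r: float, c: float = 1.0) -> int:
    """Knot count growing like c * n^(1/(2r+1)).

    n: sample size.
    r: assumed Sobolev smoothness of the univariate components.
    c: hypothetical tuning constant (not specified by the rate itself);
       in practice it could be chosen by cross-validation.
    """
    if n < 1 or r <= 0 or c <= 0:
        raise ValueError("n must be >= 1 and r, c must be positive")
    # Round up and keep at least one knot so the spline basis is non-trivial.
    return max(1, math.ceil(c * n ** (1.0 / (2.0 * r + 1.0))))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, num_knots(n, r=2.0))
```

The slow growth is the practical point: for $r = 2$ the knot count scales as $n^{1/5}$, so a tenfold increase in data adds only a few knots per univariate spline.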

Paper Structure

This paper contains 18 sections, 9 theorems, 78 equations, 1 figure, 1 table.

Key Result

Theorem 1

Let $\hat{f}_n$ be the spline-based KAN sieve estimator defined in the paper's setup section, and suppose that the true regression function $f_0$ is of the additive KAN form with $g_q, \psi_{qj} \in W^r([0,1])$ and $r > 1/2$. Then $\hat{f}_n$ attains the convergence rate $O(n^{-2r/(2r+1)})$.

Figures (1)

  • Figure 1: Convergence rates of additive KAN, hybrid KAN, and MLP estimators on two synthetic targets. The dashed line indicates the theoretical slope $-4/5$. Left: piecewise polynomial function with exact Sobolev smoothness $r=2$. Right: periodic function with exact Sobolev smoothness $r=2$, constructed from a Fourier series. Mean squared error on a test set is plotted against sample size $n$ on a log-log scale.
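The reference slope $-4/5$ follows from the rate exponent $-2r/(2r+1)$ at $r = 2$. A minimal sketch of how such a comparison could be made is shown below: it fits a least-squares line to $\log(\text{MSE})$ against $\log n$. The function names are hypothetical, and the MSE values are generated from the theoretical rate purely for illustration; they are not results from the paper.

```python
import math


def theoretical_slope(r: float) -> float:
    """Log-log slope of MSE versus n implied by the rate n^(-2r/(2r+1))."""
    return -2.0 * r / (2.0 * r + 1.0)


def empirical_slope(n_values, mse_values) -> float:
    """Ordinary least-squares slope of log(MSE) regressed on log(n)."""
    xs = [math.log(n) for n in n_values]
    ys = [math.log(m) for m in mse_values]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, noise-free illustration (NOT data from the paper):
# MSE = 3 * n^(-4/5), i.e. exactly the predicted rate for r = 2.
n_values = [250, 500, 1000, 2000, 4000, 8000]
mse_values = [3.0 * n ** theoretical_slope(2.0) for n in n_values]

if __name__ == "__main__":
    print("theoretical slope:", theoretical_slope(2.0))
    print("fitted slope:     ", empirical_slope(n_values, mse_values))
```

With real simulation output, a fitted slope close to the theoretical one would be consistent with the predicted rate, while a clearly shallower slope would indicate slower convergence.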

Theorems & Definitions (17)

  • Theorem 1: Convergence rate of additive KAN with spline sieve
  • Remark 1
  • Proposition 1: Identifiability up to constant shifts
  • Remark 2
  • Theorem 2: Convergence rate of hybrid KAN with spline sieve
  • Remark 3
  • Corollary 1: Minimax optimality of KAN estimators
  • Corollary 2: Optimal knot number for spline-based KAN sieves
  • Remark 4
  • Theorem 1: Convergence rate of additive KAN with spline sieve
  • ...and 7 more