GS-KAN: Parameter-Efficient Kolmogorov-Arnold Networks via Sprecher-Type Shared Basis Functions

Oscar Eliasson

TL;DR

GS-KAN tackles the parameter explosion of Kolmogorov-Arnold Networks by introducing a layer-wise shared, learnable B-spline basis with per-edge linear weights and translations. This generalizes Sprecher's refinement to a practical, gradient-based architecture that preserves edge-wise nonlinearity while achieving parameter counts comparable to MLPs. Empirically, GS-KAN excels at high-frequency function approximation, remains competitive with state-of-the-art KAN variants on tabular data, and outperforms MLPs on high-dimensional image-vector tasks within tight parameter budgets. The approach enables scalable, memory-efficient KAN-like models for diverse tasks, with clear paths for future enhancements in basis adaptivity and optimization.

Abstract

The Kolmogorov-Arnold representation theorem offers a theoretical alternative to Multi-Layer Perceptrons (MLPs) by placing learnable univariate functions on edges rather than nodes. While recent implementations such as Kolmogorov-Arnold Networks (KANs) demonstrate strong approximation capabilities, they are parameter-inefficient because every network edge requires its own parameterization. In this work, we propose GS-KAN (Generalized Sprecher-KAN), a lightweight architecture inspired by David Sprecher's refinement of the superposition theorem. GS-KAN constructs unique edge functions by applying learnable linear transformations to a single learnable parent function shared across each layer. We evaluate GS-KAN against existing KAN architectures and MLPs on synthetic function approximation, tabular regression, and image classification tasks. Our results show that GS-KAN outperforms both MLPs and standard KAN baselines on continuous function approximation tasks while maintaining superior parameter efficiency. Additionally, GS-KAN is competitive with existing KAN architectures on tabular regression and outperforms MLPs on high-dimensional classification tasks. Crucially, the proposed architecture makes it possible to deploy KAN-based models in high-dimensional regimes under strict parameter constraints, a setting where standard implementations are typically infeasible due to parameter explosion. The source code is available at https://github.com/rambamn48/gs-impl.
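
To make the shared-parent idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes each edge function has the form `weights[j][i] * psi(x_i + shifts[j][i])`, where `psi` is one parent function per layer built from degree-1 B-splines (hat functions) on a uniform grid. The function names, the exact edge form, the spline degree, and the simplified parameter count are all assumptions made for illustration; the paper's exact parameterization may differ.

```python
def hat_basis(x, grid):
    """Degree-1 B-spline (hat) basis on a uniform grid, evaluated at scalar x.

    Assumption: linear splines are used here only to keep the sketch short.
    """
    h = grid[1] - grid[0]
    return [max(0.0, 1.0 - abs(x - g) / h) for g in grid]


def parent(x, coeffs, grid):
    """Shared parent function psi(x) = sum_k coeffs[k] * B_k(x).

    With hat functions, psi is zero outside the grid range.
    """
    return sum(c * b for c, b in zip(coeffs, hat_basis(x, grid)))


def gs_layer_forward(x, coeffs, grid, weights, shifts):
    """Hypothetical GS-KAN-style layer.

    y_j = sum_i weights[j][i] * psi(x_i + shifts[j][i]),
    where psi is shared by every edge of the layer.
    """
    return [
        sum(w * parent(xi + t, coeffs, grid)
            for xi, w, t in zip(x, w_row, t_row))
        for w_row, t_row in zip(weights, shifts)
    ]


def count_params(n_in, n_out, n_basis):
    """Simplified parameter counts under the assumptions of this sketch.

    shared:   one coefficient vector per layer plus a weight and a shift per edge.
    per_edge: a separate coefficient vector on every edge (KAN-like, ignoring
              any extra per-edge scale terms).
    """
    shared = n_basis + 2 * n_in * n_out
    per_edge = n_in * n_out * n_basis
    return shared, per_edge


if __name__ == "__main__":
    grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
    coeffs = [0.3, -0.8, 0.1, 0.9, -0.4]      # stands in for learned values
    weights = [[1.0, -0.5], [0.25, 2.0]]      # one weight per edge
    shifts = [[0.0, 0.5], [-0.5, 0.0]]        # one translation per edge
    print(gs_layer_forward([0.2, -0.7], coeffs, grid, weights, shifts))
    print(count_params(n_in=784, n_out=64, n_basis=8))
```

In this sketch the per-layer cost grows with the number of edges times two, rather than with the number of edges times the basis size, which is the kind of saving the abstract attributes to sharing the parent function.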

Paper Structure

This paper contains 30 sections, 3 theorems, 5 equations, 3 tables.

Key Result

Theorem 1

Let $\sigma(\cdot)$ be a fixed, non-linear activation function. Any continuous function $f: \mathbb{R}^n \to \mathbb{R}$ can be approximated to arbitrary accuracy by a finite linear combination of the form:

$$f(x_1, \dots, x_n) \approx \sum_{i=1}^{N} v_i \, \sigma\!\left(\sum_{j=1}^{n} w_{ij} x_j + b_i\right)$$

where $N$ is the number of hidden neurons, and $w_{ij}, v_i, b_i$ are learnable parameters.
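
The finite linear combination in Theorem 1 corresponds to a network with one hidden layer. The sketch below evaluates such a combination in plain Python; the function name `shallow_net` and the default choice of `tanh` for $\sigma$ are illustrative assumptions, not taken from the paper.

```python
import math


def shallow_net(x, w, b, v, sigma=math.tanh):
    """Evaluate sum_i v[i] * sigma(sum_j w[i][j] * x[j] + b[i]).

    x: list of n inputs
    w: N rows of n input weights (w_ij)
    b: N biases (b_i)
    v: N output weights (v_i)
    """
    return sum(
        v_i * sigma(sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) + b_i)
        for w_i, b_i, v_i in zip(w, b, v)
    )


if __name__ == "__main__":
    # N = 2 hidden neurons, n = 2 inputs; values are arbitrary placeholders.
    print(shallow_net([1.0, 2.0], w=[[0.5, -0.5], [1.0, 0.0]], b=[0.0, -1.0], v=[2.0, 3.0]))
```

Here the nonlinearity $\sigma$ is fixed and sits on the nodes, which is the contrast the abstract draws with placing learnable univariate functions on edges.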

Theorems & Definitions (3)

  • Theorem 1: Universal Approximation Theorem
  • Theorem 2: Kolmogorov-Arnold Representation
  • Theorem 3: Sprecher, 1965
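
For orientation, the two representation results listed above are commonly written as follows for a continuous $f$ on $[0,1]^n$. These are the standard textbook forms rather than a transcription of the paper's statements, so the symbols ($\Phi_q$, $\phi_{q,p}$, $\lambda_p$, $\eta$) are assumptions about notation and may differ from the paper's own.

```latex
% Theorem 2 (Kolmogorov-Arnold representation), standard form:
% 2n+1 outer functions and n(2n+1) univariate inner functions.
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

% Theorem 3 (Sprecher, 1965), commonly quoted form:
% a single inner function \phi, reused with constants \lambda_p and shifts \eta q.
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \lambda_p \, \phi(x_p + \eta q) \right)
```

The second form replaces the family of inner functions $\phi_{q,p}$ with scaled and shifted copies of one function, which is the structure GS-KAN generalizes by making the shared parent function, the per-edge weights, and the translations learnable.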