Relating Piecewise Linear Kolmogorov Arnold Networks to ReLU Networks

Nandi Schoots, Mattia Jacopo Villani, Niels uit de Bos

TL;DR

This work builds an explicit, bidirectional bridge between piecewise linear Kolmogorov-Arnold Networks (KANs) and ReLU networks. It provides constructive conversions in both directions, showing that any ReLU network can be represented as a KAN and vice versa, with quantified effects on depth, width, and the number of nonzero parameters: ReLU→KAN introduces a linear parameter overhead of $O(\sum_i n_i)$, while KAN→ReLU preserves the parameter count but expands the width by a factor of up to $k$, the number of segments per activation. The authors derive polyhedral-decomposition bounds for both architectures, establishing that KANs yield a finer partition than ReLU networks for a given parameter budget, and prove that any piecewise linear function can be represented as a KAN. This bridge allows ReLU-network theory (e.g., symmetries, initialisation, generalisation bounds) to be transferred to KANs, supports KAN interpretability via tractable polyhedral analyses, and permits efficient inference through parameter-efficient representations.

Abstract

Kolmogorov-Arnold Networks (KANs) are a new family of neural network architectures that holds promise for overcoming the curse of dimensionality and offers interpretability benefits (arXiv:2404.19756). In this paper, we explore the connection between KANs with piecewise linear (univariate real) activation functions and ReLU networks. We provide completely explicit constructions to convert a piecewise linear KAN into a ReLU network and vice versa.
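To make the KAN→ReLU direction concrete, the following is a minimal sketch of the standard identity that rewrites a single continuous piecewise linear univariate function as an affine term plus a sum of shifted ReLUs. It is not taken from the paper: the function names `pwl_to_relu_params` and `relu_eval` and the example knots are hypothetical, and the sketch covers only one activation, not a full network conversion.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def pwl_to_relu_params(breakpoints, values):
    """Convert knots b_0 < ... < b_k and values f(b_i) into ReLU-sum parameters.

    Hypothetical helper (not from the paper). The function is assumed to be
    continuous with k linear segments between consecutive knots.
    """
    b = np.asarray(breakpoints, dtype=float)
    v = np.asarray(values, dtype=float)
    slopes = np.diff(v) / np.diff(b)   # one slope per segment (k of them)
    coeffs = np.diff(slopes)           # slope change at each interior knot
    # f(x) = v_0 + s_0 (x - b_0) + sum_i coeffs_i * relu(x - b_i)
    return v[0], slopes[0], b[0], b[1:-1], coeffs

def relu_eval(x, params):
    """Evaluate the ReLU-sum form at x (scalar or array)."""
    v0, s0, b0, knots, coeffs = params
    x = np.asarray(x, dtype=float)
    return v0 + s0 * (x - b0) + relu(x[..., None] - knots) @ coeffs

# Example: 3 segments, so 2 interior knots and 2 ReLU units plus an affine term.
b = [-2.0, -1.0, 0.5, 3.0]
v = [1.0, -1.0, 2.0, 0.0]
params = pwl_to_relu_params(b, v)
xs = np.linspace(-2.0, 3.0, 101)
ys = relu_eval(xs, params)
```

In this sketch a function with $k$ segments uses $k-1$ ReLU units and one affine term. Expressing the affine term with ReLUs as well would need extra units, which is in line with a width expansion of up to a factor of $k$, though the paper's exact accounting may differ.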

Paper Structure

This paper contains 18 sections, 13 theorems, 42 equations, 4 figures.

Key Result

Theorem 1

Let $g \colon \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a feedforward network with activation functions from a family $\mathcal{F}$. There exists a KAN $f\colon \mathbb{R}^n \rightarrow \mathbb{R}^m$ with activation functions that are either affine linear or from $\mathcal{F}$ such that $f(x) = g(x)$ for all $x \in \mathbb{R}^n$.
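As an illustration of the kind of statement Theorem 1 makes, the sketch below rewrites one dense ReLU layer as two KAN layers: the first uses only affine univariate functions, the second applies ReLU on the "diagonal" edges and the zero function (which is affine) elsewhere. This is one possible construction under the assumption that a KAN layer computes $y_j = \sum_i \phi_{j,i}(x_i)$; it is not necessarily the construction used in the paper's proof, and `kan_layer` and `relu_layer_as_kan` are hypothetical names.

```python
import numpy as np

def kan_layer(x, phi):
    """Apply one KAN layer: output j is the sum over inputs i of phi[j][i](x[i])."""
    return np.array([sum(phi[j][i](x[i]) for i in range(len(x)))
                     for j in range(len(phi))])

def relu_layer_as_kan(W, b):
    """Represent x -> relu(W x + b) as two KAN layers (illustrative sketch).

    Layer 1: affine edges phi[j][i](t) = W[j, i] * t + b[j] / n, so that
             summing over i yields the pre-activation (W x + b)[j].
    Layer 2: ReLU on edge j -> j, the zero function on every other edge.
    """
    m, n = W.shape
    affine = [[(lambda t, w=W[j, i], c=b[j] / n: w * t + c) for i in range(n)]
              for j in range(m)]
    relu_or_zero = [[(lambda t: max(t, 0.0)) if k == j else (lambda t: 0.0)
                     for j in range(m)]
                    for k in range(m)]
    return [affine, relu_or_zero]

# Compare the KAN form with the ordinary dense ReLU layer on random data.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
x = rng.normal(size=4)

h = x
for layer in relu_layer_as_kan(W, b):
    h = kan_layer(h, layer)
expected = np.maximum(W @ x + b, 0.0)
```

The sketch trades one ReLU layer for two KAN layers, so depth grows; how the paper accounts for depth, width, and nonzero parameters in its own construction is stated in its theorems rather than here.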

Figures (4)

  • Figure 1: Example of a piecewise linear activation function.
  • Figure 3: Concatenating vectors $W^{(1)}_{i_1, i_{0}}$ and $B^{(1)}_{i_1, i_{0}}$ into vectors $W^{(1)}_{i_1}$ and $B^{(1)}_{i_1}$.
  • Figure 4: Three hidden layer network implementing a KAN of depth two.
  • Figure 5: Sum of two activation functions that each have one breakpoint at the origin. A 2-dimensional hyperplane cuts through the pyramid.

Theorems & Definitions (25)

  • Definition 1
  • Definition 2
  • Theorem 1
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 2
  • proof
  • ...and 15 more