Deep Learning Alternatives of the Kolmogorov Superposition Theorem

Leonardo Ferreira Guilhoto; Paris Perdikaris

TL;DR

This work rethinks Kolmogorov–Arnold representations for neural networks by adopting a Laczkovich-based variant of the Kolmogorov Superposition Theorem (KST) to build ActNet, a scalable architecture aimed at PDE simulation with PINNs. Its building block, ActLayer, is a trainable multi-head unit that learns the inner functions; the resulting network is a universal approximator at fixed depth and has stable activations. In physics-informed experiments, ActNet consistently outperforms Kolmogorov–Arnold Networks and remains competitive with leading MLP-based PINNs, achieving state-of-the-art results on several challenging PDE benchmarks. The study highlights the practical viability of KST-inspired designs for scientific computing, acknowledges a trade-off in computational speed, and outlines future directions: hardware-optimized implementations and broader data-driven applications.

Abstract

This paper explores alternative formulations of the Kolmogorov Superposition Theorem (KST) as a foundation for neural network design. The original KST formulation, while mathematically elegant, presents practical challenges due to its limited insight into the structure of inner and outer functions and the large number of unknown variables it introduces. Kolmogorov-Arnold Networks (KANs) leverage KST for function approximation, but they have faced scrutiny due to mixed results compared to traditional multilayer perceptrons (MLPs) and practical limitations imposed by the original KST formulation. To address these issues, we introduce ActNet, a scalable deep learning model that builds on the KST and overcomes many of the drawbacks of Kolmogorov's original formulation. We evaluate ActNet in the context of Physics-Informed Neural Networks (PINNs), a framework well-suited for leveraging KST's strengths in low-dimensional function approximation, particularly for simulating partial differential equations (PDEs). In this challenging setting, where models must learn latent functions without direct measurements, ActNet consistently outperforms KANs across multiple benchmarks and is competitive against the current best MLP-based approaches. These results present ActNet as a promising new direction for KST-based deep learning applications, particularly in scientific computing and PDE simulation tasks.

Paper Structure (37 sections, 4 theorems, 33 equations, 14 figures, 9 tables)


Introduction
Superposition Theorems For Representing Complex Functions
KST And Neural Networks.
Kolmogorov's Theorem Can Be Useful, Despite Its Limitations.
ActNet - A Kolmogorov Inspired Architecture
Theoretical Motivation
ActNet Formulation
Universality
Other Interpretations of the ActLayer
Choice of Basis functions and Initialization
Experiments
Ablations Studies
Poisson Equation.
Inhomogeneous Helmholtz Equation.
Allen-Cahn Equation
...and 22 more sections

Key Result

Theorem 3.1

Let $C(\mathbb{R}^d)$ denote the set of continuous functions from $\mathbb{R}^d$ to $\mathbb{R}$ and let $m>(2+\sqrt{2})(2d-1)$ be an integer. There exist positive constants $\lambda_{ij} > 0$, $j=1,\dots,d$; $i=1,\dots,m$, and $m$ continuous increasing functions $\phi_i\in C(\mathbb{R})$, $i=1,\dots,m$, [...] Using vector notation, this can equivalently be written as [...] where the $\phi_i$ are applied element-wise [...]

Figures (14)

Figure 1: (left) Visual representation of an individual ActLayer. The ActLayer architecture can be seen as a MultiHead MLP layer with tunable activations. (right) Visual representation of the ActNet architecture. The input vector ${\bm{x}}$ is first projected to an embedding dimension, then passed into $L$ composed blocks of ActLayer, and finally linearly projected into the desired output dimension.
Figure 2: Example predictions for the Helmholtz equation using $w=16$. The relative L2 errors for the ActNet, SIREN and KAN solutions above are 1.04e-3, 8.82e-2 and 2.64e-1, respectively.
Figure 3: ActNet predictions for the advection equation ($c=80$). The relative L2 error is 9.50e-5, whereas the best result found in the literature is 6.88e-4 (Wang et al., 2023).
Figure 4: ActNet predictions for the chaotic Kuramoto–Sivashinsky PDE. The relative L2 error is 8.53e-2, whereas the best result found in the literature is 1.61e-1 (Wang et al., 2023).
Figure 5: ActNet performance (relative L2 error) on the Allen-Cahn PDE under different hyperparameter settings. After selecting the network depth $L$ and the number of basis functions $N$, the network width $m$ was computed to meet the required parameter count. As such, for a given network size, larger values of depth imply smaller widths, and vice versa. The value plotted for each hyperparameter configuration is the median of 3 runs using different seeds.
...and 9 more figures
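
The architecture described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. Only the overall shape (input projection, $L$ ActLayer blocks, linear output projection) and the idea of tunable activations built from $N$ basis functions come from the captions above. The sinusoidal basis, the parameter names (`freq`, `phase`, `beta`, `lam`, `w_in`, `w_out`), the initialization scales, and the omission of biases are illustrative choices.

```python
import numpy as np


def init_actlayer(rng, d_in, d_out, n_basis):
    """Hypothetical parameter layout for one ActLayer-style block."""
    return {
        # Frequencies and phases of the assumed sinusoidal basis functions.
        "freq": rng.normal(size=n_basis),
        "phase": np.zeros(n_basis),
        # beta mixes the n_basis basis functions into d_out tunable activations.
        "beta": rng.normal(size=(d_out, n_basis)) / np.sqrt(n_basis),
        # lam holds one linear weight vector per head (one head per output unit).
        "lam": rng.normal(size=(d_out, d_in)) / np.sqrt(d_in),
    }


def actlayer(params, x):
    """Apply one block to a vector x of shape (d_in,); returns shape (d_out,)."""
    # basis[j, n] = sin(freq[n] * x[j] + phase[n])
    basis = np.sin(np.outer(x, params["freq"]) + params["phase"])
    # phi[j, i] = value of the i-th tunable activation at x[j]
    phi = basis @ params["beta"].T
    # Head i takes a weighted sum of its own activation over all inputs.
    return np.sum(params["lam"] * phi.T, axis=1)


def init_actnet(seed, d_in, d_out, width, depth, n_basis):
    """Build parameters: input projection, `depth` blocks, output projection."""
    rng = np.random.default_rng(seed)
    return {
        "w_in": rng.normal(size=(width, d_in)) / np.sqrt(d_in),
        "blocks": [init_actlayer(rng, width, width, n_basis) for _ in range(depth)],
        "w_out": rng.normal(size=(d_out, width)) / np.sqrt(width),
    }


def actnet(params, x):
    """Forward pass: project to the embedding width, compose blocks, project out."""
    h = params["w_in"] @ x
    for block in params["blocks"]:
        h = actlayer(block, h)
    return params["w_out"] @ h


# Example: a 2-D input mapped to a scalar output with m=8, L=3, N=4.
params = init_actnet(seed=0, d_in=2, d_out=1, width=8, depth=3, n_basis=4)
y = actnet(params, np.array([0.3, -0.7]))
print(y.shape)  # (1,)
```

In this sketch the width, depth, and basis count play the roles of $m$, $L$, and $N$ from the Figure 5 caption: each block holds `width * width + width * n_basis + 2 * n_basis` parameters, so at a fixed parameter budget a deeper network must be narrower.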

Theorems & Definitions (8)

Theorem 3.1
Definition 3.2
Theorem 3.3
Theorem 3.4
Proof of the ActNet universality theorem
Lemma F.1
Proof
Proof of the ActLayer activation stability theorem
