Kolmogorov GAM Networks are all you need!

Sarah Polson; Vadim Sokolov


TL;DR

This work proposes Kolmogorov generalized additive models (K-GAM), an architecture that uses a Köppen-based embedding to map inputs into a fixed, function-agnostic feature space, followed by a trainable outer function $g$ that predicts the outputs. Grounded in Kolmogorov's representation $f(x)=\sum_{q=1}^{2n+1} g_q\left(\sum_{i=1}^n \psi_{q,i}(x_i)\right)$, K-GAM yields a two-layer model with efficiency advantages and a closer connection to additive models than to dense deep nets, while enabling uncertainty quantification. The paper states and proves a theorem showing that any function can be represented as a GAM with fixed features and a shared outer regressor, discusses p-adic embeddings and inference strategies, and interprets transformers as kernel-smoothing systems for comparison. Empirical illustrations on simulated data and the Iris dataset demonstrate the feasibility of K-GAM and its interpretability trade-offs relative to traditional GAMs, and point to its potential for scalable, parameter-efficient function approximation beyond standard deep learning. Overall, K-GAM offers a mathematically grounded, efficient alternative to transformer-style architectures, with practical implications for scalable AI and uncertainty-aware learning.
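The two-layer structure described above (a fixed inner embedding followed by a single trainable outer function) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the inner function `psi` is a smooth monotone stand-in rather than the Köppen function, the weights `lam` and shift `a` are hypothetical choices, and the shared outer function $g$ is modeled with a piecewise-linear basis fitted by least squares.

```python
import numpy as np

def inner_embedding(X, n_outer):
    """Map X in [0,1]^n to n_outer fixed scalar features z_q.

    psi, lam, and a are hypothetical stand-ins, NOT the Köppen construction.
    The embedding does not depend on the target function.
    """
    n = X.shape[1]
    lam = 1.0 / np.sqrt(np.arange(2, n + 2))          # assumed fixed weights
    a = 1.0 / (n_outer + 1)                           # assumed shift per q
    q = np.arange(n_outer)
    shifted = X[:, None, :] + q[None, :, None] * a    # shape (N, Q, n)
    psi = np.tanh(2.0 * shifted)                      # stand-in monotone inner function
    return psi @ lam                                  # shape (N, Q)

def hat_basis(z, knots):
    """Piecewise-linear hat functions on uniform knots; shape z.shape + (K,)."""
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(z[..., None] - knots) / h, 0.0, None)

def fit_kgam(X, y, n_knots=20):
    """Fit one shared outer function g so that f(x) ~ sum_q g(z_q(x))."""
    Z = inner_embedding(X, 2 * X.shape[1] + 1)        # 2n+1 outer terms
    knots = np.linspace(Z.min(), Z.max(), n_knots)
    design = hat_basis(Z, knots).sum(axis=1)          # (N, K): sum over q of B_k(z_q)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None) # model is linear in coef
    return knots, coef

def predict_kgam(X, knots, coef):
    Z = inner_embedding(X, 2 * X.shape[1] + 1)
    return hat_basis(Z, knots).sum(axis=1) @ coef

# Toy usage on synthetic data (the target function is an arbitrary example).
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
y = np.sin(2 * np.pi * X[:, 0]) * X[:, 1]
knots, coef = fit_kgam(X, y)
mse = np.mean((predict_kgam(X, knots, coef) - y) ** 2)
print(f"training MSE: {mse:.4f}, variance of y: {np.var(y):.4f}")
```

Because only the basis coefficients of $g$ are trained, the parameter count here is `n_knots`, independent of the input dimension. With a stand-in embedding the fit quality carries no guarantee; the representation result relies on the specific inner functions of the theorem.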

Abstract

Kolmogorov GAM (K-GAM) networks are shown to be an efficient architecture for training and inference. They are an additive model with an embedding that is independent of the function of interest, and they provide an alternative to the transformer architecture. They are the machine learning version of Kolmogorov's Superposition Theorem (KST), which provides an efficient representation of a multivariate function. Such representations are useful in machine learning for encoding dictionaries (a.k.a. "look-up" tables). KST also provides a representation based on translates of the Köppen function. The goal of our paper is to interpret this representation in a machine learning context for applications in Artificial Intelligence (AI). Our architecture is equivalent to a topological embedding, which is independent of the function, together with an additive layer that uses a Generalized Additive Model (GAM). This provides a class of learning procedures with far fewer parameters than current deep learning algorithms. The implementation is parallelizable, which makes our algorithms computationally attractive. To illustrate our methodology, we use the Iris data from statistical learning. We also show that our additive model with non-linear embedding provides an alternative to transformer architectures, which from a statistical viewpoint are kernel smoothers. Additive KAN models therefore provide a natural alternative to transformers. Finally, we conclude with directions for future research.
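The remark that transformers are kernel smoothers from a statistical viewpoint can be checked numerically. The sketch below compares single-head scaled dot-product attention with a Nadaraya-Watson kernel regression estimate. The exponential kernel $K(q,k)=\exp(q^\top k/\sqrt{d})$ and the function names are illustrative assumptions chosen here, not notation taken from the paper.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Single-head scaled dot-product attention (no masking, no learned projections)."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the exponentials
    W = np.exp(scores)
    W /= W.sum(axis=1, keepdims=True)             # each row sums to one
    return W @ V

def nadaraya_watson(Q, K, V):
    """Kernel smoother: weighted average of V with weights K(q, k_j) / sum_l K(q, k_l)."""
    kern = np.exp(Q @ K.T / np.sqrt(Q.shape[1]))  # assumed exponential kernel
    return (kern @ V) / kern.sum(axis=1, keepdims=True)

# Toy comparison on random queries, keys, and values.
rng = np.random.default_rng(1)
Qm = rng.normal(size=(4, 8))
Km = rng.normal(size=(10, 8))
Vm = rng.normal(size=(10, 3))
same = np.allclose(softmax_attention(Qm, Km, Vm), nadaraya_watson(Qm, Km, Vm))
print("attention equals kernel smoother:", same)
```

Under this kernel the two computations agree up to floating-point error, since the softmax normalization is the same ratio of kernel weights that defines the smoother.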


Paper Structure

This paper contains 20 sections, 52 equations, 7 figures, 3 tables.

Introduction
Kolmogorov Superposition Theorem (KST)
Inner and Outer Functions
Ridge and Projection Pursuit Regression
Kolmogorov-Arnold Networks
Kolmogorov Generalized Additive Models (K-GAM)
Theorem (K-GAM)
Proof
Note
p-adic Neural Networks and Embeddings
Inference
Brillinger
Kernel Smoothing: Interpolation
Training Rates
Transformers as Kernel Smoothing
...and 5 more sections

Figures (7)

Figure 1: Köppen function $\psi_k$ for $k=3,4,5$, $\gamma=10$
Figure 2: Scatter plot of the simulated dataset
Figure 3: KST architecture for the simulated dataset
Figure 4: Examples of outer functions $g_0,g_2,g_6,g_8$ for the simulated dataset
Figure 5: Plot of the single outer function $g$ for the simulated dataset
...and 2 more figures
