Gaussian Process Kolmogorov-Arnold Networks

Andrew Siyuan Chen

TL;DR

GP-KAN addresses the challenge of uncertainty propagation in deep Gaussian process architectures by introducing GP neurons as the univariate non-linear units in a Kolmogorov-Arnold Network. It achieves exact propagation of Gaussian distributions through depth by collapsing the input distribution via inner products with GP function samples, enabling analytic mean and variance calculations. The approach yields two practical layers, Fully-Connected GP (FCGP) and Convolutional GP (ConvGP), with normalization and activation strategies that preserve Gaussianity. Empirical validation on MNIST shows competitive accuracy with far fewer parameters than state-of-the-art methods, highlighting a potentially efficient and uncertainty-aware alternative for image classification and other domains.

Abstract

In this paper, we introduce a probabilistic extension to Kolmogorov-Arnold Networks (KANs) by incorporating Gaussian Processes (GPs) as non-linear neurons, which we refer to as GP-KAN. A fully analytical approach to handling the output distribution of one GP as an input to another GP is achieved by considering the function inner product of a GP function sample with the input distribution. These GP neurons exhibit robust non-linear modelling capabilities while using few parameters and can be easily and fully integrated in a feed-forward network structure. They provide inherent uncertainty estimates for the model prediction and can be trained directly on the log-likelihood objective function, without needing variational lower bounds or approximations. On MNIST classification, a GP-KAN model with 80 thousand parameters achieved 98.5% prediction accuracy, compared to current state-of-the-art models with 1.5 million parameters.
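To make the inner-product idea concrete, here is a minimal sketch of how the mean of a GP neuron's output could be computed in closed form when its input is Gaussian. It assumes an RBF kernel and noise-free inducing values, and relies on the standard identity for the expectation of an RBF kernel under a Gaussian input. The function names, inducing points and hyperparameters are hypothetical and are not taken from the paper. The analytic value is compared against a Monte Carlo estimate.

```python
import numpy as np

def rbf(x, z, lengthscale=1.0, scale=1.0):
    """RBF kernel matrix between 1-D arrays x and z."""
    d = x[:, None] - z[None, :]
    return scale**2 * np.exp(-0.5 * d**2 / lengthscale**2)

def expected_rbf(mu, var, z, lengthscale=1.0, scale=1.0):
    """Closed-form E[k(x, z_i)] for x ~ N(mu, var) with an RBF kernel."""
    total = lengthscale**2 + var
    return scale**2 * np.sqrt(lengthscale**2 / total) * np.exp(-0.5 * (mu - z)**2 / total)

def neuron_mean(mu, var, z, u, lengthscale=1.0, scale=1.0, jitter=1e-8):
    """Mean of the inner product between the GP posterior mean and N(mu, var).

    Assumes noise-free inducing values u at inducing locations z.
    """
    K = rbf(z, z, lengthscale, scale) + jitter * np.eye(len(z))
    q = expected_rbf(mu, var, z, lengthscale, scale)
    return q @ np.linalg.solve(K, u)

# Hypothetical inducing points and input distribution (illustration only).
z = np.linspace(-2.0, 2.0, 5)
u = np.sin(z)
mu, var = 0.3, 0.25

analytic = neuron_mean(mu, var, z, u)

# Monte Carlo check: average the posterior mean function over input samples.
rng = np.random.default_rng(0)
x = rng.normal(mu, np.sqrt(var), size=200_000)
K = rbf(z, z) + 1e-8 * np.eye(len(z))
empirical = (rbf(x, z) @ np.linalg.solve(K, u)).mean()

print(f"analytic mean:  {analytic:.4f}")
print(f"empirical mean: {empirical:.4f}")
```

The two values should agree up to Monte Carlo error. The sketch covers only the mean; the variance of the output would need the corresponding double integral of the kernel and is not shown here.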

Paper Structure (12 sections, 15 equations, 5 figures)

Figures (5)

  • Figure 1: Empirical vs analytical mean and variance values, repeated for different $\mu_x,\sigma_x^2$
  • Figure 2: (a) Experiment to model $y=\exp(\sin(\pi x_1)+x_2^2)$ using a 2-1 GP Neuron layout, where two GP Neurons consume $x_1$ and $x_2$ respectively, and their outputs are summed and consumed by another GP Neuron. Note that the output of each GP Neuron is a Gaussian distribution, and summation refers to summation of Gaussian-distributed variables following $\mathcal{N}(\mu_1,\sigma_1^2),\mathcal{N}(\mu_2,\sigma_2^2)$, giving $\mathcal{N}(\mu_1+\mu_2, \sigma_1^2 + \sigma_2^2)$. The red points represent the GP inducing points. (b) Training progress
  • Figure 3: Convolutional GP (ConvGP) Layer
  • Figure 4: (a) GP-KAN model structure used for MNIST classification. (b) Validation set accuracy and (c) Training negative log-likelihood over the course of training, for models of different parameter count, which is varied by changing the kernel, channel and stride values
  • Figure 5: Example of MNIST model predictions, with (a) showing the successful predictions and (b) showing a failed prediction. ArgMax is applied on the raw model output to get the index corresponding to the label. In the case of the failed prediction, the raw output values for the predicted index and the index corresponding to the true label are very close.
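
The summation rule quoted in the Figure 2 caption can be checked numerically. The sketch below assumes the two neuron outputs are independent, which is what makes the variances add. The helper name and the example means and variances are hypothetical.

```python
import numpy as np

def sum_independent_gaussians(means, variances):
    """Return (mean, variance) of a sum of independent Gaussian variables."""
    return float(np.sum(means)), float(np.sum(variances))

# Hypothetical outputs of two GP Neurons (illustration only).
mu1, var1 = 0.5, 0.04
mu2, var2 = -1.0, 0.09

total_mean, total_var = sum_independent_gaussians([mu1, mu2], [var1, var2])

# Monte Carlo check: sample each output independently and add the samples.
rng = np.random.default_rng(0)
n = 200_000
samples = rng.normal(mu1, np.sqrt(var1), n) + rng.normal(mu2, np.sqrt(var2), n)

print(f"analytic:  mean={total_mean:.3f}, var={total_var:.3f}")
print(f"empirical: mean={samples.mean():.3f}, var={samples.var():.3f}")
```

If the summed outputs were correlated, a covariance term would have to be added to the variance, so the simple rule holds only under the independence assumption.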