The Connection between Kriging and Large Neural Networks

Marius Marinescu

TL;DR

This paper investigates the relationship between Kriging, Gaussian Process Regression, and neural networks. It demonstrates that Kriging predictions coincide with the MAP estimate in GP regression, and that a single-hidden-layer MLP converges to a Gaussian process in the infinite-width limit with a kernel given by $K(\mathbf{x}, \mathbf{x}') = \mathbb{E}_{\mathbf{a}}[h(\mathbf{x}; \mathbf{a}) h(\mathbf{x}'; \mathbf{a})]$. It catalogs concrete kernels arising from common transfer functions (e.g., linear, squared exponential, arc-cosine) and discusses non-stationarity, while outlining extensions to deeper networks via NNGP kernels and training dynamics via NTK. The work offers a unified probabilistic-kernel viewpoint that blends spatial statistics with kernel methods and deep learning, enabling interpretable, uncertainty-aware, and spatially aware ML models.
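The kernel formula above can be checked numerically for the simplest case. The following is a minimal sketch, not the paper's own code: it assumes a bias-free linear hidden unit $h(\mathbf{x}; \mathbf{a}) = \mathbf{a}^\top \mathbf{x}$ with weights $\mathbf{a} \sim \mathcal{N}(\mathbf{0}, I)$, for which the expectation reduces to the linear kernel $K(\mathbf{x}, \mathbf{x}') = \mathbf{x}^\top \mathbf{x}'$. The names `mc_kernel` and `linear_unit`, the weight prior, and the sample size are illustrative assumptions.

```python
import numpy as np

def mc_kernel(h, x1, x2, n_samples=200_000, seed=0):
    """Monte Carlo estimate of K(x, x') = E_a[h(x; a) h(x'; a)].

    Assumption: the hidden-unit weights a are drawn from N(0, I).
    """
    rng = np.random.default_rng(seed)
    # One row per sampled hidden unit.
    a = rng.standard_normal((n_samples, x1.shape[0]))
    return float(np.mean(h(x1, a) * h(x2, a)))

def linear_unit(x, a):
    # Hypothetical linear transfer function: h(x; a) = a . x (no bias term).
    return a @ x

x1 = np.array([1.0, 2.0])
x2 = np.array([0.5, -1.0])

k_mc = mc_kernel(linear_unit, x1, x2)
k_exact = float(x1 @ x2)  # closed form under the N(0, I) assumption

print(f"Monte Carlo estimate: {k_mc:.3f}, closed form: {k_exact:.3f}")
```

Replacing `linear_unit` with another transfer function gives a Monte Carlo estimate of the corresponding kernel under the same assumed weight prior; the closed-form comparison holds only for the linear case shown here.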

Abstract

AI has influenced many disciplines and is now ubiquitous. Spatial statistics in particular is at a pivotal moment, as it will increasingly intertwine with AI. In this scenario, a relevant question is what relationship, if any, spatial statistics models have with machine learning (ML) models. In this paper, we explore the connections between Kriging and neural networks. At first glance, they may appear unrelated: Kriging and its ML counterpart, Gaussian process regression, are grounded in probability theory and stochastic processes, whereas many ML models are widely regarded as black-box models. Nevertheless, they are strongly related. We study their connections and revisit the relevant literature. Understanding these relations and combining both perspectives may enhance ML techniques by making them more interpretable, reliable, and spatially aware.

Paper Structure (10 sections, 16 equations, 2 figures, 1 table)

Introduction
Kriging and GPR historical background
Kriging
GPR
Kriging and GPR
Kriging as a multilayer perceptron
Some numerical demonstrations
Further connections
Linear kernel
Stationary squared exponential kernel

Figures (2)

Figure 1: An MLP architecture with one hidden layer and a transfer function $h$.
Figure 2: Samples from GP and NN. The length scale $\sigma$ is chosen to be 1.
