An Empirical Analysis of the Laplace and Neural Tangent Kernels

Ronaldas Paulius Lencevičius

TL;DR

This work empirically analyzes the practical equivalence between the Laplace kernel and the neural tangent kernel (NTK) in Gaussian process regression, emphasizing the unit-sphere domain $\mathbb{S}^{d-1}$, where their RKHSs coincide. It demonstrates that exact kernel matching and posterior matching depend on jointly tuning the NTK depth $D$ and bias $\beta$ against the Laplace length-scale $\ell$, with deeper networks requiring smaller $\beta$ and $\ell$ for alignment. The study finds strong posterior-mean equivalence on $\mathbb{S}^{d-1}$ but limited overlap in $\mathbb{R}^d$, and shows that the Gaussian kernel cannot fully replicate NTK/Laplace behavior. It also provides practical tooling (scikit-ntk) and a framework for comparing kernel-based regression methods via GP posteriors and RKHS analysis, highlighting both theoretical and computational implications for kernel design and neural-network-informed kernel methods.

Abstract

The neural tangent kernel is a kernel function defined over the parameter distribution of an infinite-width neural network. Despite the impracticality of this limit, the neural tangent kernel has allowed for a more direct study of neural networks and a gaze through the veil of their black box. More recently, it has been shown theoretically that the Laplace kernel and the neural tangent kernel share the same reproducing kernel Hilbert space on $\mathbb{S}^{d-1}$, alluding to their equivalence. In this work, we analyze the practical equivalence of the two kernels. We first do so by matching the kernels exactly and then by matching the posteriors of a Gaussian process. Moreover, we analyze the kernels in $\mathbb{R}^d$ and experiment with them in the task of regression.
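The idea of matching the two kernels on $\mathbb{S}^{d-1}$ can be illustrated with a minimal sketch. It assumes a bias-free ReLU NTK written in the standard arc-cosine recursion and normalized to equal 1 at coincident inputs; this is not necessarily the parameterization used in the paper or in scikit-ntk, and the bias $\beta$ is omitted entirely. The helper names (`relu_ntk_sphere`, `laplace_sphere`, `fit_length_scale`) and the grid search over $\ell$ are hypothetical choices made for illustration.

```python
import numpy as np

def relu_ntk_sphere(u, depth):
    """Bias-free ReLU NTK on the unit sphere as a function of u = <x, z>.

    Assumed arc-cosine recursion (illustrative, not the paper's exact form):
      Sigma_0 = Theta_0 = u
      Sigma_h = kappa1(Sigma_{h-1})
      Theta_h = Theta_{h-1} * kappa0(Sigma_{h-1}) + Sigma_h
    """
    u = np.clip(np.asarray(u, dtype=float), -1.0, 1.0)
    sigma, theta = u.copy(), u.copy()
    for _ in range(depth):
        angle = np.arccos(np.clip(sigma, -1.0, 1.0))
        kappa0 = (np.pi - angle) / np.pi
        root = np.sqrt(np.clip(1.0 - sigma**2, 0.0, None))
        kappa1 = (sigma * (np.pi - angle) + root) / np.pi
        theta = theta * kappa0 + kappa1  # uses Sigma_{h-1} inside kappa0
        sigma = kappa1
    return theta / (depth + 1)  # Theta_depth(1) = depth + 1, so this is 1 at u = 1

def laplace_sphere(u, length_scale):
    """Laplace kernel exp(-||x - z|| / ell) for unit vectors with <x, z> = u."""
    dist = np.sqrt(np.clip(2.0 - 2.0 * np.asarray(u, dtype=float), 0.0, None))
    return np.exp(-dist / length_scale)

def sample_inner_products(n_pairs, dim, seed=0):
    """Inner products of random pairs drawn uniformly from the unit sphere."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_pairs, dim))
    z = rng.standard_normal((n_pairs, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    return np.sum(x * z, axis=1)

def fit_length_scale(u, depth, grid):
    """Grid search for the ell minimizing the mean squared kernel difference."""
    target = relu_ntk_sphere(u, depth)
    errors = [np.mean((laplace_sphere(u, ell) - target) ** 2) for ell in grid]
    best = int(np.argmin(errors))
    return grid[best], errors[best]

u = sample_inner_products(n_pairs=1000, dim=3)
grid = np.logspace(-1, 2, 200)
for depth in (1, 3, 6):
    ell, err = fit_length_scale(u, depth, grid)
    print(f"depth={depth}: ell={ell:.3f}, mse={err:.2e}")
```

A grid search is used here only because it is easy to read; any one-dimensional optimizer over $\ell$ would serve the same purpose.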

Paper Structure (26 sections, 9 theorems, 75 equations, 19 figures, 17 tables, 4 algorithms)

This paper contains 26 sections, 9 theorems, 75 equations, 19 figures, 17 tables, 4 algorithms.

Introduction
Regression
Data Fitting Problem
Gaussian Processes
Neural Networks
Reproducing Kernel Hilbert Spaces
Reproducing Kernels
Mercer Representation
Representer Theorem
Types of Kernels and their Equivalences
Matérn Class of Kernels
Neural Tangent Kernel
RKHS Inclusion
Equivalence of the Laplace and Neural Tangent Kernels
Synthetic Experiments
...and 11 more sections

Key Result

Theorem 2.2.1

Let $\mathcal{X}$ be any set such that there exists a function $m : \mathcal{X} \to \mathbb{R}$ and a positive definite function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. Then there exists a GP on $\mathcal{X}$ with mean function $m$ and covariance function $k$.
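On a finite set of inputs, the theorem reduces to drawing from a multivariate normal with mean vector $m(\mathbf{x})$ and covariance matrix $k(\mathbf{x}, \mathbf{x})$. The sketch below illustrates this for the Laplace covariance function, together with the usual GP posterior mean and covariance. The zero mean function, the length-scale of 1, the jitter and noise values, and all function names are assumptions made for illustration rather than settings taken from the paper.

```python
import numpy as np

def laplace_kernel(a, b, length_scale=1.0):
    """k(x, z) = exp(-||x - z|| / length_scale) for row-wise inputs."""
    diff = a[:, None, :] - b[None, :, :]
    return np.exp(-np.linalg.norm(diff, axis=-1) / length_scale)

def sample_prior(x, mean_fn, n_samples, length_scale=1.0, seed=0, jitter=1e-10):
    """Draw sample paths f ~ N(m(x), k(x, x)) via a Cholesky factor."""
    rng = np.random.default_rng(seed)
    cov = laplace_kernel(x, x, length_scale) + jitter * np.eye(len(x))
    chol = np.linalg.cholesky(cov)
    draws = rng.standard_normal((len(x), n_samples))
    return mean_fn(x)[:, None] + chol @ draws

def posterior(x_train, y_train, x_test, mean_fn, length_scale=1.0, noise=1e-6):
    """Posterior mean and covariance of the GP at x_test."""
    k_tt = laplace_kernel(x_train, x_train, length_scale)
    k_tt = k_tt + noise * np.eye(len(x_train))
    k_st = laplace_kernel(x_test, x_train, length_scale)
    k_ss = laplace_kernel(x_test, x_test, length_scale)
    alpha = np.linalg.solve(k_tt, y_train - mean_fn(x_train))
    mean = mean_fn(x_test) + k_st @ alpha
    cov = k_ss - k_st @ np.linalg.solve(k_tt, k_st.T)
    return mean, cov

def zero_mean(x):
    # Assumed mean function m(x) = 0.
    return np.zeros(len(x))

x_grid = np.linspace(-3.0, 3.0, 50)[:, None]
paths = sample_prior(x_grid, zero_mean, n_samples=3)

x_train = np.array([[-2.0], [0.0], [1.5]])
y_train = np.sin(x_train[:, 0])
post_mean, post_cov = posterior(x_train, y_train, x_grid, zero_mean)
```

The Cholesky factorization succeeds only when the covariance matrix is positive definite, which is the finite-dimensional counterpart of the positive-definiteness condition on $k$ in the theorem.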

Figures (19)

Figure 1: Sample paths of the Laplace covariance function from a GP prior and posterior. The solid line represents $m(\mathbf{x})$ and $\mathbf{\bar{f}}_*$ in the left and right panel respectively. The shaded bands represent $\text{cov}(\mathbf{f}_*)$. The red x's in the right panel are the training points.
Figure 2: A visualization of a neuron courtesy of neuralTeX.
Figure 3: A visualization of a 2 layer fully connected neural network courtesy of neuralTeX.
Figure 4: Mean and variance plots of $\ell$ given specific $\beta$, calculated using $n=1000$ samples of input pairs for various depths. The solid orange line represents the variance while the dotted blue line represents the mean.
Figure 5: The solid orange line represents the variance and the dotted blue line represents the mean. Top: A zoomed-in view of depth $D=6$ for $\beta\in[0, 10^{-7}]$. Due to the zoom, the mean values are all concentrated around $1.0524$ and all variance values are near $2.437\cdot 10^{-5}$. The difference between the minimum and maximum variance shown is approximately $10^{-18}$. Bottom: A typical plot for depths greater than 6.
...and 14 more figures

Theorems & Definitions (20)

Definition 2.2.1: Positive Definite Function
Definition 2.2.2: Gaussian Process
Theorem 2.2.1: Kolmogorov Existence Theorem for Gaussian Processes
Definition 2.3.1: Neuron
Definition 2.3.2: Multilayer Perceptron
Definition 3.1.1: Hilbert space
Definition 3.1.2: Reproducing Kernel Hilbert Space
Theorem 3.1.1: Moore-Aronszajn Theorem (aronszajn1950theory)
Theorem 3.2.1: Mercer's theorem (steinwart2008support; mercer1909functions)
Theorem 3.2.2: Mercer Representation of RKHSs (steinwart2008support)
...and 10 more
