
An Empirical Analysis of the Laplace and Neural Tangent Kernels

Ronaldas Paulius Lencevičius

TL;DR

This work empirically analyzes the practical equivalence between the Laplace kernel and the neural tangent kernel (NTK) in Gaussian process (GP) regression, emphasizing the unit-sphere domain $\mathbb{S}^{d-1}$, where their reproducing kernel Hilbert spaces (RKHSs) coincide. It demonstrates that exact kernel matching and posterior matching depend on jointly tuning the NTK depth $D$ and bias $\beta$ against the Laplace length-scale $\ell$, with deeper networks requiring smaller $\beta$ and $\ell$ for alignment. The study finds strong posterior-mean equivalence on $\mathbb{S}^{d-1}$ but limited overlap in $\mathbb{R}^d$, and shows that the Gaussian kernel cannot fully replicate NTK/Laplace behavior. It also provides practical tooling (scikit-ntk) and a framework for comparing kernel-based regression methods via GP posteriors and RKHS analysis, highlighting both theoretical and computational implications for kernel design and neural-network-informed kernel methods.

Abstract

The neural tangent kernel is a kernel function defined over the parameter distribution of an infinite-width neural network. Despite the impracticality of this limit, the neural tangent kernel has allowed for a more direct study of neural networks and a gaze through the veil of their black box. More recently, it has been shown theoretically that the Laplace kernel and the neural tangent kernel share the same reproducing kernel Hilbert space on the unit sphere $\mathbb{S}^{d-1}$, which suggests that the two kernels are equivalent. In this work, we analyze the practical equivalence of the two kernels. We do so first by matching the kernels exactly and then by matching the posteriors of a Gaussian process. Moreover, we analyze the kernels in $\mathbb{R}^d$ and experiment with them on regression tasks.
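
As a rough illustration of the posterior-matching setup described in the abstract, the sketch below evaluates a Laplace kernel on points of $\mathbb{S}^{d-1}$ and computes a GP posterior mean from it. It assumes the common form $k(\mathbf{x}, \mathbf{x}') = \exp(-\|\mathbf{x} - \mathbf{x}'\| / \ell)$, a zero prior mean, and a small noise term for numerical stability; the paper's exact parameterization may differ. The function names are hypothetical and are not taken from scikit-ntk. The NTK side is omitted because its depth and bias parameterization is not given in this excerpt.

```python
import numpy as np


def sample_sphere(n, d, rng):
    """Draw n points uniformly from the unit sphere S^{d-1}."""
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def laplace_kernel(X, Y, length_scale=1.0):
    """Laplace kernel k(x, y) = exp(-||x - y|| / length_scale) (assumed form)."""
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    dist = np.sqrt(np.maximum(sq_dist, 0.0))
    return np.exp(-dist / length_scale)


def gp_posterior_mean(X_train, y_train, X_test, length_scale=1.0, noise=1e-8):
    """Posterior mean of a zero-mean GP with a Laplace covariance function."""
    K = laplace_kernel(X_train, X_train, length_scale)
    K += noise * np.eye(len(X_train))
    K_star = laplace_kernel(X_test, X_train, length_scale)
    return K_star @ np.linalg.solve(K, y_train)
```

On the unit sphere, $\|\mathbf{x} - \mathbf{x}'\| = \sqrt{2 - 2\,\mathbf{x}^\top \mathbf{x}'}$, so the kernel above depends on its inputs only through their inner product. Comparing it with another dot-product kernel therefore reduces to comparing two functions of $\mathbf{x}^\top \mathbf{x}'$.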

Paper Structure

This paper contains 26 sections, 9 theorems, 75 equations, 19 figures, 17 tables, 4 algorithms.

Key Result

Theorem 2.2.1

Let $\mathcal{X}$ be any set such that there exists a function $m : \mathcal{X} \to \mathbb{R}$ and a positive definite function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. Then there exists a GP on $\mathcal{X}$ with mean function $m$ and covariance function $k$.
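
In practice, a GP given by $m$ and $k$ is only ever evaluated at finitely many inputs, where it reduces to a multivariate Gaussian with mean vector $m(\mathbf{x}_i)$ and covariance matrix $k(\mathbf{x}_i, \mathbf{x}_j)$. The sketch below draws sample paths from such a finite-dimensional marginal. It assumes one-dimensional inputs, a Laplace covariance of the form $\exp(-|x - x'| / \ell)$, and a small diagonal jitter for numerical stability. The function names are hypothetical.

```python
import numpy as np


def laplace_kernel_1d(x, y, length_scale=1.0):
    """Laplace covariance on scalar inputs: exp(-|x - y| / length_scale)."""
    return np.exp(-np.abs(x[:, None] - y[None, :]) / length_scale)


def sample_gp_prior(mean_fn, kernel_fn, x, n_samples, rng, jitter=1e-10):
    """Draw sample paths of a GP restricted to the finite input set x.

    Returns an array of shape (len(x), n_samples); each column is one path.
    """
    mean = mean_fn(x)
    cov = kernel_fn(x, x) + jitter * np.eye(len(x))
    # The Cholesky factor L satisfies L @ L.T == cov, so mean + L @ z has
    # covariance cov when z is standard normal.
    L = np.linalg.cholesky(cov)
    z = rng.standard_normal((len(x), n_samples))
    return mean[:, None] + L @ z
```

For example, `sample_gp_prior(lambda t: np.zeros_like(t), laplace_kernel_1d, np.linspace(-3, 3, 50), 3, np.random.default_rng(0))` returns three zero-mean paths on a 50-point grid. The Cholesky factorization succeeds only when the covariance matrix is positive definite, which is the finite-sample counterpart of the positive definiteness required of $k$ in the theorem.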

Figures (19)

  • Figure 1: Sample paths from a GP prior and posterior with the Laplace covariance function. The solid line represents $m(\mathbf{x})$ in the left panel and $\mathbf{\bar{f}}_*$ in the right panel. The shaded bands represent $\text{cov}(\mathbf{f}_*)$. The red crosses in the right panel are the training points.
  • Figure 2: A visualization of a neuron courtesy of neuralTeX.
  • Figure 3: A visualization of a 2 layer fully connected neural network courtesy of neuralTeX.
  • Figure 4: Mean and variance plots of $\ell$ given specific $\beta$, calculated using $n=1000$ sampled input pairs for various depths. The solid orange line represents the variance, while the dotted blue line represents the mean.
  • Figure 5: The solid orange line represents the variance and the dotted blue line represents the mean. Top: A zoomed-in view of depth $D=6$ for $\beta\in[0, 10^{-7}]$. Due to the zoom, the mean values are all concentrated around $1.0524$ and all variance values are near $2.437\cdot 10^{-5}$. The difference between the minimum and maximum variance shown is approximately $10^{-18}$. Bottom: A typical plot past depth 6.
  • ...and 14 more figures

Theorems & Definitions (20)

  • Definition 2.2.1: Positive Definite Function
  • Definition 2.2.2: Gaussian Process
  • Theorem 2.2.1: Kolmogorov Existence Theorem for Gaussian Processes
  • Definition 2.3.1: Neuron
  • Definition 2.3.2: Multilayer Perceptron
  • Definition 3.1.1: Hilbert space
  • Definition 3.1.2: Reproducing Kernel Hilbert Space
  • Theorem 3.1.1: Moore-Aronszajn Theorem [aronszajn1950theory]
  • Theorem 3.2.1: Mercer's theorem [steinwart2008support, mercer1909functions]
  • Theorem 3.2.2: Mercer Representation of RKHSs [steinwart2008support]
  • ...and 10 more