Epistemic Uncertainty and Observation Noise with the Neural Tangent Kernel

Sergio Calvo-Ordoñez; Konstantina Palla; Kamil Ciosek

TL;DR

The paper addresses epistemic uncertainty in wide neural networks by extending the NTK-GP correspondence to settings with non-zero aleatoric noise. It derives estimators for the NTK-GP posterior mean under observation noise and for the posterior covariance, both computable via gradient-descent-based optimization. A data-shift trick yields a zero-mean prior, while a covariance estimator leverages a partial SVD of the Jacobian to decompose cross-kernel terms, enabling scalable uncertainty quantification. Empirical results on a toy regression task demonstrate that the proposed mean and covariance approximations closely track the analytic NTK-GP posterior, while remaining computationally efficient and integrable with standard training pipelines.

Abstract

Recent work has shown that training wide neural networks with gradient descent is formally equivalent to computing the mean of the posterior distribution in a Gaussian Process (GP) with the Neural Tangent Kernel (NTK) as the prior covariance and zero aleatoric noise (Jacot et al., 2018). In this paper, we extend this framework in two ways. First, we show how to deal with non-zero aleatoric noise. Second, we derive an estimator for the posterior covariance, giving us a handle on epistemic uncertainty. Our proposed approach integrates seamlessly with standard training pipelines, as it involves training a small number of additional predictors using gradient descent on a mean squared error loss. We demonstrate a proof of concept of our method through empirical evaluation on synthetic regression.
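
For orientation, the two quantities the abstract refers to can be written in the standard GP regression form. This is the textbook result, not a formula quoted from the paper. The symbols are notational assumptions: $m$ is the prior mean, $k$ the kernel (the NTK in the NTK-GP setting), $K = k(X, X)$ the kernel matrix on the training inputs $X$, and $\sigma^2$ the aleatoric noise variance.

```latex
\begin{align}
  \mu(x_*) &= m(x_*) + k(x_*, X)\,\bigl(K + \sigma^2 I\bigr)^{-1}\,\bigl(y - m(X)\bigr), \\
  \Sigma(x_*, x_*') &= k(x_*, x_*') - k(x_*, X)\,\bigl(K + \sigma^2 I\bigr)^{-1}\,k(X, x_*').
\end{align}
```

Setting $\sigma^2 = 0$ recovers the noiseless case, and $\Sigma$ is the posterior covariance that captures epistemic uncertainty.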

Paper Structure (23 sections, 7 theorems, 36 equations, 1 figure, 2 algorithms)

This paper contains 23 sections, 7 theorems, 36 equations, 1 figure, 2 algorithms.

Introduction
Contributions
Preliminaries
Gaussian Processes
Neural Tangent Kernel.
Method
Aleatoric Noise
Gradient Descent Converges to the NTK-GP Posterior Mean
Zero Prior Mean
Estimating the Covariance
Experiment
Conclusions
Related Work
Neural Tangent Kernel
Predictor Networks
...and 8 more sections

Key Result

Lemma 3.1

Consider a parametric model $f(x; \theta)$ where $x \in \mathcal{X} \subset \mathbb{R}^N$ and $\theta \in \mathbb{R}^p$, initialized under some assumptions with parameters $\theta_0$. Minimizing the regularized mean squared error loss with respect to $\theta$ to find the optimal set of parameters is equivalent to computing the posterior mean of a Gaussian process with non-zero aleatoric noise.
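
The equivalence can be checked numerically in the simplest setting, a model that is exactly linear in its parameters, $f(x; \theta) = \phi(x)^\top \theta$. The sketch below is illustrative and is not the paper's algorithm. The random features `Phi`, the regularizer $\tfrac{1}{2}\sigma^2 \lVert \theta - \theta_0 \rVert^2$, its scaling, and all function names are assumptions made for the example. Under those assumptions, gradient descent on the regularized squared error reaches the same predictions as the GP posterior mean with kernel $\Phi \Phi^\top$, prior mean $f(\cdot; \theta_0)$, and noise variance $\sigma^2$.

```python
import numpy as np

def gd_regularized_mse(Phi, y, theta0, noise_var, steps=5000):
    """Gradient descent on 0.5*||Phi theta - y||^2 + 0.5*noise_var*||theta - theta0||^2."""
    theta = theta0.copy()
    # Step size 1/L, where L bounds the largest curvature of the quadratic loss.
    lr = 1.0 / (np.linalg.norm(Phi, 2) ** 2 + noise_var)
    for _ in range(steps):
        grad = Phi.T @ (Phi @ theta - y) + noise_var * (theta - theta0)
        theta -= lr * grad
    return theta

def gp_posterior_mean(Phi, y, Phi_test, theta0, noise_var):
    """GP posterior mean with kernel Phi Phi^T and prior mean given by the initial model."""
    K = Phi @ Phi.T                # train-train kernel matrix
    K_star = Phi_test @ Phi.T      # test-train kernel matrix
    resid = y - Phi @ theta0       # targets minus prior mean on the training inputs
    alpha = np.linalg.solve(K + noise_var * np.eye(len(y)), resid)
    return Phi_test @ theta0 + K_star @ alpha

# Hypothetical toy problem: 8 training points, 5 test points, 20 random features.
rng = np.random.default_rng(0)
n, m, p = 8, 5, 20
Phi = rng.normal(size=(n, p)) / np.sqrt(p)
Phi_test = rng.normal(size=(m, p)) / np.sqrt(p)
y = rng.normal(size=n)
theta0 = rng.normal(size=p)
noise_var = 0.1

theta_gd = gd_regularized_mse(Phi, y, theta0, noise_var)
mean_gd = Phi_test @ theta_gd
mean_gp = gp_posterior_mean(Phi, y, Phi_test, theta0, noise_var)
max_gap = float(np.max(np.abs(mean_gd - mean_gp)))
```

Here `max_gap` measures the largest difference between the two sets of test predictions. For a wide neural network the feature matrix would be replaced by the Jacobian at initialization, and the match holds only approximately, in the linearized regime.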

Figures (1)

Figure 1: The NTK-GP posterior and its approximations: (top-left) Analytic Posterior, (top-right) Analytic upper bound on posterior (all eigenvectors), (bottom-left) Analytic upper bound on posterior (5 eigenvectors), (bottom-right) Posterior obtained with gradient descent ($K=5$ predictors, $K' = 0$).

Theorems & Definitions (12)

Lemma 3.1
Lemma 3.2
Proposition 3.1
proof
Lemma B.1
proof
Lemma B.1
proof
Lemma D.1
proof
...and 2 more
