Epistemic Uncertainty and Observation Noise with the Neural Tangent Kernel
Sergio Calvo-Ordoñez, Konstantina Palla, Kamil Ciosek
TL;DR
The paper addresses epistemic uncertainty in wide neural networks by extending the NTK-GP correspondence to settings with non-zero aleatoric noise. It derives estimators for the NTK-GP posterior mean under observation noise and for the posterior covariance, both computable with gradient-descent-based optimization. A data-shift trick yields a zero-mean prior, and the covariance estimator uses a partial SVD of the Jacobian to decompose cross-kernel terms, which makes uncertainty quantification scalable. Empirical results on a toy regression task show that the proposed mean and covariance approximations closely track the analytic NTK-GP posterior while remaining computationally efficient and easy to integrate into standard training pipelines.
Abstract
Recent work has shown that training wide neural networks with gradient descent is formally equivalent to computing the mean of the posterior distribution in a Gaussian Process (GP) with the Neural Tangent Kernel (NTK) as the prior covariance and zero aleatoric noise (Jacot et al., 2018). In this paper, we extend this framework in two ways. First, we show how to deal with non-zero aleatoric noise. Second, we derive an estimator for the posterior covariance, giving us a handle on epistemic uncertainty. Our proposed approach integrates seamlessly with standard training pipelines, as it involves training a small number of additional predictors using gradient descent on a mean squared error loss. We provide a proof-of-concept demonstration of our method through empirical evaluation on synthetic regression.
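To make the quantities named in the abstract concrete, the following is a minimal sketch of the analytic GP posterior mean and covariance under non-zero observation noise. It is not the estimator proposed in the paper. As an assumption for illustration, a random-feature model that is linear in its trainable weights stands in for a linearized wide network, so its tangent kernel is simply the Gram matrix of the features. All names (`random_features`, `gp_posterior`, `noise_var`, the toy sine data) are hypothetical. The sketch also checks the standard identity that the noisy GP posterior mean coincides with the minimizer of an L2-regularized squared-error loss in weight space, which is the solution gradient descent on that convex loss converges to.

```python
import numpy as np

def random_features(x, W, b):
    """Feature map phi(x) of a toy model f(x) = phi(x) @ w, linear in w.

    For such a model the Jacobian with respect to w is phi(x), so the
    tangent kernel is phi(x) @ phi(x').T (illustrative stand-in for an NTK).
    """
    return np.sqrt(2.0 / W.shape[1]) * np.cos(x @ W + b)

def gp_posterior(K_xx, K_sx, K_ss, y, noise_var):
    """Posterior mean and covariance of a zero-mean GP with Gaussian noise."""
    A = K_xx + noise_var * np.eye(K_xx.shape[0])
    mean = K_sx @ np.linalg.solve(A, y)
    cov = K_ss - K_sx @ np.linalg.solve(A, K_sx.T)
    return mean, cov

rng = np.random.default_rng(0)
n_train, n_test, n_feat = 20, 5, 200
noise_var = 0.1  # assumed aleatoric noise variance

# Synthetic 1-D regression data (hypothetical toy problem).
x_train = rng.uniform(-3.0, 3.0, size=(n_train, 1))
x_test = np.linspace(-3.0, 3.0, n_test).reshape(-1, 1)
y_train = np.sin(x_train[:, 0]) + np.sqrt(noise_var) * rng.normal(size=n_train)

# Fixed random projection defining the feature map.
W = rng.normal(size=(1, n_feat))
b = rng.uniform(0.0, 2.0 * np.pi, size=n_feat)
Phi = random_features(x_train, W, b)    # shape (n_train, n_feat)
Phi_s = random_features(x_test, W, b)   # shape (n_test, n_feat)

# Tangent-kernel blocks for train/train, test/train, test/test.
K_xx = Phi @ Phi.T
K_sx = Phi_s @ Phi.T
K_ss = Phi_s @ Phi_s.T

mean, cov = gp_posterior(K_xx, K_sx, K_ss, y_train, noise_var)

# Weight-space view: minimizer of ||Phi w - y||^2 + noise_var * ||w||^2.
w_ridge = np.linalg.solve(Phi.T @ Phi + noise_var * np.eye(n_feat), Phi.T @ y_train)
mean_ridge = Phi_s @ w_ridge

print("max |GP mean - ridge mean|:", np.max(np.abs(mean - mean_ridge)))
print("posterior std at test points:", np.sqrt(np.clip(np.diag(cov), 0.0, None)))
```

The diagonal of `cov` plays the role of the epistemic uncertainty at each test point. In this toy setting both quantities are available in closed form; the abstract's point is that for wide networks they can instead be estimated by training predictors with gradient descent.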
