On Feature Collapse and Deep Kernel Learning for Single Forward Pass Uncertainty
Joost van Amersfoort, Lewis Smith, Andrew Jesson, Oscar Key, Yarin Gal
TL;DR
This work analyzes why Deep Kernel Learning (DKL) can yield unreliable uncertainty estimates due to feature collapse, and introduces Deterministic Uncertainty Estimation (DUE), which constrains the deep feature extractor to be bi-Lipschitz via spectral normalization and residual connections. By pairing a bi-Lipschitz feature space with a sparse variational Gaussian process on a small set of inducing points, DUE preserves the non-parametric uncertainty properties of GPs while requiring only a single forward pass at prediction time. Empirically, DUE outperforms previous single-pass uncertainty methods on CIFAR-10 vs SVHN and on a regression benchmark for personalized healthcare, while training end-to-end from scratch with a modest number of inducing points. The approach offers practical, scalable uncertainty estimation suitable for real-time use. It does not, however, guarantee correct uncertainty in all cases, and the authors point to future work on stronger theoretical guarantees and on assessing societal impact.
Abstract
Inducing point Gaussian process approximations are often considered a gold standard in uncertainty estimation since they retain many of the properties of the exact GP and scale to large datasets. A major drawback is that they have difficulty scaling to high-dimensional inputs. Deep Kernel Learning (DKL) promises a solution: a deep feature extractor transforms the inputs, and an inducing point Gaussian process is defined over the resulting features. However, DKL has been shown to provide unreliable uncertainty estimates in practice. We study why, and show that with no constraints, the DKL objective pushes "far-away" data points to be mapped to the same features as those of training-set points. With this insight we propose to constrain DKL's feature extractor to approximately preserve distances through a bi-Lipschitz constraint, resulting in a feature space favorable to DKL. We obtain a model, DUE, whose uncertainty quality outperforms previous DKL and other single forward pass uncertainty methods, while maintaining the speed and accuracy of standard neural networks.
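
The bi-Lipschitz constraint described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`spectral_normalize`, `residual_block`, `make_extractor`, `rbf_kernel`), the fully connected layers, the exact spectral norm, and the value `coeff=0.9` are all assumptions made for illustration. The idea shown is that if each residual branch g has Lipschitz constant c < 1, then f(x) = x + g(x) satisfies (1 - c)||x - y|| <= ||f(x) - f(y)|| <= (1 + c)||x - y||, so distant inputs cannot collapse onto the same features before the kernel is applied.

```python
import numpy as np

def spectral_normalize(W, coeff=0.9):
    """Rescale W so that its largest singular value is at most `coeff`.

    Assumption: the spectral norm is computed exactly here; practical
    implementations typically approximate it (e.g. with power iteration).
    """
    sigma = np.linalg.norm(W, ord=2)  # largest singular value of a 2-D array
    return W * min(1.0, coeff / sigma)

def residual_block(x, W, b):
    """f(x) = x + relu(x W^T + b).

    ReLU is 1-Lipschitz, so the branch has Lipschitz constant <= ||W||_2.
    """
    return x + np.maximum(x @ W.T + b, 0.0)

def make_extractor(dim, depth, coeff=0.9, seed=0):
    """Build a hypothetical stack of `depth` spectrally normalized residual blocks."""
    rng = np.random.default_rng(seed)
    layers = [
        (spectral_normalize(rng.normal(size=(dim, dim)), coeff), rng.normal(size=dim))
        for _ in range(depth)
    ]

    def extractor(x):
        for W, b in layers:
            x = residual_block(x, W, b)
        return x

    return extractor

def rbf_kernel(a, b, lengthscale=1.0):
    """RBF kernel evaluated on feature vectors (rows of a and b)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

if __name__ == "__main__":
    depth, coeff = 3, 0.9
    f = make_extractor(dim=4, depth=depth, coeff=coeff)
    rng = np.random.default_rng(1)
    x, y = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
    # Ratio of feature-space distance to input-space distance for each pair.
    ratio = np.linalg.norm(f(x) - f(y), axis=1) / np.linalg.norm(x - y, axis=1)
    print("distance ratios:", ratio)
    print("guaranteed range:", (1 - coeff) ** depth, (1 + coeff) ** depth)
```

Because the lower bound is strictly positive, the kernel value between a test point and any training point is bounded above by a function of their input-space distance, which is the property an unconstrained extractor lacks. In a full model, the kernel would be used by a sparse variational GP over inducing points rather than evaluated directly as in this sketch.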
