Noise Contrastive Priors for Functional Uncertainty
Danijar Hafner, Dustin Tran, Timothy Lillicrap, Alex Irpan, James Davidson
TL;DR
This work tackles unreliable uncertainty estimates in neural networks by introducing Noise Contrastive Priors (NCPs), which impose data-space priors to encourage high uncertainty for inputs outside the training distribution. NCPs combine an input perturbation strategy with a wide output prior and can be incorporated into variational frameworks by penalizing deviations in output space, yielding a scalable, function-level prior. Empirically, NCPs improve uncertainty estimates and active-learning performance on both toy and large-scale flight-delay regression tasks; Bayes by Backprop combined with NCP (BBB+NCP) often delivers the strongest improvements and generalizes stably to unseen data. The approach is a practical, scalable alternative to weight-space priors and points toward explicit, data-driven priors for robust extrapolation.
Abstract
Obtaining reliable uncertainty estimates of neural network predictions is a long-standing challenge. Bayesian neural networks have been proposed as a solution, but it remains open how to specify their prior. In particular, the common practice of an independent normal prior in weight space imposes relatively weak constraints on the function posterior, allowing it to generalize in unforeseen ways on inputs outside of the training distribution. We propose noise contrastive priors (NCPs) to obtain reliable uncertainty estimates. The key idea is to train the model to output high uncertainty for data points outside of the training distribution. NCPs do so using an input prior, which adds noise to the inputs of the current mini-batch, and an output prior, which is a wide distribution given these inputs. NCPs are compatible with any model that can output uncertainty estimates, are easy to scale, and yield reliable uncertainty estimates throughout training. Empirically, we show that NCPs prevent overfitting outside of the training distribution and result in uncertainty estimates that are useful for active learning. We demonstrate the scalability of our method on the flight delays data set, where we significantly improve upon previously published results.
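
The input prior and output prior described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a model that predicts a Gaussian mean and standard deviation, a Gaussian output prior centered on the training targets, and a KL penalty from that prior to the model's prediction on the noised inputs. The function names (`gaussian_kl`, `ncp_penalty`), the parameters (`input_noise_std`, `output_prior_std`), and their default values are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kl(mean_p, std_p, mean_q, std_q):
    """Elementwise KL[N(mean_p, std_p^2) || N(mean_q, std_q^2)]."""
    return (np.log(std_q / std_p)
            + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
            - 0.5)


def ncp_penalty(predict, inputs, targets,
                input_noise_std=0.5, output_prior_std=2.0, rng=None):
    """Penalty that is small when the model is uncertain on noised inputs.

    predict: callable mapping inputs to (mean, std) arrays (assumed interface).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Input prior: perturb the current mini-batch with Gaussian noise.
    noisy_inputs = inputs + rng.normal(0.0, input_noise_std, size=inputs.shape)
    pred_mean, pred_std = predict(noisy_inputs)
    # Output prior: a wide Gaussian around the targets (assumption of this sketch).
    kl = gaussian_kl(targets, output_prior_std, pred_mean, pred_std)
    return float(np.mean(kl))


# Toy comparison: two hypothetical models that differ only in predicted std.
batch_x = np.zeros((8, 1))
batch_y = np.zeros((8, 1))


def confident_model(x):
    return np.zeros_like(x), np.full_like(x, 0.1)


def wide_model(x):
    return np.zeros_like(x), np.full_like(x, 2.0)


penalty_confident = ncp_penalty(confident_model, batch_x, batch_y)
penalty_wide = ncp_penalty(wide_model, batch_x, batch_y)
```

In this toy setup the overconfident model receives a larger penalty than the model whose predictive spread matches the wide output prior. In training, such a term would be added to the usual data-fit loss on the clean mini-batch; the weighting between the two is left open here.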
