Uncertainties of Latent Representations in Computer Vision
Michael Kirchhof
TL;DR
This thesis develops a framework for attaching uncertainty estimates to pretrained latent representations in computer vision, aiming to make uncertainty quantification scalable and transferable. It introduces probabilistic embeddings built with von Mises-Fisher (vMF) and non-isotropic vMF (nivMF) distributions and the MCInfoNCE objective, proving that the learned variances recover the true posterior $P(Z \mid X)$ (up to rotation) and align with aleatoric uncertainty. A large-scale benchmark for zero-shot transferable uncertainties, uncertainty-aware representation learning (URL), with its R-AUROC metric, demonstrates that several methods (including MCInfoNCE, nivMF, and loss prediction) achieve strong transferability, though gradient interactions between the uncertainty objective and the main task can degrade main-task performance. The work further shows that pretrained uncertainties transfer across datasets and primarily capture aleatoric uncertainty, enabling practical, trustworthy deployment and setting the stage for specialized uncertainty tools in future computer vision research.
Abstract
Uncertainty quantification is a key pillar of trustworthy machine learning. It enables safe reactions to unsafe inputs, such as predicting only when the machine learning model detects sufficient evidence, discarding anomalous data, or emitting warnings when an error is likely. This is particularly crucial in safety-critical areas like medical image classification or self-driving cars. Although a plethora of proposed uncertainty quantification methods achieve ever higher scores on performance benchmarks, practitioners often shy away from uncertainty estimates. Many machine learning projects start from pretrained latent representations that come without uncertainty estimates. Practitioners would need to train uncertainties on their own, which is notoriously difficult and resource-intensive. This thesis makes uncertainty estimates easily accessible by adding them to the latent representation vectors of pretrained computer vision models. Besides proposing approaches rooted in probability and decision theory, such as Monte-Carlo InfoNCE (MCInfoNCE) and loss prediction, we delve into both theoretical and empirical questions. We show that these unobservable uncertainties about unobservable latent representations are indeed provably correct. We also provide an uncertainty-aware representation learning (URL) benchmark to compare these unobservables against observable ground truths. Finally, we compile our findings to pretrain lightweight representation uncertainties on large-scale computer vision models that transfer to unseen datasets in a zero-shot manner. Our findings not only advance the current theoretical understanding of uncertainties over latent variables, but also facilitate access to uncertainty quantification for future researchers inside and outside the field, enabling straightforward but trustworthy machine learning.
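To illustrate how an unobservable representation uncertainty can be compared against an observable ground truth, the following is a minimal sketch of a retrieval-based AUROC in the spirit of the R-AUROC metric. It rests on an assumption that is not stated in the text above: that the observable target is whether each embedding's nearest neighbor (by cosine similarity) shares its class label. The function names `auroc` and `r_auroc` and the toy data are hypothetical and are not taken from the benchmark's code.

```python
import numpy as np


def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic; tied scores get average ranks."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos, n_neg = int(labels.sum()), int((~labels).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):  # average the ranks of tied scores
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def r_auroc(embeddings, labels, uncertainties):
    """Assumed definition: how well low uncertainty predicts a correct
    nearest-neighbor retrieval (same class as the query)."""
    emb = np.asarray(embeddings, dtype=float)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T                  # cosine similarities
    np.fill_diagonal(sim, -np.inf)     # a point may not retrieve itself
    nearest = sim.argmax(axis=1)
    labels = np.asarray(labels)
    correct = labels[nearest] == labels
    # Negate so that a higher score means a more certain embedding.
    return auroc(-np.asarray(uncertainties, dtype=float), correct)


# Toy data: the last point lies between the two classes and retrieves
# a neighbor of the wrong class.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.6, 0.8]]
toy_labels = [0, 0, 1, 1, 0]
good_uncertainties = [0.1, 0.2, 0.1, 0.3, 0.9]  # high only on the hard point
bad_uncertainties = [0.9, 0.8, 0.9, 0.7, 0.1]   # inverted ranking

print(r_auroc(toy_embeddings, toy_labels, good_uncertainties))
print(r_auroc(toy_embeddings, toy_labels, bad_uncertainties))
```

Under this assumed definition, a value near 1 means the uncertainty ranks failed retrievals as more uncertain than successful ones, while 0.5 corresponds to an uninformative estimate. The target needs only class labels on the evaluation set and no ground-truth uncertainties.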
