Uncertainties of Latent Representations in Computer Vision

Michael Kirchhof

TL;DR

This thesis develops a framework for attaching uncertainty estimates to pretrained latent representations in computer vision, aiming to make uncertainty quantification scalable and transferable. It introduces probabilistic embeddings built with von Mises-Fisher (vMF) and non-isotropic vMF (nivMF) distributions, together with the MCInfoNCE objective, and proves that the learned variances recover the true posterior $P(Z \mid X)$ (up to rotation) and align with aleatoric uncertainty. URL, a large-scale benchmark of zero-shot transfer built around the R-AUROC metric, shows that several methods (including MCInfoNCE, nivMF, and loss prediction) achieve strong transferability, though some gradient interactions can degrade main task performance. The work further shows that pretrained uncertainties transfer across datasets and primarily capture aleatoric uncertainty, enabling practical, trustworthy deployment and setting the stage for specialized uncertainty tools in future CV research.

Abstract

Uncertainty quantification is a key pillar of trustworthy machine learning. It enables safe reactions to unsafe inputs, such as predicting only when the machine learning model detects sufficient evidence, discarding anomalous data, or emitting warnings when an error is likely. This is particularly crucial in safety-critical areas like medical image classification or self-driving cars. Despite the plethora of proposed uncertainty quantification methods achieving ever higher scores on performance benchmarks, practitioners often shy away from uncertainty estimates. Many machine learning projects start from pretrained latent representations that come without uncertainty estimates. Practitioners would need to train the uncertainties on their own, which is notoriously difficult and resource-intensive. This thesis makes uncertainty estimates easily accessible by adding them to the latent representation vectors of pretrained computer vision models. Besides proposing approaches rooted in probability and decision theory, such as Monte-Carlo InfoNCE (MCInfoNCE) and loss prediction, we delve into both theoretical and empirical questions. We show that these unobservable uncertainties about unobservable latent representations are indeed provably correct. We also provide an uncertainty-aware representation learning (URL) benchmark to compare these unobservables against observable ground-truths. Finally, we compile our findings to pretrain lightweight representation uncertainties on large-scale computer vision models that transfer to unseen datasets in a zero-shot manner. Our findings not only advance the current theoretical understanding of uncertainties over latent variables, but also facilitate access to uncertainty quantification for future researchers inside and outside the field, enabling straightforward but trustworthy machine learning.

Paper Structure (52 sections, 3 equations, 9 figures)


Figures (9)

  • Figure 1: Images can be inherently ambiguous, making it necessary to quantify their uncertainty. Both images are from the ImageNet-1k benchmark dataset deng2009imagenet.
  • Figure 2: Densities of a vMF and a non-isotropic vMF distribution on a three-dimensional unit sphere. Purple indicates low and yellow high density. Figure adapted from the original paper kirchhof2022non.
  • Figure 3: Probabilistic embeddings (green) lead to better retrieval performance than deterministic ones (blue). Bars show the standard deviation across five seeds. Figure adapted from the original paper kirchhof2022non.
  • Figure 4: Images are created from unknown latent vectors by a data-generating process. Deterministic image representations aim to rediscover this vector (top). When the data-generating process is probabilistic (bottom) and creates ambiguous images, it loses information about the latent vectors, so that several latent vectors could have created the same image. Probabilistic embeddings recover this posterior, which we prove for MCInfoNCE. Figure cited from the original paper kirchhof2023probabilistic.
  • Figure 5: Each dot represents one trained model, covering all approaches, hyperparameters, and backbones. Models with a higher R-AUROC reflect human uncertainties better (left) and behave better under uncertainty-inducing transforms like cropping (right). This supports the R-AUROC empirically. Figure cited from the original paper kirchhof2023url.
  • ...and 4 more figures
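
The vMF density on the three-dimensional unit sphere that Figure 2 depicts can be sketched numerically. The snippet below is a minimal illustration using the standard textbook vMF density, not code from the thesis; the function name `vmf_log_density_3d` and the example vectors are hypothetical. It assumes `z` and `mu` are unit vectors in R^3 and that the concentration `kappa` is non-negative.

```python
import math

def vmf_log_density_3d(z, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit sphere in R^3.

    Standard closed form for three dimensions:
        f(z; mu, kappa) = kappa / (4 * pi * sinh(kappa)) * exp(kappa * <mu, z>)
    z and mu are assumed to be unit-length 3-vectors (not checked here).
    """
    dot = sum(a * b for a, b in zip(z, mu))
    if kappa == 0:
        # kappa = 0 is the uniform distribution on the sphere (area 4 * pi).
        return -math.log(4 * math.pi)
    # log(2 * sinh(kappa)) = kappa + log(1 - exp(-2 * kappa)), which avoids
    # overflow for large kappa.
    log_norm = (math.log(kappa) - math.log(2 * math.pi)
                - kappa - math.log1p(-math.exp(-2 * kappa)))
    return log_norm + kappa * dot

# Hypothetical example: mean direction at the north pole.
mu = (0.0, 0.0, 1.0)
for kappa in (0.0, 1.0, 10.0):
    at_mean = math.exp(vmf_log_density_3d(mu, mu, kappa))
    opposite = math.exp(vmf_log_density_3d((0.0, 0.0, -1.0), mu, kappa))
    print(f"kappa={kappa}: density at mu={at_mean:.4f}, opposite pole={opposite:.6f}")
```

A larger `kappa` concentrates probability mass around `mu`, so a low concentration can be read as high uncertainty about the latent vector. The non-isotropic variant shown in Figure 2 is not covered by this sketch.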