Pretrained Visual Uncertainties

Michael Kirchhof, Mark Collier, Seong Joon Oh, Enkelejda Kasneci

TL;DR

This paper tackles transferable uncertainty estimation for vision models by introducing pretrained uncertainty modules that can be shipped with pretrained backbones. The core idea is to train a lightweight uncertainty head alongside the main model without interfering with its primary objective. Three ingredients make this possible: a stopgrad that keeps the uncertainty head's gradients from interfering with the classifier, caching of representations, and a scale-free, ranking-based loss. Empirically, the approach transfers uncertainties zero-shot to twelve downstream datasets, sets a new state of the art on the URL benchmark, and yields uncertainties that predominantly capture aleatoric uncertainty, which benefits uncertainty-aware visualization and safe retrieval. The authors release pretrained checkpoints and code as open source to support the use of uncertainty estimates beyond the training domain and across diverse vision tasks.
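To make the "scale-free, ranking-based loss" concrete, here is a minimal sketch in plain Python. It is not the paper's loss, whose exact form is not given in this summary; the hinge form, the `margin` value, and the function name `pairwise_ranking_loss` are assumptions chosen for illustration. The sketch only shows the property named above: the penalty depends on the ordering of the per-sample classification losses, not on their magnitude. The classification losses enter as plain constants, which is the role a stopgrad would play in an autodiff framework.

```python
from itertools import combinations


def pairwise_ranking_loss(pred_uncertainty, task_loss, margin=0.1):
    """Hinge penalty for every pair whose predicted uncertainties do not
    follow the ordering of the task losses by at least `margin`.

    Hypothetical helper for illustration; `margin` is an assumed value.
    """
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(task_loss)), 2):
        if task_loss[i] == task_loss[j]:
            continue  # ties carry no ranking information
        # sign is +1 if sample i should be the more uncertain one, else -1
        sign = 1.0 if task_loss[i] > task_loss[j] else -1.0
        gap = sign * (pred_uncertainty[i] - pred_uncertainty[j])
        total += max(0.0, margin - gap)
        n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Toy batch: per-sample classification losses and two candidate head outputs.
task_loss = [0.05, 2.30, 0.70, 1.10]
good_head = [0.1, 0.9, 0.4, 0.6]  # same ordering as task_loss
bad_head = [0.9, 0.1, 0.6, 0.4]   # reversed ordering

loss_good = pairwise_ranking_loss(good_head, task_loss)
loss_bad = pairwise_ranking_loss(bad_head, task_loss)

# Rescaling the task losses keeps their ordering, so the loss is unchanged.
loss_bad_rescaled = pairwise_ranking_loss(bad_head, [1000 * l for l in task_loss])
```

Because only the sign of each loss difference is used, the uncertainty head does not need to match the numeric range of the classification loss, which can differ widely between datasets and training stages.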

Abstract

Accurate uncertainty estimation is vital to trustworthy machine learning, yet uncertainties typically have to be learned anew for each task. This work introduces the first pretrained uncertainty modules for vision models. Similar to standard pretraining, this enables the zero-shot transfer of uncertainties learned on a large pretraining dataset to specialized downstream datasets. We enable our large-scale pretraining on ImageNet-21k by solving a gradient conflict in previous uncertainty modules and accelerating the training by up to 180x. We find that the pretrained uncertainties generalize to unseen datasets. In scrutinizing the learned uncertainties, we find that they capture aleatoric uncertainty, disentangled from epistemic components. We demonstrate that this enables safe retrieval and uncertainty-aware dataset visualization. To encourage applications to further problems and domains, we release all pretrained checkpoints and code at https://github.com/mkirchhof/url.

Paper Structure

This paper contains 19 sections, 4 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Our pretrained uncertainties generalize to unseen datasets. The R-AUROC measures the quality of uncertainty estimates on zero-shot datasets; see \ref{sec:quant}.
  • Figure 2: Pretrained uncertainties are returned by an auxiliary head (blue) that is trained to predict the classification loss of each image.
  • Figure 3: (a) The uncertainty and classification heads of Loss Prediction are in conflict. We solve this in (b) by adding a stopgrad. It ensures that the uncertainty head's gradients do not interfere with those of the classifier head, stabilizing the performance of both. The uncertainty and classifier heads were finetuned on ImageNet-1k on a pretrained (but unfrozen) ViT-Base backbone.
  • Figure 4: Our pretrained uncertainties outperform the approaches in the URL benchmark (kirchhof2023url). The URL benchmark trained ViT-Medium models on ImageNet-1k. We reimplement its best approach (orange) on ViT-Base (green), then enhance it with our changes (red), and finally scale the training of ours to ImageNet-21k with various ViT sizes (blue). Each dot is one seed.
  • Figure 5: Pretrained uncertainties separate clear from ambiguous images on Stanford Online Products, a zero-shot dataset.
  • ...and 4 more figures
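
The caption of Figure 1 refers to the R-AUROC without defining it in this excerpt. The sketch below assumes it is an AUROC that asks whether the uncertainty ranks samples with an incorrect retrieval above samples with a correct one; that reading, the variable names, and the toy values are assumptions for illustration, and the section referenced in the caption is the authority on the exact definition. The AUROC itself is computed in plain Python as the fraction of positive-negative pairs that are ordered correctly.

```python
def auroc(scores, positives):
    """Probability that a random positive receives a higher score than a
    random negative, with ties counted as half a win."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): one uncertainty per query image, plus a flag that is
# True when the nearest retrieved neighbour had the wrong class.
uncertainty = [0.9, 0.2, 0.7, 0.1, 0.4]
retrieval_wrong = [True, False, False, False, True]

r_auroc = auroc(uncertainty, retrieval_wrong)
```

A value of 0.5 corresponds to uncertainties that carry no information about retrieval errors, and 1.0 to uncertainties that rank every incorrect retrieval above every correct one.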