Approaching Test Time Augmentation in the Context of Uncertainty Calibration for Deep Neural Networks

Pedro Conde, Tiago Barros, Rui L. Lopes, Cristiano Premebida, Urbano J. Nunes

TL;DR

Empirical results indicate that the proposed methods outperform several state-of-the-art post-hoc calibration techniques, and show improvements in terms of predictive entropy on out-of-distribution samples.

Abstract

With the rise of Deep Neural Networks, machine learning systems are now ubiquitous in real-world applications, which creates a need for highly reliable models. This requires a thorough look not only at the accuracy of such systems, but also at their predictive uncertainty. Hence, we propose a novel technique (with two variations, named M-ATTA and V-ATTA) based on test time augmentation to improve the uncertainty calibration of deep models for image classification. By leveraging an adaptive weighting system, M/V-ATTA improves uncertainty calibration without affecting the model's accuracy. The performance of these techniques is evaluated on diverse metrics related to uncertainty calibration, demonstrating their robustness. Empirical results, obtained on CIFAR-10, CIFAR-100, and the Aerial Image Dataset (AID), as well as in two different scenarios under distribution shift, indicate that the proposed methods outperform several state-of-the-art post-hoc calibration techniques. Furthermore, the proposed methods also show improvements in terms of predictive entropy on out-of-distribution samples. Code for M/V-ATTA is available at: https://github.com/pedrormconde/MV-ATTA

Paper Structure

This paper contains 17 sections, 16 equations, 4 figures, 5 tables, and 2 algorithms.

Figures (4)

  • Figure 1: High-level overview of the structure shared by M-ATTA and V-ATTA, serving as general graphical support for the detailed descriptions in the M-ATTA and V-ATTA subsections. For the purpose of illustration, the figure uses $n_i = 5, \forall i \in \{1,2,m\}$, although $n_i$ can be any natural number, as clarified by the detailed description.
  • Figure 2: Average OOD prediction entropy for the DNN and the calibration methods trained/optimized on CIFAR-10 and evaluated on the AID test set.
  • Figure 3: Average OOD prediction entropy for the DNN and the calibration methods trained/optimized on AID and evaluated on the CIFAR-10 test set.
  • Figure 4: Comparison of the Brier and mc-Brier scores obtained with M-ATTA and V-ATTA for different validation/test set ratios, evaluated on the respective test sets of CIFAR-10 and CIFAR-100.
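
The figure captions above refer to two quantities: average predictive entropy on OOD samples and the Brier score. The following is a minimal sketch of how such quantities are commonly computed from predicted class probabilities; it is not taken from the paper or its repository. The function names are hypothetical, the Brier score follows one common multiclass convention (sum of squared errors over classes, averaged over samples), and the paper's mc-Brier variant is not reproduced here.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Terms with p == 0 contribute nothing, so they are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_predictive_entropy(batch):
    """Average entropy over a batch of predicted distributions."""
    return sum(predictive_entropy(p) for p in batch) / len(batch)


def brier_score(batch, labels):
    """Multiclass Brier score (assumed convention: squared error against the
    one-hot label, summed over classes and averaged over samples)."""
    total = 0.0
    for probs, label in zip(batch, labels):
        total += sum(
            (p - (1.0 if k == label else 0.0)) ** 2
            for k, p in enumerate(probs)
        )
    return total / len(batch)


if __name__ == "__main__":
    # Hypothetical predictions: one confident, one maximally uncertain.
    preds = [[0.9, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3]]
    print(mean_predictive_entropy(preds))
    print(brier_score(preds, labels=[0, 2]))
```

Under this reading, a higher average entropy on OOD inputs means the model is less confident on data it was not trained for: a uniform prediction over $K$ classes attains the maximum, $\ln K$, while a one-hot prediction has entropy 0. Lower Brier scores are better.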