
SURE: SUrvey REcipes for building reliable and robust deep networks

Yuting Li, Yingyi Chen, Xuanlong Yu, Dexiong Chen, Xi Shen

TL;DR

SURE addresses robust uncertainty estimation in deep networks under real-world challenges such as data corruption, noisy labels, and long-tailed distributions. It unifies techniques from model regularization, classifier design, and optimization around two core ideas: increasing entropy for hard samples, and enforcing flat minima via Sharpness-Aware Minimization (SAM) and Stochastic Weight Averaging (SWA). The approach combines RegMixup, Correctness Ranking Loss (CRL), and a Cosine Similarity Classifier (CSC) with SAM/SWA. It achieves superior failure-prediction performance and competitive robustness on noisy-label and distribution-shift benchmarks, including state-of-the-art results on Animal-10N and Food-101N without task-specific tweaks. These findings suggest a practical path toward reliable uncertainty estimation in real-world deployments, and the gains hold across a range of datasets and architectures.
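The "increase entropy for hard samples" ingredients can be sketched without any framework. This is a minimal sketch under stated assumptions, not the authors' implementation. `cosine_logits` computes cosine-similarity-classifier logits with a hypothetical scale of 8. `regmixup_loss` adds a mixup cross-entropy term on top of the clean cross-entropy, in the spirit of RegMixup; the Beta(alpha, alpha) mixing coefficient and the weight `eta` are hypothetical values. `crl_pair` is the pairwise correctness ranking loss. In it, `corr_*` is a sample's fraction of correct predictions over past epochs and `conf_*` is its current confidence. All function names and hyperparameter values are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    # target is a probability vector: one-hot for clean labels, mixed for mixup
    return -sum(t * math.log(max(p, 1e-12)) for p, t in zip(probs, target))


def cosine_logits(feature, weights, scale=8.0):
    # cosine similarity classifier: logit_k = scale * cos(w_k, feature)
    # scale=8.0 is a hypothetical value, not taken from the paper
    f_norm = math.sqrt(sum(v * v for v in feature)) or 1e-12
    logits = []
    for w in weights:
        w_norm = math.sqrt(sum(v * v for v in w)) or 1e-12
        dot = sum(a * b for a, b in zip(w, feature))
        logits.append(scale * dot / (w_norm * f_norm))
    return logits


def regmixup_loss(model, x, y, x2, y2, alpha=20.0, eta=1.0, rng=random):
    # RegMixup-style objective: clean cross-entropy PLUS a mixup cross-entropy
    # term (mixup as a regularizer, not a replacement); alpha/eta are hypothetical
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1 - lam) * b for a, b in zip(x, x2)]
    y_mix = [lam * a + (1 - lam) * b for a, b in zip(y, y2)]
    clean = cross_entropy(softmax(model(x)), y)
    mixed = cross_entropy(softmax(model(x_mix)), y_mix)
    return clean + eta * mixed


def crl_pair(conf_i, conf_j, corr_i, corr_j):
    # correctness ranking loss for one pair of samples:
    # max(0, -g * (conf_i - conf_j) + |corr_i - corr_j|), g = sign(corr_i - corr_j)
    g = (corr_i > corr_j) - (corr_i < corr_j)
    return max(0.0, -g * (conf_i - conf_j) + abs(corr_i - corr_j))


# usage: identity "features" followed by a 2-class cosine classifier
W = [[1.0, 0.0], [0.0, 1.0]]
model = lambda x: cosine_logits(x, W)
loss = regmixup_loss(model, [1.0, 0.2], [1.0, 0.0], [0.1, 1.0], [0.0, 1.0])
print(round(loss, 3))
```

The ranking term is zero only when the confidence gap between two samples is at least as large as their gap in historical correctness. This pushes frequently misclassified (hard) samples toward lower confidence. How SURE weights and sums these terms is not spelled out here, so treat any combination as an assumption.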

Abstract

In this paper, we revisit techniques for uncertainty estimation within deep neural networks and consolidate a suite of techniques to enhance their reliability. Our investigation reveals that an integrated application of diverse techniques, spanning model regularization, classifier design, and optimization, substantially improves the accuracy of uncertainty predictions in image classification tasks. The synergistic effect of these techniques culminates in our novel SURE approach. We rigorously evaluate SURE on failure prediction, a critical testbed for the efficacy of uncertainty estimation. Our results show consistently better performance than models that individually deploy each technique, across various datasets and model architectures. When applied to real-world challenges such as data corruption, label noise, and long-tailed class distributions, SURE exhibits remarkable robustness, delivering results superior to or on par with current state-of-the-art specialized methods. In particular, on Animal-10N and Food-101N for learning with noisy labels, SURE achieves state-of-the-art performance without any task-specific adjustments. This work not only sets a new benchmark for robust uncertainty estimation but also paves the way for its application in diverse real-world scenarios where reliability is paramount. Our code is available at https://yutingli0606.github.io/SURE/.
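Failure prediction asks how well a confidence score separates correct predictions from misclassified ones. The sketch below computes the two metrics reported in the paper's figures, AUROC (higher is better) and AURC (lower is better), from per-sample confidences. It uses the maximum softmax probability (MSP) baseline as the confidence. This is a minimal, quadratic-time illustration, not the authors' evaluation code. The function names are hypothetical. Ties in the AURC sort are broken by input order, and many papers report AURC multiplied by 1000.

```python
def msp_confidence(probs):
    # maximum softmax probability (MSP), the baseline confidence score
    return max(probs)


def auroc(confidences, correct):
    # probability that a correctly classified sample gets a higher confidence
    # than a misclassified one (ties count 1/2); O(n^2), fine for a sketch
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need both correct and misclassified samples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def aurc(confidences, correct):
    # area under the risk-coverage curve: keep the k most confident samples,
    # take their error rate (selective risk), and average over k = 1..n
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    errors, area = 0, 0.0
    for k, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        area += errors / k
    return area / len(order)


# usage: four toy softmax outputs with their ground-truth labels
probs = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45], [0.2, 0.8]]
labels = [0, 1, 0, 1]
conf = [msp_confidence(p) for p in probs]
correct = [p.index(max(p)) == y for p, y in zip(probs, labels)]
print(auroc(conf, correct), aurc(conf, correct))
```

A method that gives its mistakes the lowest confidences drives AUROC toward 1 and AURC toward the minimum achievable at its error rate. That is the sense in which SURE is "better at failure prediction".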

Paper Structure (41 sections, 8 equations, 5 figures, 9 tables)

Figures (5)

  • Figure 1: SURE consistently performs better than previous approaches to uncertainty estimation under various scenarios. Note that we did not manage to scale RegMixup pinto2022using to the learning-with-noisy-labels task. Baseline refers to the maximum softmax probability (MSP) method hendrycks2016baseline.
  • Figure 2: Overview of the recipes. Our proposed approach SURE has two aspects: increasing entropy for hard samples and enforcing flat minima during optimization. To increase entropy for hard samples, we combine the RegMixup pinto2022using loss and the correctness ranking loss (CRL) moon2020confidence as our loss function, and employ a cosine similarity classifier (CSC) gidaris2018dynamic, hu2020empirical as our classifier. For optimization, we leverage Sharpness-Aware Minimization (SAM) foret2020sharpness and Stochastic Weight Averaging (SWA) izmailov2018averaging to find flat minima.
  • Figure 3: Comparison of the average AUROC davis2006relationship (higher is better) and AURC geifman2018bias (lower is better) on CIFAR10-C hendrycks2019benchmarking. We use DenseNet huang2017densely as the backbone and train on the standard CIFAR10 training set. The evaluation results are averaged across the images with 15 types of corruption under 5 severity levels.
  • Figure 4: Visualization of the confidence separation produced by different methods on CIFAR100-LT cui2019class with imbalance factor (IF) 10. SURE yields better confidence separation than MSP hendrycks2016baseline and FMFP zhu2022rethinking.
  • Figure 5: Comparison of the average AUROC davis2006relationship (higher is better) and AURC geifman2018bias (lower is better) on CIFAR10-C hendrycks2019benchmarking. We use DenseNet huang2017densely as the backbone and CIFAR10 as the training set. The evaluation results are averaged across the images with 15 types of corruption under 5 severity levels.
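The flat-minima half of the recipe (Figure 2) pairs SAM with SWA. The sketch below shows both mechanics on a toy quadratic loss with per-axis curvature, in plain Python. Each SAM step first moves the weights to the first-order worst-case point inside an L2 ball of radius `rho`. It then updates the original weights using the gradient measured there. SWA keeps a running mean of the weights from later epochs. The toy loss, the learning rate, `rho`, `swa_start`, and all function names are hypothetical choices for illustration, not the paper's settings. In PyTorch, SWA helpers ship in `torch.optim.swa_utils`, while SAM is not part of core PyTorch and must be implemented or taken from a third-party package.

```python
import math


def loss(w, curvature):
    # toy quadratic loss L(w) = sum_i a_i * w_i**2 with per-axis curvature a_i
    return sum(a * wi * wi for a, wi in zip(curvature, w))


def grad(w, curvature):
    # analytic gradient of the toy loss
    return [2.0 * a * wi for a, wi in zip(curvature, w)]


def sam_step(w, curvature, lr=0.1, rho=0.05):
    # 1) ascend to the first-order worst-case point inside an L2 ball of radius rho
    g = grad(w, curvature)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # 2) update the ORIGINAL weights with the gradient taken at the perturbed point
    g_adv = grad(w_adv, curvature)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def swa_update(w_swa, w, n_averaged):
    # running mean of all weight snapshots collected so far
    return [(s * n_averaged + wi) / (n_averaged + 1) for s, wi in zip(w_swa, w)]


def train(w, curvature, epochs=30, swa_start=20, lr=0.1, rho=0.05):
    w_swa, n_averaged = list(w), 0
    for epoch in range(epochs):
        w = sam_step(w, curvature, lr, rho)
        if epoch >= swa_start:  # average only the late-epoch snapshots
            w_swa = swa_update(w_swa, w, n_averaged)
            n_averaged += 1
    return w, w_swa


# usage: start away from the minimum on an anisotropic bowl
curv = [1.0, 4.0]
w_last, w_avg = train([1.0, 1.0], curv)
print(loss(w_last, curv), loss(w_avg, curv))
```

A quadratic has a single minimum, so this toy cannot show SAM preferring a flatter basin over a sharper one. It only illustrates the two-step update and the weight averaging. In a real network, SWA also requires recomputing batch-norm statistics for the averaged weights.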