
Selective Prior Synchronization via SYNC Loss

Ishan Mishra, Jiajie Li, Deepak Mishra, Jinjun Xiong

TL;DR

The SYNC loss, a novel integration of ad-hoc and post-hoc methods, incorporates the softmax response into the training process of SelectiveNet and enhances its selective prediction capabilities by examining the selective prior.

Abstract

Prediction under uncertainty is a critical requirement for deep neural networks (DNNs) to succeed responsibly. This paper focuses on selective prediction, which allows DNNs to decide when to predict and when to abstain based on the uncertainty of their predictions. Current methods are either ad-hoc, such as SelectiveNet, which modify the network architecture or objective function, or post-hoc, such as softmax response, which achieve selective prediction by analyzing the model's probabilistic outputs. We observe that post-hoc methods implicitly generate uncertainty information, termed the selective prior, which has traditionally been used only during inference. We argue that the selective prior provided by the selection mechanism is equally vital during training. We therefore propose the SYNC loss, which integrates ad-hoc and post-hoc methods in a novel way. Specifically, our approach incorporates the softmax response into the training process of SelectiveNet, enhancing its selective prediction capabilities by examining the selective prior. Evaluated on various datasets, including CIFAR-100, ImageNet-100, and Stanford Cars, our method not only improves the model's generalization but also surpasses previous works in selective prediction performance, setting a new state of the art.
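For reference on the post-hoc side, the following is a minimal sketch of softmax response as it is commonly defined: the selective score is the maximum softmax probability, and the model abstains when that score falls below a threshold. The function names, the threshold of 0.5, and the 0/1 loss used for selective risk are illustrative assumptions. The sketch does not implement the SYNC loss itself.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_response(logits):
    """Post-hoc selective score: the maximum softmax probability."""
    return max(softmax(logits))

def selective_predict(logits, threshold):
    """Return the predicted class index, or None to abstain below the threshold."""
    probs = softmax(logits)
    score = max(probs)
    if score < threshold:
        return None
    return probs.index(score)

def coverage_and_selective_risk(batch_logits, labels, threshold):
    """Coverage = fraction of accepted samples; selective risk = 0/1 error on them."""
    accepted = errors = 0
    for logits, y in zip(batch_logits, labels):
        pred = selective_predict(logits, threshold)
        if pred is None:
            continue  # abstained: excluded from the selective risk
        accepted += 1
        errors += int(pred != y)
    coverage = accepted / len(labels)
    risk = errors / accepted if accepted else 0.0
    return coverage, risk

# Hypothetical three-sample batch with three classes.
batch = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.0, 3.0, 0.0]]
labels = [0, 2, 0]
print(coverage_and_selective_risk(batch, labels, threshold=0.5))
```

Here the second sample is rejected because its maximum probability is about 0.37. Of the two accepted samples, one is misclassified, giving a coverage of 2/3 and a selective risk of 1/2.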

Paper Structure

This paper contains 34 sections, 3 theorems, 28 equations, 3 figures, and 7 tables.

Key Result

Lemma 1

For $u\in\Delta^{C-1}$ and $\gamma>0$, define $s_\gamma(u):=\bigl(\max_i u_i\bigr)^\gamma$. Then $s_\gamma$ is $L_\gamma$-Lipschitz on $\Delta^{C-1}$ in the $\ell_\infty$ norm, with $L_\gamma = \gamma\max\{1, C^{1-\gamma}\}$. Equivalently, for all $u,v\in\Delta^{C-1}$, $|s_\gamma(u)-s_\gamma(v)| \le L_\gamma\,\|u-v\|_\infty$.
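The Lipschitz claim can be sanity-checked numerically. This is a minimal sketch that assumes the constant $L_\gamma = \gamma\max\{1, C^{1-\gamma}\}$. That value follows from two facts. First, $u \mapsto \max_i u_i$ is 1-Lipschitz in $\ell_\infty$ and takes values in $[1/C, 1]$ on the simplex. Second, $t \mapsto t^\gamma$ has derivative $\gamma t^{\gamma-1}$, which on $[1/C, 1]$ is at most $\gamma$ when $\gamma \ge 1$ and at most $\gamma C^{1-\gamma}$ when $0<\gamma<1$. The helper names and the uniform (Dirichlet(1)) sampling scheme are hypothetical.

```python
import random

def lipschitz_constant(gamma, num_classes):
    """Assumed constant L_gamma = gamma * max(1, C ** (1 - gamma))."""
    return gamma * max(1.0, num_classes ** (1.0 - gamma))

def s_gamma(u, gamma):
    """Sharpened softmax response: (max_i u_i) ** gamma."""
    return max(u) ** gamma

def random_simplex_point(num_classes, rng):
    """Uniform sample from the simplex via normalized exponential draws."""
    draws = [rng.expovariate(1.0) for _ in range(num_classes)]
    total = sum(draws)
    return [d / total for d in draws]

def max_observed_ratio(gamma, num_classes, trials=2000, seed=0):
    """Largest |s(u) - s(v)| / ||u - v||_inf over random pairs of simplex points."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        u = random_simplex_point(num_classes, rng)
        v = random_simplex_point(num_classes, rng)
        dist = max(abs(a - b) for a, b in zip(u, v))
        if dist > 0:
            worst = max(worst, abs(s_gamma(u, gamma) - s_gamma(v, gamma)) / dist)
    return worst

for gamma in (0.5, 1.0, 2.0, 4.0):
    ratio = max_observed_ratio(gamma, num_classes=10)
    bound = lipschitz_constant(gamma, 10)
    print(f"gamma={gamma}: observed ratio {ratio:.3f}, L_gamma {bound:.3f}")
```

Random pairs rarely hit the worst case, so the observed ratio only lower-bounds the true Lipschitz constant. The check therefore confirms the bound is never violated, but it does not show that the bound is tight.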

Figures (3)

  • Figure 1: Examples of wrong rejection and wrong acceptance by SelectiveNet (SN) on ImageNet-100. We use a threshold of 0.5. $^*$ indicates that selective scores are normalized by their ranking in the test set. Our approach calibrates the selective score with the selective prior and therefore correctly avoids the wrong rejection and acceptance.
  • Figure 2: Uncertainty-aware Selective Prediction Framework
  • Figure 3: Selective risk-coverage curves (lower is better) comparing SN and SYNC on (a) CIFAR-100 and (b) ImageNet-100.
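A risk-coverage curve like those in Figure 3 can be computed by sorting test samples by selective score and measuring, at each coverage level, the error rate on the accepted samples. This is a minimal sketch under three assumptions: 0/1 loss, higher scores meaning more confident, and ties broken by input order. The function names and the AURC (area under the risk-coverage curve) summary are illustrative and are not taken from the paper.

```python
def risk_coverage_curve(scores, correct):
    """Return (coverage, selective_risk) pairs from accepting the top-k scores, k = 1..n."""
    n = len(scores)
    # Most confident first; sorted() is stable, so ties keep their input order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    curve = []
    errors = 0
    for k, idx in enumerate(order, start=1):
        errors += 0 if correct[idx] else 1
        curve.append((k / n, errors / k))
    return curve

def area_under_risk_coverage(curve):
    """Simple AURC estimate: mean selective risk over the coverage grid."""
    return sum(risk for _, risk in curve) / len(curve)

# Hypothetical selective scores and per-sample correctness.
scores = [0.9, 0.8, 0.6, 0.3]
correct = [True, True, False, True]
curve = risk_coverage_curve(scores, correct)
print(curve)
print(area_under_risk_coverage(curve))
```

In this toy example, the risk stays at zero up to 50% coverage. It rises once the misclassified sample with score 0.6 is accepted, which is why better-ranked selective scores push the whole curve down.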

Theorems & Definitions (3)

  • Lemma 1: Lipschitz continuity of $s_\gamma$ on the simplex
  • Corollary 1: Sharpness-stability trade-off for $\gamma$
  • Proposition 2