Preventing Arbitrarily High Confidence on Far-Away Data in Point-Estimated Discriminative Neural Networks

Ahmad Rashid; Serena Hacker; Guojun Zhang; Agustinus Kristiadi; Pascal Poupart

TL;DR

Adding an extra-class logit that dominates the original class logits far from the training data provably prevents arbitrarily high confidence on far-away test data while keeping simple discriminative point-estimate training.

Abstract

Discriminatively trained, deterministic neural networks are the de facto choice for classification problems. However, even though they achieve state-of-the-art results on in-domain test sets, they tend to be overconfident on out-of-distribution (OOD) data. For instance, ReLU networks, a popular class of neural network architectures, have been shown to almost always yield high-confidence predictions when the test data are far away from the training set, even when they are trained with OOD data. We overcome this problem by adding a term to the output of the neural network that corresponds to the logit of an extra class, which we design to dominate the logits of the original classes as we move away from the training data. This technique provably prevents arbitrarily high confidence on far-away test data while maintaining a simple discriminative point-estimate training. Evaluation on various benchmarks demonstrates strong performance against competitive baselines on both far-away and realistic OOD data.
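The extra-class idea described in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: the linear map `W`, `b` is a hypothetical stand-in for a trained ReLU network (which behaves affinely far along a ray), and the quadratic form of `extra_logit`, its `scale` and `offset`, and the centre `mu` are all assumptions chosen only so that the extra logit outgrows the original logits.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical stand-in for a trained 2-class network: far from the data,
# a ReLU network is affine along a ray, so a linear map is enough here.
W = np.array([[2.0, -1.0], [-1.5, 0.5]])
b = np.array([0.1, -0.2])
mu = np.zeros(2)  # assumed centre of the training data

def class_logits(x):
    # Logits of the original classes; they grow at most linearly in ||x||.
    return W @ x + b

def extra_logit(x, scale=1.0, offset=-5.0):
    # Assumed form: grows quadratically with distance from mu, so it
    # eventually dominates the linearly growing class logits.
    return scale * np.sum((x - mu) ** 2) + offset

def probs_with_extra_class(x):
    # Softmax over the original logits plus the extra-class logit (last entry).
    return softmax(np.append(class_logits(x), extra_logit(x)))

x = np.array([1.0, 0.5])
for t in [1.0, 10.0, 100.0]:
    p_std = softmax(class_logits(t * x))
    p_ext = probs_with_extra_class(t * x)
    print(f"t={t:>5}: standard max prob={p_std.max():.4f}, "
          f"max original-class prob with extra class={p_ext[:-1].max():.4f}, "
          f"extra-class prob={p_ext[-1]:.4f}")
```

In this toy setting the standard softmax confidence approaches 1 as `t` grows, whereas with the extra logit the probability mass moves to the extra class and the confidence on the original classes shrinks toward 0. Near the assumed data centre the extra class receives little mass, so in-domain predictions are largely unchanged.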

Paper Structure

This paper contains 15 sections, 4 theorems, 12 equations, 3 figures, 12 tables, and 1 algorithm.

INTRODUCTION
PRELIMINARIES
Arbitrarily High Confidence on Far-Away Data
METHODOLOGY
RELATED WORKS
Gaussian Assumption.
EXPERIMENTS
Far-Away Data
OOD Benchmarks
Dataset Shifts
CONCLUSION
Experimental Details
Training Details
OOD Test Sets
Additional Results

Key Result

Lemma 3

Let $P(y | x)$ be a classifier defined in eq:softmax and let $x \in \mathbb{R}^n$. If the classifier exhibits arbitrarily high confidence on far-away inputs (i.e., $\lim_{t\rightarrow\infty} P(y|t x) = 1$), then there must exist $c \in \{ 1, \dots, k \}$ such that $\lim_{t\rightarrow\infty}z_c(tx) -
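As a reading aid, the identity below is a standard property of the softmax and is not quoted from the paper. It assumes that eq:softmax is the usual softmax over logits $z_1, \dots, z_k$, and it shows how the probability of a class $c$ depends only on logit differences.

```latex
P(y = c \mid x)
  = \frac{\exp\bigl(z_c(x)\bigr)}{\sum_{i=1}^{k} \exp\bigl(z_i(x)\bigr)}
  = \frac{1}{1 + \sum_{i \neq c} \exp\bigl(z_i(x) - z_c(x)\bigr)}
```

Under that assumption, $P(y = c \mid t x) \rightarrow 1$ as $t \rightarrow \infty$ exactly when every gap $z_c(tx) - z_i(tx)$ with $i \neq c$ diverges to $+\infty$.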

Figures (3)

Figure 1: An illustrative example of the confidence of different methods trained on a synthetic binary classification dataset. The shades of green display the confidence of each algorithm with a darker shade signifying a higher confidence. The bottom row gives a zoomed-out view.
Figure 2: Effect of training with an OOD class using our method on a $1$-D binary classification problem. Standard logits keep growing as inputs move away from the data. We implement the OOD class such that its logit grows much faster than the logits of the in-domain classes. This 'fixes' the probabilities and the confidence away from the dataset. Note that the range of y values is larger on the second plot.
Figure 3: Calibration results, measured on the ECE metric, on Rotated MNIST and CIFAR10-C following ovadia2019can.
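The ECE metric mentioned in the Figure 3 caption is commonly computed by binning predictions by confidence and averaging the gap between accuracy and confidence per bin. The sketch below uses equal-width bins and `n_bins=10`; both are assumptions, since the binning used in the paper is not stated in this summary, and the function name is illustrative.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: predicted confidence per sample, in (0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bins are half-open (lo, hi]; a confidence of exactly 0 is ignored,
        # which cannot occur for a max-softmax confidence.
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by fraction of samples in bin
    return ece

# Four predictions at 95% confidence, three of them correct.
print(expected_calibration_error([0.95, 0.95, 0.95, 0.95], [1, 1, 1, 0]))
```

A lower value indicates that confidence tracks accuracy more closely; a model that is overconfident under dataset shift shows a growing gap in the high-confidence bins.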

Theorems & Definitions (9)

Definition 1
Definition 2
Lemma 3
proof
Theorem 4
proof
Lemma 5 (hein2019relu)
Theorem 6
proof
