Minimax Generalized Cross-Entropy

Kartheek Bondugula; Santiago Mazuelas; Aritz Pérez; Anqi Liu

Abstract

Loss functions play a central role in supervised classification. Cross-entropy (CE) is widely used, whereas the mean absolute error (MAE) loss can offer robustness but is difficult to optimize. Interpolating between the CE and MAE losses, generalized cross-entropy (GCE) has recently been introduced to provide a trade-off between optimization difficulty and robustness. Existing formulations of GCE result in a non-convex optimization over classification margins that is prone to underfitting, which leads to poor performance on complex datasets. In this paper, we propose a minimax formulation of generalized cross-entropy (MGCE) that results in a convex optimization over classification margins. Moreover, we show that MGCE can provide an upper bound on the classification error. The proposed bilevel convex optimization can be implemented efficiently using stochastic gradients computed via implicit differentiation. Using benchmark datasets, we show that MGCE achieves strong accuracy, faster convergence, and better calibration, especially in the presence of label noise.
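
The abstract describes a bilevel optimization whose stochastic gradients are computed via implicit differentiation, and the outline lists a bisection method. The sketch below illustrates that general combination under our own assumptions, not the paper's exact inner problem (which this summary does not show): a scalar normalizer $c$ is found by bisection so that a hypothetical increasing link $f$ maps scores $s_y$ to values $f(s_y - c)$ summing to one, and $\partial c / \partial s_y$ is obtained from the implicit function theorem rather than by differentiating through the bisection iterations. The names `normalizer_bisection` and `normalizer_grad` are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def normalizer_bisection(scores: Sequence[float],
                         f: Callable[[float], float],
                         tol: float = 1e-12,
                         max_iter: int = 200) -> float:
    """Solve sum_y f(s_y - c) = 1 for the scalar c by bisection.

    Hypothetical assumptions: f is continuous and increasing, f(t) -> 0 as
    t -> -inf, and sum_y f(s_y - c) grows without bound as c -> -inf.
    """
    def excess(c: float) -> float:
        # Decreasing in c: a larger c pushes every argument s_y - c down.
        return sum(f(s - c) for s in scores) - 1.0

    top = max(scores)
    lo, hi = top - 1.0, top + 1.0
    step = 1.0
    while excess(lo) < 0.0:  # expand left until the sum is >= 1
        step *= 2.0
        lo -= step
    step = 1.0
    while excess(hi) > 0.0:  # expand right until the sum is <= 1
        step *= 2.0
        hi += step
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid  # the root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def normalizer_grad(scores: Sequence[float], c: float,
                    f_prime: Callable[[float], float]) -> List[float]:
    """Gradient dc/ds_y at the solution c via the implicit function theorem.

    Differentiating G(c, s) = sum_y f(s_y - c) - 1 = 0 gives
    dc/ds_y = f'(s_y - c) / sum_k f'(s_k - c).
    """
    w = [f_prime(s - c) for s in scores]
    total = sum(w)
    return [wi / total for wi in w]


if __name__ == "__main__":
    scores = [1.0, 2.0, 0.5]
    # With f = exp, c equals log-sum-exp and dc/ds equals the softmax.
    c = normalizer_bisection(scores, math.exp)
    print(c, math.log(sum(math.exp(s) for s in scores)))
    print(normalizer_grad(scores, c, math.exp))
```

The exponential link is used only as a check, since its normalizer has the closed form log-sum-exp and its implicit gradient is the softmax; for links without a closed form, the same two functions apply unchanged, and the gradient costs one pass over the classes regardless of how many bisection steps were taken.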

Paper Structure (28 sections, 4 theorems, 64 equations, 6 figures, 3 tables, 1 algorithm)

INTRODUCTION
RELATED WORK
Preliminaries
Generalized cross-entropy
Minimax framework
MINIMAX GENERALIZED CROSS-ENTROPY
Relation between worst-case distributions and minimax classifier
Performance guarantees
OPTIMIZATION
Stochastic gradients
Effective bisection method
EXPERIMENTAL RESULTS
Experimental setup
Comparison between GCE and MGCE in terms of accuracy and convergence
Evaluation on real-world noisy label

Key Result

Theorem 1

Given a loss function $\ell_\beta$, if $\mathrm{h}_\beta$ is the minimax classifier in Eq. (soft_clf_mrc), the worst-case distribution $\mathrm{p}_\beta \in \arg\max_{\mathrm{p} \in \mathcal{U}} \ell_\beta(\mathrm{h}_\beta, \mathrm{p})$ is given by … Reciprocally, if $\mathrm{p}_\beta$ is the worst-case distribution corresponding to the minimax problem in Eq. (minmaxrisk), that is, …
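
Theorem 1 links the minimax classifier to the worst-case distribution, but its closed form is not recoverable from this summary. As a hedged illustration of the kind of relation involved, the sketch below assumes a hypothetical GCE-style loss $\ell_\beta(\mathrm{h}, y) = \beta(1 - \mathrm{h}_y^{1/\beta})$ and computes the classifier that minimizes the expected loss under a fixed probability vector $\mathrm{p}$. Setting the derivative $-\mathrm{p}_y \mathrm{h}_y^{1/\beta - 1}$ equal to a common Lagrange multiplier gives $\mathrm{h}_y \propto \mathrm{p}_y^{\beta/(\beta-1)}$, which reduces to $\mathrm{h} = \mathrm{p}$ as $\beta \to \infty$ and approaches a hard argmax as $\beta \to 1$. Whether this matches Theorem 1 exactly depends on the paper's definition of $\ell_\beta$; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def assumed_loss(h_y: float, beta: float) -> float:
    """Hypothetical GCE-style loss beta * (1 - h_y**(1/beta)), beta > 1."""
    return beta * (1.0 - h_y ** (1.0 / beta))


def best_response(p: Sequence[float], beta: float) -> List[float]:
    """Classifier minimizing sum_y p_y * assumed_loss(h_y, beta) on the simplex.

    Stationarity gives h_y proportional to p_y ** (beta / (beta - 1)).
    """
    if beta <= 1.0:
        raise ValueError("this sketch covers beta > 1 only")
    if math.isinf(beta):
        return list(p)  # CE limit: the exponent tends to 1, so h = p
    a = beta / (beta - 1.0)
    w = [pi ** a for pi in p]
    z = sum(w)
    return [wi / z for wi in w]


def expected_loss_2class(h1: float, p1: float, beta: float) -> float:
    """Expected assumed loss for 2 classes with h = (h1, 1 - h1), p = (p1, 1 - p1)."""
    return p1 * assumed_loss(h1, beta) + (1.0 - p1) * assumed_loss(1.0 - h1, beta)


if __name__ == "__main__":
    # Classifier output for class 1 as beta moves from near-MAE to CE.
    for beta in (1.1, 1.4, 2.0, 10.0, math.inf):
        print(beta, best_response([0.7, 0.3], beta)[0])
```

A brute-force grid search over `expected_loss_2class` recovers the same minimizer as the closed form, which is a cheap way to check the derivation for other values of $\beta$.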

Figures (6)

Figure 1: Relation between $\beta$ and the resulting loss function. For $\beta = 1$, the loss corresponds to the MAE, while for $\beta = \infty$ it corresponds to CE. For $\beta \in (1,\infty)$, the loss interpolates between the MAE and CE.
Figure 2: Relation between the minimax classifier $\mathrm{h}_\beta(x)_y$ and the worst-case probability $\mathrm{p}_\beta(y|x)$ for 2 classes. For $\beta \in (1,\infty)$, the worst-case probabilities take a cautious stance, avoiding the extremes of the MAE ($\beta=1$) and CE ($\beta=\infty$) losses.
Figure 3: Average test accuracy under clean training data for multiple complex datasets, with the loss parameter $\beta$ set to 1.4. The figure shows the faster convergence of the proposed MGCE compared with GCE.
Figure 4: Top-1 validation accuracy on the real-world noisy dataset WebVision. The figure shows that the proposed MGCE outperforms GCE, which significantly underfits on this complex dataset due to its non-convexity.
Figure 5: Top-1 test accuracy on the real-world noisy dataset Clothing-1M. The figure shows that the proposed MGCE outperforms GCE, which underfits on this complex dataset of 1 million training samples with noisy labels.
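
Figure 1 describes a family of losses indexed by $\beta$ that runs from MAE at $\beta = 1$ to CE as $\beta \to \infty$. The exact expression of $\ell_\beta$ is not reproduced in this summary, so the sketch below assumes one GCE-style parametrization with these two endpoints: $\beta(1 - h_y^{1/\beta})$ of the probability $h_y$ assigned to the true class, i.e. the $(1 - h_y^q)/q$ form with $q = 1/\beta$. At $\beta = 1$ it equals $1 - h_y$, which is half the L1 distance between the one-hot label and $\mathrm{h}$ (MAE up to a constant), and as $\beta \to \infty$ it tends to $-\log h_y$. The function name `interpolated_loss` is hypothetical.

```python
import math


def interpolated_loss(h_y: float, beta: float) -> float:
    """Hypothetical GCE-style loss: beta * (1 - h_y ** (1 / beta)).

    beta = 1 gives 1 - h_y (MAE up to a constant factor);
    beta = inf is treated as its limit, -log(h_y) (CE).
    """
    if not 0.0 < h_y <= 1.0:
        raise ValueError("h_y must lie in (0, 1]")
    if beta < 1.0:
        raise ValueError("this sketch assumes beta >= 1")
    if math.isinf(beta):
        return -math.log(h_y)
    return beta * (1.0 - h_y ** (1.0 / beta))


if __name__ == "__main__":
    # Loss on a true-class probability of 0.3 for several beta values,
    # including beta = 1.4, the value used in Figure 3.
    for beta in (1.0, 1.4, 2.0, 10.0, math.inf):
        print(f"beta={beta}: loss={interpolated_loss(0.3, beta):.4f}")
```

For a fixed $h_y \in (0, 1)$ this loss increases monotonically with $\beta$, so intermediate values penalize low-confidence predictions more than MAE but less than CE, which is the trade-off between robustness and ease of optimization described in the abstract.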

Theorems & Definitions (9)

Theorem 1
proof
Theorem 2
proof
Remark 1
Theorem 3
proof
Corollary 1
proof