
EXACT: How to Train Your Accuracy

Ivan Karpukhin, Stanislav Dereka, Sergey Kolesnikov

TL;DR

The paper introduces EXACT, a framework for directly maximizing the expected accuracy of a stochastic classifier. By modeling the score vector as $s \sim \mathcal{N}(\mu(x), \sigma^2(x) I)$ and optimizing $\mathcal{A}(\theta) = \mathbb{E}_{x,y} \mathrm{P}(s_y > \max_{i \neq y} s_i)$ with gradient methods, it overcomes the non-differentiability of accuracy. The method relies on efficient evaluation and differentiation of an orthant integral of a multivariate normal distribution, using the Genz algorithm together with careful handling of margins, variance scheduling, and gradient normalization. Empirical results on tabular datasets and deep image classification tasks show that EXACT can reach accuracy higher than or comparable to that of cross-entropy and hinge losses, with modest computational overhead that scales favorably with model complexity and the number of classes. The approach offers a principled route to direct metric optimization and may extend to other non-differentiable targets.
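As a rough illustration of the objective $\mathcal{A}(\theta)$, the sketch below estimates the expected accuracy of a stochastic classifier by plain Monte Carlo sampling of $s \sim \mathcal{N}(\mu(x), \sigma^2 I)$. It assumes the per-example means $\mu(x)$ are already computed and, for simplicity, uses one shared $\sigma$ instead of a per-example $\sigma(x)$. The function name `expected_accuracy_mc` and the sample count are illustrative choices, not part of EXACT; the paper evaluates this probability as an orthant integral with the Genz algorithm rather than by naive sampling.

```python
import random


def expected_accuracy_mc(means, labels, sigma, n_samples=10_000, seed=0):
    """Monte Carlo estimate of E_{x,y} P(s_y > max_{i != y} s_i),
    where s ~ N(mu(x), sigma^2 I) for each example.

    means:  list of per-example mean score vectors mu(x), each of length C
    labels: list of integer class labels y
    sigma:  shared score standard deviation (simplifying assumption)
    """
    rng = random.Random(seed)
    total = 0.0
    for mu, y in zip(means, labels):
        hits = 0
        for _ in range(n_samples):
            # Sample a noisy score vector s ~ N(mu, sigma^2 I).
            s = [m + sigma * rng.gauss(0.0, 1.0) for m in mu]
            # The stochastic model is correct if the true-class score beats all others.
            if all(s[y] > s[i] for i in range(len(s)) if i != y):
                hits += 1
        total += hits / n_samples
    # Average the per-example success probabilities over the dataset.
    return total / len(labels)
```

For two classes the answer is known in closed form: with $\mu = (1, 0)$, $y = 0$ and $\sigma = 1$, the difference $s_0 - s_1$ follows $\mathcal{N}(1, 2)$, so `expected_accuracy_mc([[1.0, 0.0]], [0], sigma=1.0)` should land close to $\Phi(1/\sqrt{2}) \approx 0.760$. As $\sigma \to 0$ the estimate approaches the ordinary (hard) accuracy of the mean scores.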

Abstract

Classification tasks are usually evaluated in terms of accuracy. However, accuracy is discontinuous and cannot be directly optimized using gradient ascent. Popular methods minimize cross-entropy, hinge loss, or other surrogate losses, which can lead to suboptimal results. In this paper, we propose a new optimization framework by introducing stochasticity to a model's output and optimizing expected accuracy, i.e., the accuracy of the stochastic model. Extensive experiments on linear models and deep image classification show that the proposed optimization method is a powerful alternative to widely used classification losses.

Paper Structure (36 sections, 4 theorems, 40 equations, 10 figures, 9 tables)


Key Result

Theorem 4.2

Suppose the score vector $s$ follows the multivariate normal distribution $\mathcal{N}(\mu, \sigma^2 I)$ in $\mathbb{R}^C$. Then the probability that the $y$-th score exceeds all other scores can be represented as

$$\mathrm{P}\left(s_y > \max_{i \neq y} s_i\right) = \int_{\Omega_+} \mathcal{N}\left(t; D_y \mu, \sigma^2 D_y D_y^\top\right) dt,$$

where $\mathcal{N}(t; \mu, \Sigma)$ denotes the multivariate normal PDF, $D_y$ is a delta matrix of order $C$ for the label $y$, and $\Omega_+ = \{t \in \mathbb{R}^{C-1} : t_i > 0,\ i = 1, \dots, C-1\}$ is the positive orthant.
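Theorem 4.2 turns the per-example success probability into a Gaussian orthant integral: if the rows of $D_y$ produce the differences $s_y - s_i$ for $i \neq y$, then $D_y s \sim \mathcal{N}(D_y \mu, \sigma^2 D_y D_y^\top)$ and a correct prediction is exactly the event $D_y s \in \Omega_+$. The sketch below builds one such matrix (the row ordering is our assumption; any ordering of the $C-1$ differences gives the same probability) and estimates the integral with a basic form of Genz's separation-of-variables estimator, without variable reordering and with plain pseudo-random points. All function names are hypothetical, and this is an illustration in plain Python, not the authors' implementation; it omits the gradient computation EXACT needs for training.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, provides cdf() and inv_cdf()


def delta_matrix(C, y):
    """(C-1) x C matrix whose rows compute s_y - s_i for every i != y."""
    rows = []
    for i in range(C):
        if i == y:
            continue
        row = [0.0] * C
        row[y] = 1.0
        row[i] = -1.0
        rows.append(row)
    return rows


def cholesky(A):
    """Lower-triangular L with A = L L^T, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - acc)
            else:
                L[i][j] = (A[i][j] - acc) / L[j][j]
    return L


def orthant_prob_genz(m, cov, n_samples=5000, seed=0):
    """Estimate P(t > 0 componentwise) for t ~ N(m, cov).

    Writes t = m + L z with z ~ N(0, I) and draws z_1, z_2, ... one at a time
    from their truncated conditionals, multiplying the allowed probability
    masses (Genz's separation-of-variables idea)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    n = len(m)
    total = 0.0
    for _ in range(n_samples):
        z = []
        weight = 1.0
        for i in range(n):
            shift = sum(L[i][j] * z[j] for j in range(i))
            # t_i > 0  <=>  z_i > -(m_i + shift) / L_ii
            lo = _STD.cdf(-(m[i] + shift) / L[i][i])
            weight *= 1.0 - lo  # mass of the allowed interval for z_i
            if weight == 0.0:
                break
            # Sample z_i uniformly in probability space over the allowed interval.
            u = lo + rng.random() * (1.0 - lo)
            z.append(_STD.inv_cdf(min(max(u, 1e-12), 1.0 - 1e-12)))
        total += weight
    return total / n_samples


def prob_correct(mu, y, sigma, **kw):
    """P(s_y > max_{i != y} s_i) for s ~ N(mu, sigma^2 I), via the orthant form."""
    D = delta_matrix(len(mu), y)
    mean = [sum(d * v for d, v in zip(row, mu)) for row in D]  # D_y mu
    cov = [[sigma ** 2 * sum(a * b for a, b in zip(r1, r2)) for r2 in D]
           for r1 in D]  # sigma^2 D_y D_y^T
    return orthant_prob_genz(mean, cov, **kw)
```

Two sanity checks follow from symmetry and from the two-class closed form: for $\mu = (0, 0, 0)$ each class wins with probability $1/3$, and for $\mu = (1, 0)$, $\sigma = 1$ the estimator returns $\Phi(1/\sqrt{2}) \approx 0.760$ without sampling error, because a one-dimensional orthant needs no random draw.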

Figures (10)

  • Figure 1: The toy example, which demonstrates the importance of accuracy optimization. The model consists of a single bias parameter (decision threshold), while the scaling weight is fixed at 1. EXACT achieves 100% accuracy, while cross-entropy and hinge loss each misclassify one element.
  • Figure 2: EXACT training pipeline. The model predicts the mean and variance of the logit vector. EXACT's training objective is the expected accuracy, which is differentiable for the stochastic model.
  • Figure 3: Dependence of the expected accuracy on the model parameter in the toy example for different values of $\sigma$.
  • Figure 4: Dependence of the EXACT loss on the model parameter with and without a margin. The margin affects training with large $\sigma$, creating a better optimization landscape in early epochs.
  • Figure 5: Gradient norm during training on CIFAR-100 for different loss functions.
  • ...and 5 more figures
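Figures 1 and 3 describe a toy model with a single bias parameter (decision threshold) and a scaling weight fixed at 1. The sketch below mimics that setting on made-up 1-D data (`xs`, `ys` are hypothetical, not the paper's toy data) and assumes the stochastic score is $s \sim \mathcal{N}(x - b, \sigma^2)$, with class 1 predicted when $s > 0$. It shows why the stochastic view helps: the deterministic accuracy is piecewise constant in $b$, so its gradient is zero almost everywhere, while the expected accuracy $\frac{1}{n}\sum_k \Phi(\pm(x_k - b)/\sigma)$ is smooth and can be climbed by gradient ascent. The function names, learning rate, and step count are illustrative assumptions.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf() is Phi, pdf() is phi


def hard_accuracy(b, xs, ys):
    """Accuracy of the deterministic rule: predict 1 iff x - b > 0."""
    return sum((x - b > 0) == bool(y) for x, y in zip(xs, ys)) / len(xs)


def expected_accuracy(b, xs, ys, sigma):
    """Accuracy of the stochastic rule s ~ N(x - b, sigma^2), predict 1 iff s > 0.

    P(correct) = Phi((x - b) / sigma) if y == 1, else Phi(-(x - b) / sigma)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = (x - b) / sigma
        total += _STD.cdf(z if y == 1 else -z)
    return total / len(xs)


def grad_expected_accuracy(b, xs, ys, sigma):
    """Analytic d/db of expected_accuracy.

    For y == 1 the term's derivative is -phi(z) / sigma; for y == 0 it is
    +phi(z) / sigma (phi is symmetric), with z = (x - b) / sigma."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = (x - b) / sigma
        sign = -1.0 if y == 1 else 1.0
        total += sign * _STD.pdf(z) / sigma
    return total / len(xs)


def fit_bias(xs, ys, sigma, b=0.0, lr=0.5, steps=200):
    """Plain gradient ascent on the expected accuracy with respect to b."""
    for _ in range(steps):
        b += lr * grad_expected_accuracy(b, xs, ys, sigma)
    return b
```

For example, with `xs = [-1.0, 0.0, 2.0, 3.0]` and `ys = [0, 0, 1, 1]`, `hard_accuracy` equals 1.0 for every $b$ in $[0, 2)$ and gives no gradient signal, whereas `fit_bias(xs, ys, sigma=1.0)` started at $b = 0$ climbs to the symmetric threshold $b = 1$. Shrinking $\sigma$ makes `expected_accuracy` track `hard_accuracy` more closely, which is the trade-off Figure 3 illustrates.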

Theorems & Definitions (7)

  • Definition 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Theorem 4.2
  • proof
  • Theorem 4.3
  • proof