EXACT: How to Train Your Accuracy
Ivan Karpukhin, Stanislav Dereka, Sergey Kolesnikov
TL;DR
The paper introduces EXACT, a framework for directly maximizing the expected accuracy of a stochastic classifier. By modeling the score vector as $s \sim \mathcal{N}(\mu(x), \sigma^2(x) I)$ and optimizing $\mathcal{A}(\theta) = \mathbb{E}_{x,y} \mathrm{P}(s_y > \max_{i \neq y} s_i)$ with gradient methods, it sidesteps the non-differentiability of accuracy. The method relies on efficient evaluation and differentiation of an orthant integral of a multivariate normal distribution (the probability that every difference $s_y - s_i$, $i \neq y$, is positive), computed with the Genz algorithm, together with careful handling of margins, variance scheduling, and gradient normalization. Empirical results on tabular datasets and deep image classification tasks show that EXACT can match or exceed the accuracy of cross-entropy and hinge losses, with a modest computational overhead that scales favorably with model complexity and the number of classes. The approach offers a principled route to direct metric optimization and may extend to other non-differentiable targets.
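To make the orthant-integral formulation concrete, here is a minimal pure-Python sketch for a single example. It assumes a fixed scalar $\sigma$, forms the differences $d_i = s_y - s_i$ (whose covariance is $\sigma^2(I + \mathbf{1}\mathbf{1}^\top)$), and estimates $\mathrm{P}(d > 0)$ with a plain pseudo-random Monte Carlo version of Genz's separation-of-variables method. The helper names (`difference_stats`, `orthant_prob_genz`, `expected_accuracy`) and the sample count are illustrative assumptions, not the authors' implementation, and the sketch only evaluates the probability, whereas EXACT also differentiates it.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf() and inv_cdf()


def difference_stats(mu, y, sigma):
    """Mean and covariance of d_i = s_y - s_i (i != y) when s ~ N(mu, sigma^2 I)."""
    others = [i for i in range(len(mu)) if i != y]
    mean = [mu[y] - mu[i] for i in others]
    # Var(s_y - s_i) = 2 sigma^2; Cov(s_y - s_i, s_y - s_j) = sigma^2 for i != j.
    k = len(others)
    cov = [[sigma ** 2 * (2.0 if a == b else 1.0) for b in range(k)] for a in range(k)]
    return mean, cov


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    return c


def orthant_prob_genz(mean, cov, n_samples=20000, seed=0):
    """Estimate P(d > 0 componentwise) for d ~ N(mean, cov).

    Separation of variables: write d = mean + C z with C the lower Cholesky
    factor and z standard normal. Given z_1..z_{i-1}, the event d_i > 0 is a
    lower bound on z_i, so each coordinate contributes the factor
    P(z_i > bound) and is then drawn from the normal truncated to that range.
    """
    rng = random.Random(seed)
    c = cholesky(cov)
    m = len(mean)
    total = 0.0
    for _ in range(n_samples):
        weight, z = 1.0, []
        for i in range(m):
            shift = mean[i] + sum(c[i][j] * z[j] for j in range(i))
            lower = STD_NORMAL.cdf(-shift / c[i][i])  # Phi(bound on z_i)
            weight *= 1.0 - lower                     # P(z_i > bound)
            if i < m - 1:
                # Inverse-CDF draw of z_i from N(0, 1) truncated to z_i > bound.
                u = lower + rng.random() * (1.0 - lower)
                z.append(STD_NORMAL.inv_cdf(min(max(u, 1e-12), 1.0 - 1e-12)))
        total += weight
    return total / n_samples


def expected_accuracy(mu, y, sigma, **kwargs):
    """P(s_y > max_{i != y} s_i) for s ~ N(mu, sigma^2 I), via the orthant integral."""
    mean, cov = difference_stats(mu, y, sigma)
    return orthant_prob_genz(mean, cov, **kwargs)
```

For two classes the inner loop never samples, so the estimate reduces to the exact value $\Phi\big((\mu_y - \mu_{\text{other}})/(\sigma\sqrt{2})\big)$; with three tied classes it should come out close to 1/3 by symmetry.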
Abstract
Classification tasks are usually evaluated in terms of accuracy. However, accuracy is discontinuous and cannot be directly optimized using gradient ascent. Popular methods minimize cross-entropy, hinge loss, or other surrogate losses, which can lead to suboptimal results. In this paper, we propose a new optimization framework by introducing stochasticity into a model's output and optimizing expected accuracy, i.e., the accuracy of the stochastic model. Extensive experiments on linear models and deep image classification show that the proposed optimization method is a powerful alternative to widely used classification losses.
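The abstract's central point, that hard accuracy gives gradient ascent nothing to follow while the accuracy of a stochastic model does, can be seen in a two-class toy problem. In this minimal sketch the mean scores are $[0, wx + b]$, the noise level $\sigma$ is fixed (no variance schedule, margin, or gradient normalization), and each example's expected accuracy has the closed form $\Phi\big(\text{margin}/(\sigma\sqrt{2})\big)$, so no orthant integral is needed. The data and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Hypothetical toy data: one scalar feature, labels in {0, 1}.
X = [-2.0, -1.0, -0.5, -0.2, 0.3, 0.8, 1.5, 2.5]
Y = [0, 0, 0, 1, 1, 1, 1, 1]


def margins(w, b):
    """Score gap s_y - s_other of a 2-class linear model with mean scores [0, w*x + b]."""
    return [(w * x + b) * (1 if y == 1 else -1) for x, y in zip(X, Y)]


def accuracy(w, b):
    """Hard accuracy: piecewise constant in (w, b), so its gradient is zero almost everywhere."""
    return sum(m > 0 for m in margins(w, b)) / len(X)


def expected_accuracy(w, b, sigma):
    """Accuracy of the stochastic model s ~ N(mu, sigma^2 I); smooth in (w, b).

    With two classes, s_y - s_other ~ N(margin, 2 sigma^2), hence
    P(correct) = Phi(margin / (sigma * sqrt(2))).
    """
    scale = sigma * math.sqrt(2.0)
    return sum(STD_NORMAL.cdf(m / scale) for m in margins(w, b)) / len(X)


def expected_accuracy_grad(w, b, sigma):
    """Analytic gradient of expected_accuracy with respect to (w, b)."""
    scale = sigma * math.sqrt(2.0)
    gw = gb = 0.0
    for x, y in zip(X, Y):
        sign = 1.0 if y == 1 else -1.0
        # d/dtheta Phi(sign*(w*x + b)/scale) = phi(...) * sign/scale * d(w*x + b)/dtheta
        coef = STD_NORMAL.pdf(sign * (w * x + b) / scale) * sign / scale
        gw += coef * x
        gb += coef
    return gw / len(X), gb / len(X)


def train(w, b, sigma=1.0, lr=2.0, steps=200):
    """Plain gradient ascent on expected accuracy with a fixed sigma."""
    for _ in range(steps):
        gw, gb = expected_accuracy_grad(w, b, sigma)
        w, b = w + lr * gw, b + lr * gb
    return w, b
```

Starting from `(w, b) = (-1.0, 0.0)`, the hard accuracy is 1/8 and does not change under small perturbations of `w`, while `expected_accuracy_grad` returns a nonzero gradient that `train` can follow.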
