Rethinking Cost-sensitive Classification in Deep Learning via Adversarial Data Augmentation
Qiyuan Chen, Raed Al Kontar, Maher Nouiehed, Jessie Yang, Corey Lester
TL;DR
This work tackles cost-sensitive multiclass classification in over-parameterized DNNs, where standard training can erase cost distinctions because the network perfectly interpolates the training data. It introduces CSADA, a cost-sensitive adversarial data augmentation framework. For each training sample $(x_i, y_i)$ and each class $z$, CSADA generates a targeted perturbation $\delta^{(y_i,z)} = \arg\max_{\|\delta\| \le \epsilon} p_z(\theta; (x_i+\delta, y_i))$ and minimizes the penalized augmented loss $\ell_{\text{augmented}}(\theta; x_i, y_i, \delta) = \ell(f(\theta, x_i), y_i) + \lambda \sum_z \tilde c(y_i,z)\, \ell(f(\theta, x_i+\delta^{(y_i,z)}), y_i)$ with weights $\tilde c(y,z) = c(y,z)^\tau / \sum_{z'} c(y,z')^\tau$. Each perturbed input is still scored against its true label $y_i$, so confusions with a high cost $c(y_i,z)$ are penalized more heavily. A stochastic variant reduces computation by sampling a single critical pair per batch. Empirically, CSADA lowers the overall misclassification cost and the number of critical errors on MNIST, CIFAR-10, and the PMI dataset while maintaining comparable accuracy, demonstrating a practical route to embed cost-awareness into deep classifiers and potentially other models.
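The cost weights $\tilde c(y,z)$ and the way they combine per-sample losses can be sketched in plain Python. This is a minimal illustration, not the authors' code: per-sample losses are taken as given numbers, `normalized_cost_weights`, `combine_losses`, and `sample_critical_pair` are hypothetical helpers, the cost matrix is made up, and drawing the critical pair with probability proportional to $c(y,z)^\tau$ is only an assumption about how the stochastic variant might choose its pair.

```python
import random


def normalized_cost_weights(cost_row, tau):
    """Return tilde_c(y, z) = c(y, z)^tau / sum_{z'} c(y, z')^tau for one true class y.

    cost_row[z] holds c(y, z). Zero-cost entries (e.g. z == y) keep weight 0,
    so tau = 0 spreads the weight uniformly over the costly confusions.
    """
    powered = [c ** tau if c > 0 else 0.0 for c in cost_row]
    total = sum(powered)
    if total == 0:
        return [0.0] * len(cost_row)
    return [p / total for p in powered]


def combine_losses(clean_loss, adv_losses, weights, lam):
    """Penalized augmented loss: clean loss + lam * sum_z weight_z * adv_loss_z.

    adv_losses[z] is the loss on x + delta^(y, z), scored against the true label y.
    """
    return clean_loss + lam * sum(w * l for w, l in zip(weights, adv_losses))


def sample_critical_pair(cost, tau, rng):
    """Hypothetical stochastic variant: draw one (y, z) pair per batch with
    probability proportional to c(y, z)^tau (an assumption, not from the source)."""
    pairs = [(y, z) for y, row in enumerate(cost) for z, c in enumerate(row) if c > 0]
    weights = [cost[y][z] ** tau for y, z in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]


# Usage with a hypothetical 3-class cost matrix (row = true class, column = predicted class).
cost = [[0.0, 1.0, 5.0],
        [1.0, 0.0, 1.0],
        [2.0, 1.0, 0.0]]
weights = normalized_cost_weights(cost[0], tau=1.0)  # [0.0, 1/6, 5/6]
loss = combine_losses(0.5, [0.0, 2.0, 1.0], weights, lam=0.1)
pair = sample_critical_pair(cost, tau=2.0, rng=random.Random(0))
```

With this weighting, $\tau = 0$ treats every costly confusion equally, while larger $\tau$ concentrates the penalty on the most expensive pairs.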
Abstract
Cost-sensitive classification is critical in applications where the costs of misclassification errors vary widely. However, over-parameterization poses fundamental challenges to the cost-sensitive modeling of deep neural networks (DNNs). Because a DNN can fully interpolate its training dataset, performance evaluated purely on the training set cannot distinguish a cost-sensitive solution from its accuracy-maximizing counterpart. This necessitates rethinking cost-sensitive classification in DNNs. To address this challenge, this paper proposes a cost-sensitive adversarial data augmentation (CSADA) framework to make over-parameterized models cost-sensitive. The overarching idea is to generate targeted adversarial examples that push the decision boundary in cost-aware directions. These targeted adversarial samples are generated by maximizing the probability of critical misclassifications and are used to train a model that makes more conservative decisions on costly pairs. Experiments on well-known datasets and on a pharmacy medication image (PMI) dataset, which is made publicly available, show that our method can effectively minimize the overall cost and reduce critical errors while achieving comparable performance in terms of overall accuracy.
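The step of generating targeted adversarial samples by maximizing the probability of a critical misclassification can be illustrated on a toy problem. The sketch below is a hedged, self-contained illustration rather than the paper's implementation: a linear softmax model stands in for the DNN, the perturbation set is assumed to be an $\ell_\infty$ ball of radius $\epsilon$, and the maximization is approximated with a few projected sign-gradient ascent steps (PGD-style) on $\log p_z$, which has the same maximizer as $p_z$. All function names and the parameters `step` and `n_steps` are hypothetical. Only the loss value is computed; training would additionally need gradients with respect to $\theta$, which a DNN framework would supply through automatic differentiation.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, b, x):
    """Class probabilities of a toy linear softmax model (stand-in for a DNN)."""
    logits = [sum(wj * xj for wj, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def _sign(v):
    return (v > 0) - (v < 0)


def targeted_perturbation(W, b, x, z, eps, step, n_steps):
    """Approximate argmax_{||delta||_inf <= eps} p_z(x + delta) by projected
    sign-gradient ascent on log p_z (same maximizer as p_z)."""
    delta = [0.0] * len(x)
    for _ in range(n_steps):
        x_adv = [xj + dj for xj, dj in zip(x, delta)]
        p = predict_proba(W, b, x_adv)
        # For a linear softmax model, d log p_z / dx_j = W[z][j] - sum_k p_k W[k][j].
        grad = [W[z][j] - sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
        # Ascent step, then projection back onto the L-infinity ball of radius eps.
        delta = [max(-eps, min(eps, dj + step * _sign(g))) for dj, g in zip(delta, grad)]
    return delta


def cross_entropy(W, b, x, y):
    return -math.log(predict_proba(W, b, x)[y])


def augmented_loss(W, b, x, y, cost_weights, lam, eps=0.3, step=0.1, n_steps=5):
    """Value of the penalized loss for one sample: clean loss plus cost-weighted
    losses on targeted adversarial copies, all scored against the true label y."""
    loss = cross_entropy(W, b, x, y)
    for z, w in enumerate(cost_weights):
        if z == y or w == 0:
            continue  # no perturbation needed for the true class or zero-weight classes
        delta = targeted_perturbation(W, b, x, z, eps, step, n_steps)
        x_adv = [xj + dj for xj, dj in zip(x, delta)]
        loss += lam * w * cross_entropy(W, b, x_adv, y)
    return loss


# Usage: hypothetical 3-class model on 2-D inputs.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x, y = [1.0, 0.0], 0
delta = targeted_perturbation(W, b, x, z=1, eps=0.3, step=0.1, n_steps=5)
total = augmented_loss(W, b, x, y, cost_weights=[0.0, 0.25, 0.75], lam=1.0)
```

Each adversarial copy is scored against the true label $y$, so a larger weight on class $z$ makes it more expensive for the model to let inputs near $x$ drift toward that costly class.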
