Scaling Up ROC-Optimizing Support Vector Machines
Gimun Bae, Seung Jun Shin
TL;DR
This paper addresses the scalability of the ROC-SVM, which directly maximizes AUC but incurs an $O(n^2)$ training cost. It uses incomplete generalized U-statistics to reduce the number of pairwise computations and applies a Nyström-based low-rank kernel approximation so that kernel ROC-SVM training reduces to a linear ROC-SVM in a transformed feature space. A high-probability bound is provided on the combined approximation error from the incomplete U-statistic and the Nyström method. Experiments on synthetic and real datasets show substantial reductions in training time with minimal loss in AUC. The proposed method offers a practical, scalable option for imbalanced classification where ROC-AUC is the performance metric.
Abstract
The ROC-SVM, originally proposed by Rakotomamonjy, directly maximizes the area under the ROC curve (AUC) and has become an attractive alternative to conventional binary classification in the presence of class imbalance. However, its practical use is limited by high computational cost, as training involves evaluating all $O(n^2)$ pairs of positive and negative observations. To overcome this limitation, we develop a scalable variant of the ROC-SVM that leverages incomplete U-statistics, thereby substantially reducing computational complexity. We further extend the framework to nonlinear classification through a low-rank kernel approximation, enabling efficient training in reproducing kernel Hilbert spaces. Theoretical analysis establishes an error bound that justifies the proposed approximation, and empirical results on both synthetic and real datasets demonstrate that the proposed method achieves AUC comparable to that of the original ROC-SVM with drastically reduced training time.
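The two ingredients described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the paper: a pairwise hinge surrogate for the AUC, uniform sampling of $m$ positive-negative pairs with replacement as the incomplete U-statistic design, an RBF kernel with uniformly chosen Nyström landmarks, and plain subgradient descent as the solver. All function names and parameter values (`nystrom_features`, `fit_incomplete_roc_svm`, `gamma`, `lam`, `lr`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def nystrom_features(X, landmarks, gamma):
    # Z = K_nm K_mm^{-1/2}, so that Z @ Z.T = K_nm K_mm^{-1} K_mn
    # is the Nystrom approximation of the full kernel matrix.
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    K_nm = rbf_kernel(X, landmarks, gamma)
    evals, evecs = np.linalg.eigh(K_mm)
    evals = np.maximum(evals, 1e-12)  # guard against tiny or negative eigenvalues
    return K_nm @ (evecs / np.sqrt(evals)) @ evecs.T

def fit_incomplete_roc_svm(Z_pos, Z_neg, m, lam=1e-2, lr=0.1, n_iter=300, seed=0):
    # Assumed design: m (positive, negative) pairs drawn uniformly with
    # replacement, instead of all len(Z_pos) * len(Z_neg) pairs.
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(Z_pos), size=m)
    j = rng.integers(0, len(Z_neg), size=m)
    D = Z_pos[i] - Z_neg[j]  # pairwise feature differences
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # Subgradient of (lam/2)||w||^2 + mean(max(0, 1 - D @ w)).
        active = D @ w < 1.0
        grad = lam * w - D[active].sum(axis=0) / m
        w -= lr * grad
    return w

def auc(scores_pos, scores_neg):
    # Fraction of (positive, negative) pairs ranked correctly.
    return float((scores_pos[:, None] > scores_neg[None, :]).mean())

# Toy imbalanced data: 50 positives, 500 negatives (illustrative only).
rng = np.random.default_rng(0)
X_pos = rng.normal(2.0, 1.0, size=(50, 2))
X_neg = rng.normal(0.0, 1.0, size=(500, 2))
X = np.vstack([X_pos, X_neg])

landmarks = X[rng.choice(len(X), size=20, replace=False)]
Z = nystrom_features(X, landmarks, gamma=0.5)
Z_pos, Z_neg = Z[:50], Z[50:]

# 2,000 sampled pairs stand in for the 25,000 possible pairs.
w = fit_incomplete_roc_svm(Z_pos, Z_neg, m=2000)
train_auc = auc(Z_pos @ w, Z_neg @ w)
print(f"training AUC: {train_auc:.3f}")
```

In this sketch the kernel problem becomes a linear one in the Nyström feature space `Z`, and the cost of each iteration scales with the number of sampled pairs `m` rather than with the product of the class sizes.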
