
Optimizing for ROC Curves on Class-Imbalanced Data by Training over a Family of Loss Functions

Kelsey Lieberman, Shuai Yuan, Swarna Kamlam Ravindran, Carlo Tomasi

TL;DR

This work targets binary classification under severe class imbalance, optimizing for ROC curves rather than overall accuracy. It analyzes the Vector Scaling (VS) loss and shows that, at high imbalance, ROC performance varies widely across hyperparameter values. To address this, it proposes Loss Conditional Training (LCT), which trains a single model over a family of loss functions so that one network can approximate a range of ROC tradeoffs. Using FiLM conditioning and sampling the loss parameters (notably $\tau$) during training, the approach yields ROC curves that are more robust and less sensitive to hyperparameter choices on CIFAR-10, CIFAR-100, and Kaggle melanoma datasets, outperforming standard VS loss especially at large imbalance ratios $\beta$. The results suggest practical benefits for imbalanced binary tasks and point to future work extending LCT to multi-class and regression settings. Code is available at https://github.com/klieberman/roc_lct.

Abstract

Although binary classification is a well-studied problem in computer vision, training reliable classifiers under severe class imbalance remains challenging. Recent work has proposed techniques that mitigate the effects of training under imbalance by modifying the loss functions or optimization methods. While this work has led to significant improvements in overall accuracy in the multi-class case, we observe that slight changes in the hyperparameter values of these methods can result in highly variable performance in terms of Receiver Operating Characteristic (ROC) curves on binary problems with severe imbalance. To reduce the sensitivity to hyperparameter choices and train more general models, we propose training over a family of loss functions instead of a single loss function. We develop a method for applying Loss Conditional Training (LCT) to an imbalanced classification problem. Extensive experimental results, on both CIFAR and Kaggle competition datasets, show that our method improves model performance and is more robust to hyperparameter choices. Code is available at https://github.com/klieberman/roc_lct.
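The idea of training one model over a family of losses can be illustrated with a small sketch. The snippet below is not the authors' implementation: it assumes a FiLM-style layer whose per-channel scale and shift are a simple linear function of the sampled loss parameter $\tau$ (in practice a small network would usually produce them), and the names `film`, `sample_tau`, and the sampling range are hypothetical.

```python
import numpy as np

def film(features, tau, w_scale, b_scale, w_shift, b_shift):
    """Feature-wise linear modulation conditioned on a scalar tau.

    Assumption: scale and shift are linear in tau. This is a simplification
    chosen for illustration only.
    """
    scale = tau * w_scale + b_scale   # shape: (channels,)
    shift = tau * w_shift + b_shift   # shape: (channels,)
    return scale * features + shift   # broadcasts over the batch axis

def sample_tau(rng, low=0.0, high=3.0):
    """Draw one loss parameter per training step (hypothetical range)."""
    return rng.uniform(low, high)

rng = np.random.default_rng(0)
channels = 4
w_scale, w_shift = rng.normal(size=(2, channels)) * 0.1
b_scale, b_shift = np.ones(channels), np.zeros(channels)
features = rng.normal(size=(8, channels))  # a batch of 8 feature vectors

for step in range(3):
    tau = sample_tau(rng)
    modulated = film(features, tau, w_scale, b_scale, w_shift, b_shift)
    # In a real training loop, the loss for this step would be evaluated
    # with the same tau that conditioned the features, then backpropagated.
```

At inference time, the same conditioning input can be varied to move along the tradeoff that the sampled parameter controls, without retraining the model.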

Paper Structure (42 sections, 32 equations, 12 figures, 9 tables)


Figures (12)

  • Figure 1: Distribution of Area Under the ROC Curve (AUC) values obtained by training the same model on the SIIM-ISIC Melanoma classification dataset with 48 different combinations of hyperparameters on VS Loss (hyperparameter values are given in the experiments section). Results are shown at three different imbalance ratios. As the imbalance becomes more severe, model performance drops and the variance in performance drastically increases. LCT addresses both of these issues by training over a family of loss functions, instead of a single loss function with one combination of hyperparameter values.
  • Figure 2: Distribution of results obtained from training 512 models with different hyperparameter values. Left: mean, min, max, and 95% confidence interval of ROC curves. Right: boxplot of distribution of Area Under the ROC Curve (AUC) values.
  • Figure 3: Effect of VS loss hyperparameters on AUC, overall accuracy, and TPR. Results are shown for 512 models with different hyperparameter values. For each metric and hyperparameter we plot a) all the values of the metric vs. the hyperparameter (dots) and b) a fitted degree-2 polynomial between the metric and hyperparameter (curves). In the table, we report the $R^2$ values of polynomials fit with all three hyperparameters. All models were trained on CIFAR10 cat vs. dog with $\beta=100$ with a ResNet32 model. Most of the variance in AUC cannot be explained by the hyperparameter values.
  • Figure 4: Effect of $\tau$ on loss landscape. Each plot shows $\ell_{VS}(1,\mathbf{z})- \ell_{VS}(0,\mathbf{z})$ over $z_0, z_1 \in [-5, 5]$ for $\beta=10$. White: "Break-even" points. $\boldsymbol{\tau}$ shifts the loss landscape.
  • Figure 5: Distribution of ROC curves of models trained with (red) and without (blue) LCT on CIFAR datasets at four different imbalance ratios $\beta$. Solid, dashed, and dotted-dashed curves are mean, minimum, and maximum of the ROC curves respectively. Shaded region is one standard deviation away from the mean. Datasets are tested with $\beta=10, 50, 100, 200$. At high imbalance ratios, LCT consistently improves the mean, max, and min of the ROC curves.
  • ...and 7 more figures
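
The quantity plotted in Figure 4, $\ell_{VS}(1,\mathbf{z}) - \ell_{VS}(0,\mathbf{z})$, can be reproduced numerically with a short sketch. The code below assumes a common parameterization of VS loss that is not spelled out in this excerpt: class weights $(1-\Omega, \Omega)$, multiplicative logit scaling $\Delta_y = (n_y / n_{\max})^{\gamma}$, and additive shift $\iota_y = \tau \log(n_y / \sum_c n_c)$, with class 1 as the minority class. The function names and defaults are hypothetical, and the paper's exact definition may differ.

```python
import numpy as np

def vs_loss(y, z, n, omega=0.5, gamma=0.0, tau=0.0):
    """Binary VS loss under the assumed parameterization described above.

    y: class label (0 or 1); z: logits with last axis of size 2;
    n: per-class sample counts (only their ratio matters).
    """
    n = np.asarray(n, dtype=float)
    z = np.asarray(z, dtype=float)
    delta = (n / n.max()) ** gamma       # multiplicative logit scaling
    iota = tau * np.log(n / n.sum())     # additive logit shift
    weights = np.array([1.0 - omega, omega])
    logits = delta * z + iota
    log_probs = logits - np.logaddexp.reduce(logits, axis=-1, keepdims=True)
    return -weights[y] * log_probs[..., y]

def loss_gap(z, n, **params):
    """Difference plotted in Figure 4: loss for label 1 minus loss for label 0."""
    return vs_loss(1, z, n, **params) - vs_loss(0, z, n, **params)

beta = 10.0
n = [beta, 1.0]  # assumption: class 0 is the majority, class 1 the minority

grid = np.linspace(-5.0, 5.0, 101)
z0, z1 = np.meshgrid(grid, grid, indexing="ij")
z = np.stack([z0, z1], axis=-1)          # shape: (101, 101, 2)

gap_tau0 = loss_gap(z, n, tau=0.0)       # break-even points lie on z0 == z1
gap_tau1 = loss_gap(z, n, tau=1.0)       # break-even points move to z1 - z0 == log(beta)
```

Under these assumptions, with $\Omega = 0.5$ and $\gamma = 0$, the gap reduces to $\tfrac{1}{2}\left[(z_0 - z_1) + \tau \log \beta\right]$, so changing $\tau$ translates the break-even line by $\tau \log \beta$ without changing its slope. This matches the caption's statement that $\tau$ shifts the loss landscape.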