Boundary-Aware Adversarial Filtering for Reliable Diagnosis under Extreme Class Imbalance

Yanxuan Yu, Michael S. Hughes, Julien Lee, Jiacheng Zhou, Andrew F. Laine

TL;DR

The paper tackles reliable diagnosis under extreme class imbalance, where missing true positives is dangerous and calibration matters. It proposes AF-SMOTE, which synthesizes minority samples via SMOTE-like interpolation and then filters them through adversarial realism and boundary-utility scoring, combining scores as $S(x)=\lambda s_{util}(x)+(1-\lambda)s_{real}(x)$. The authors prove that, under mild assumptions, this filtering yields a monotone improvement of the surrogate $\widetilde{F}_\beta(\theta)$ for $\beta\ge 1$ and does not inflate the Brier score. Empirically, AF-SMOTE improves recall and average precision and achieves the best calibration on MIMIC-IV proxy diagnosis and fraud benchmarks, with robust gains in high-dimensional settings via lightweight PCA pre-processing, demonstrating practical value for clinical and other high-stakes applications.

Abstract

We study classification under extreme class imbalance where recall and calibration are both critical, for example in medical diagnosis scenarios. We propose AF-SMOTE, a mathematically motivated augmentation framework that first synthesizes minority points and then filters them by an adversarial discriminator and a boundary utility model. We prove that, under mild assumptions on the decision boundary smoothness and class-conditional densities, our filtering step monotonically improves a surrogate of $F_\beta$ (for $\beta\ge 1$) while not inflating Brier score. On MIMIC-IV proxy label prediction and canonical fraud detection benchmarks, AF-SMOTE attains higher recall and average precision than strong oversampling baselines (SMOTE, ADASYN, Borderline-SMOTE, SVM-SMOTE), and yields the best calibration. We further validate these gains across multiple additional datasets beyond MIMIC-IV. Our successful application of AF-SMOTE to a healthcare dataset using a proxy label demonstrates in a disease-agnostic way its practical value in clinical situations, where missing true positive cases in rare diseases can have severe consequences.
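
The synthesize-then-filter idea can be illustrated with a minimal NumPy sketch. It assumes plain SMOTE-style interpolation between a minority point and one of its nearest minority neighbors, and it fuses two per-candidate scores as $S(x)=\lambda s_{util}(x)+(1-\lambda)s_{real}(x)$ before keeping the top $K$. The function names (`smote_candidates`, `fused_score`, `select_top_k`) are hypothetical, and the scores are taken as given arrays; how the discriminator and the boundary utility model actually produce them is not specified here, so this is not the authors' implementation.

```python
import numpy as np

def smote_candidates(X_min, n_candidates, k=5, rng=None):
    """SMOTE-style candidates: interpolate between a minority point
    and one of its k nearest minority neighbors (assumed variant)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k, n - 1)
    # Pairwise squared distances; a point is never its own neighbor.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_candidates)
    nbr = nn[base, rng.integers(0, k, size=n_candidates)]
    gap = rng.random((n_candidates, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

def fused_score(s_util, s_real, lam=0.5):
    """S(x) = lam * s_util(x) + (1 - lam) * s_real(x)."""
    return lam * np.asarray(s_util, dtype=float) + (1.0 - lam) * np.asarray(s_real, dtype=float)

def select_top_k(candidates, scores, k):
    """Keep the k candidates with the highest fused score."""
    order = np.argsort(scores)[::-1][:k]
    return candidates[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_min = rng.normal(size=(20, 3))
    cands = smote_candidates(X_min, n_candidates=50, k=5, rng=1)
    # Placeholder scores standing in for the realism and utility heads.
    s_real = rng.random(50)
    s_util = rng.random(50)
    kept = select_top_k(cands, fused_score(s_util, s_real, lam=0.7), k=10)
    print(kept.shape)
```

Taking the scores as inputs keeps the fusion and selection steps independent of any particular discriminator or classifier; $\lambda$ closer to 1 favors boundary utility, closer to 0 favors realism.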

Paper Structure

This paper contains 7 sections, 2 theorems, 4 equations, 5 figures, 1 table.

Key Result

Theorem 1

Let $\widetilde{F}_\beta(\theta) = \frac{(1+\beta^2)\,\pi_1\,\mathbb{E}[\hat{p}\,\mathbf{1}\{\hat{p}\ge t\}\mid y{=}1]}{\beta^2\,\pi_1 + (1-\pi_1)\,\mathbb{E}[\mathbf{1}\{\hat{p}\ge t\}\mid y{=}0]}$ for $\beta\ge 1$ (normalized by class priors $\pi_1,1-\pi_1$). Under (A1)–(A5), selecting $\mathcal{S

Figures (5)

  • Figure 1: AF-SMOTE Architecture. The Backbone synthesizes minority candidates via SMOTE and evaluates them with multi-branch heads: Realism (discriminator), Boundary utility (margin/probability), Uncertainty, and Density/outlier metrics. Scores are fused with learned weights and diversity regularization; Top-$K$ selection forms an augmented training set. The ADAPT Controller handles high-dimensional cases via PCA projection and gates hyperparameters, while the Focus Loop provides feedback to concentrate synthetic samples near under-represented boundary regions.
  • Figure 2: AF-SMOTE in high dimensions. On INSPECT imaging data, a simple PCA front-end (27D $\to$ 15D) followed by AF-SMOTE preserves structure and yields clearly higher-quality reconstructions than SMOTE; the adaptive variant improves further. This highlights AF-SMOTE's robust extension to high-dimensional image signals with a lightweight projection.
  • Figure 3: Feature Space Visualization. Four-panel comparison showing 2D feature space distribution across augmentation methods. Panel A (Original): Severe class imbalance. Panel B (SMOTE): Linear interpolation adds synthetic samples with moderate noise, including unrealistic regions. Panel C (AF-SMOTE): Adversarial filtering retains high-quality samples with reduced noise and tighter clustering.
  • Figure 4: LightGBM. Left: AF-SMOTE improves the PR curve area; Right: AF-SMOTE shows the best calibration (closest to diagonal).
  • Figure 5: F$_\beta$ trends vs. $\beta$ (LightGBM). Using the same operating points as in Table 1, AF-SMOTE achieves higher or comparable F$_\beta$ across $\beta\ge 1$, indicating consistent gains when recall is prioritized.

Theorems & Definitions (2)

  • Theorem 1: Monotone improvement of F$_\beta$
  • Theorem 2: Brier non-increase
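
The two quantities these theorems concern can be computed from predicted probabilities with a few lines of NumPy. The sketch below is an empirical plug-in estimate of the surrogate $\widetilde{F}_\beta$ exactly as written in Theorem 1 (expectations replaced by sample means, $\pi_1$ by the observed positive rate), together with the standard Brier score. The function names and the default threshold `t=0.5` are assumptions made for illustration; the sketch only evaluates the metrics and does not demonstrate the theorems themselves.

```python
import numpy as np

def surrogate_f_beta(p_hat, y, t=0.5, beta=1.0):
    """Plug-in estimate of the surrogate F_beta from Theorem 1.
    Requires at least one positive and one negative label."""
    p_hat = np.asarray(p_hat, dtype=float)
    y = np.asarray(y).astype(int)
    pi1 = y.mean()                                # empirical class prior
    above = p_hat >= t                            # indicator 1{p_hat >= t}
    pos_term = (p_hat * above)[y == 1].mean()     # E[p_hat * 1{.} | y = 1]
    neg_term = above[y == 0].mean()               # E[1{.} | y = 0]
    num = (1.0 + beta ** 2) * pi1 * pos_term
    den = beta ** 2 * pi1 + (1.0 - pi1) * neg_term
    return num / den

def brier_score(p_hat, y):
    """Mean squared error between predicted probabilities and labels."""
    p_hat = np.asarray(p_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p_hat - y) ** 2))

if __name__ == "__main__":
    y = [1, 1, 0, 0]
    p = [0.9, 0.4, 0.6, 0.1]
    print(surrogate_f_beta(p, y, t=0.5, beta=1.0))  # approximately 0.6
    print(brier_score(p, y))                        # approximately 0.185
```

In the toy example, $\pi_1 = 0.5$, the positive-class term is $(0.9 + 0)/2 = 0.45$, and the negative-class term is $0.5$, giving $0.45 / 0.75 = 0.6$ for $\beta = 1$.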