Fair Lung Disease Diagnosis from Chest CT via Gender-Adversarial Attention Multiple Instance Learning

Aditya Parikh, Aasa Feragen

Abstract

We present a fairness-aware framework for multi-class lung disease diagnosis from chest CT volumes, developed for the Fair Disease Diagnosis Challenge at the PHAROS-AIF-MIH Workshop (CVPR 2026). The challenge requires classifying CT scans into four categories -- Healthy, COVID-19, Adenocarcinoma, and Squamous Cell Carcinoma -- with performance measured as the average of per-gender macro F1 scores, which explicitly penalizes gender-inequitable predictions. Our approach addresses two core difficulties: a pathological signal that is sparse across hundreds of slices, and a severe demographic imbalance compounded across disease class and gender. We propose an attention-based Multiple Instance Learning (MIL) model on a ConvNeXt backbone that learns to identify diagnostically relevant slices without slice-level supervision, augmented with a Gradient Reversal Layer (GRL) that adversarially suppresses gender-predictive structure in the learned scan representation. Training incorporates focal loss with label smoothing, stratified cross-validation over joint (class, gender) strata, and targeted oversampling of the most underrepresented subgroup. At inference, the checkpoints from all five folds are ensembled by soft logit voting with horizontal-flip test-time augmentation, and decision thresholds are optimized on out-of-fold predictions for robustness. Our model achieves a mean validation competition score of 0.685 (std 0.030), with the best single fold reaching 0.759. All training and inference code is publicly available at https://github.com/ADE-17/cvpr-fair-chest-ct
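The competition score described in the abstract, the average of per-gender macro F1 scores, can be sketched in a few lines of plain Python. This is a minimal illustration and not the official scoring script: the function names, the "M"/"F" group labels, and the choice to assign F1 = 0 to a class with no true or predicted samples in a subgroup are all assumptions.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 over the given class list."""
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted samples scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def competition_score(y_true, y_pred, gender, classes):
    """Average of macro F1 computed separately within each gender group."""
    per_group = {}
    for g in sorted(set(gender)):
        idx = [i for i, x in enumerate(gender) if x == g]
        per_group[g] = macro_f1(
            [y_true[i] for i in idx], [y_pred[i] for i in idx], classes
        )
    return sum(per_group.values()) / len(per_group), per_group


# Toy example: four classes, perfect on group "M", one error on group "F".
classes = [0, 1, 2, 3]
y_true = [0, 1, 2, 3, 0, 1, 2, 3]
y_pred = [0, 1, 2, 3, 0, 1, 2, 2]
gender = ["M"] * 4 + ["F"] * 4
score, per_group = competition_score(y_true, y_pred, gender, classes)
print(round(score, 4))  # 0.8333
```

Because each gender contributes equally regardless of its sample count, a single missed class in the smaller group costs as much as the same miss in the larger one. In the toy example, that is why one misclassified scan lowers the overall score from 1.0 to about 0.83.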

Paper Structure (31 sections, 10 equations, 3 figures, 3 tables)


Figures (3)

  • Figure 1: Representative axial CT slices for each diagnostic category. (a) Healthy lungs exhibit clear parenchyma. (b) COVID-19 infection presents with characteristic bilateral ground-glass opacities distributed across the lung fields. (c) Adenocarcinoma and (d) Squamous Cell Carcinoma represent the two malignant categories, each presenting as distinct focal lung lesions. The subtle visual differences between disease classes, and the fact that abnormalities occupy only a small fraction of slices within a full CT volume, motivate our attention-based MIL approach.
  • Figure 2: Dataset characteristics. (top) Distribution of scans by class and gender, highlighting the severe intersectional scarcity of female Squamous Cell Carcinoma (SCC) cases. (bottom) Variance in volumetric depth across classes. The extreme fluctuation in slices per scan (ranging from under 20 to over 800) necessitates our flexible Attention-MIL formulation.
  • Figure 3: Weights & Biases validation curves across all 5 folds. Integrating the Gradient Reversal Layer stabilized training dynamics that initially showed high variance.
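The Figure 2 caption notes that scans range from under 20 to over 800 slices, which is why a pooling step that accepts any number of slices is useful. The NumPy sketch below shows one common form of attention-based MIL pooling (tanh scoring followed by a softmax over slices). It is an assumption about the general technique, not the authors' exact architecture: the function name, the parameters `V` and `w`, and the embedding sizes are hypothetical, and the random arrays stand in for ConvNeXt slice features.

```python
import numpy as np


def attention_mil_pool(H, V, w):
    """Pool a variable-length bag of slice embeddings into one scan vector.

    H: (n_slices, d) slice embeddings (stand-in for backbone features)
    V: (d, k) projection, w: (k,) scoring vector (hypothetical parameters)
    Returns the scan embedding z of shape (d,) and weights a of shape (n_slices,).
    """
    scores = np.tanh(H @ V) @ w           # one scalar score per slice
    scores = scores - scores.max()        # stabilize the softmax numerically
    a = np.exp(scores)
    a = a / a.sum()                       # attention weights sum to 1
    z = a @ H                             # weighted average of slice embeddings
    return z, a


rng = np.random.default_rng(0)
d, k = 16, 8
V = rng.normal(size=(d, k))
w = rng.normal(size=k)

# The same parameters handle a short scan and a long scan.
for n_slices in (20, 800):
    H = rng.normal(size=(n_slices, d))
    z, a = attention_mil_pool(H, V, w)
    print(n_slices, z.shape, a.shape)
```

The output dimension depends only on the embedding size, so downstream classification and adversarial heads see a fixed-size vector regardless of scan depth. The weights `a` indicate which slices contributed most to the scan-level representation.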