
Towards Fair and Robust Volumetric CT Classification via KL-Regularised Group Distributionally Robust Optimisation

Samuel Johnny, Blessed Guda, Frank Ebeledike, Goodness Obasi, Moise Busogi

Abstract

Automated diagnosis from chest computed tomography (CT) scans faces two persistent challenges in clinical deployment: distribution shift across acquisition sites and performance disparity across demographic subgroups. We address both simultaneously across two complementary tasks: binary COVID-19 classification from multi-site CT volumes (Task 1) and four-class lung pathology recognition with gender-based fairness constraints (Task 2). Our framework combines a lightweight MobileViT-XXS slice encoder with a two-layer SliceTransformer aggregator for volumetric reasoning, and trains with a KL-regularised Group Distributionally Robust Optimisation (Group DRO) objective that adaptively upweights underperforming acquisition centres and demographic subgroups. Unlike standard Group DRO, the KL penalty prevents group weight collapse, providing a stable balance between worst-case protection and average performance. For Task 2, we define groups at the granularity of gender × class, directly targeting severely underrepresented combinations such as female Squamous cell carcinoma. On Task 1, our best configuration achieves a challenge F1 of 0.835, surpassing the best published challenge entry by +5.9 pp. On Task 2, Group DRO with α = 0.5 achieves a mean per-gender macro F1 of 0.815, outperforming the best challenge entry by +11.1 pp and improving Female Squamous F1 by +17.4 pp over the Focal Loss baseline.
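The KL-regularised reweighting described in the abstract can be illustrated with a minimal sketch. It assumes one common formulation, in which the group weights maximise the weighted group loss minus α times the KL divergence from a uniform prior; under that assumption the weights are proportional to exp(loss / α). The paper's exact objective and update rule are not reproduced here, and the function names and example losses below are hypothetical.

```python
import numpy as np

def kl_group_weights(group_losses, alpha, prior=None):
    """Closed-form maximiser of sum_g q_g * L_g - alpha * KL(q || prior).

    Assumed formulation (not taken from the paper): q_g is proportional
    to prior_g * exp(L_g / alpha), with a uniform prior by default.
    """
    losses = np.asarray(group_losses, dtype=float)
    if prior is None:
        prior = np.full(losses.shape, 1.0 / losses.size)
    logits = np.log(prior) + losses / alpha
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def robust_loss(group_losses, alpha):
    """Weighted training loss under the KL-regularised group weights."""
    losses = np.asarray(group_losses, dtype=float)
    return float(np.dot(kl_group_weights(losses, alpha), losses))

# Hypothetical per-group losses (e.g. one per acquisition centre).
example_losses = [0.30, 0.45, 1.20]
for alpha in (0.05, 0.5, 50.0):
    print(alpha, np.round(kl_group_weights(example_losses, alpha), 3))
```

In this sketch a very small α concentrates nearly all weight on the worst group (the collapse that the KL penalty is meant to prevent), while a very large α pushes the weights toward uniform, so the objective approaches a plain group-averaged loss.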

Paper Structure (30 sections, 5 equations, 3 figures, 2 tables)


Figures (3)

  • Figure 1: Overview of the proposed pipeline. A 3D CT scan is split into 64 slices, each encoded independently by a shared MobileViT backbone. A learned SliceAggregator pools slice features via attention weighting. The Classification Head produces predictions per sample, and Group DRO with KL regularisation dynamically reweights per-centre losses during training.
  • Figure 2: Effect of KL regularisation strength $\alpha$ on Task 2 validation performance, reported separately for male and female subgroups. Group DRO with $\alpha\!=\!0.5$ achieves the best mean F1 of 0.815 and the smallest gender gap, outperforming Focal Loss (0.777) and the best challenge entry (0.704; kollias2025pharos). At $\alpha\!=\!1.0$, male macro F1 rises while female macro F1 falls sharply, indicating that forcing uniform weights over-regularises the minority gender subgroup.
  • Figure 3: Effect of KL regularisation strength $\alpha$ on Task 1 validation performance. Group DRO with $\alpha\!=\!0.5$ achieves the best mean F1 of 0.835, surpassing both the weighted cross-entropy (CE) baseline (0.804) and the best published challenge entry (0.776; kollias2025pharos). Large $\alpha$ forces uniform group weights, collapsing toward ERM and degrading performance ($\alpha\!=\!0.3$, F1 = 0.726).
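The "mean per-gender macro F1" reported for Task 2 can be sketched as follows: compute a macro F1 within each gender subgroup, then average the subgroup scores. This is a minimal illustration of that reading of the metric, not the challenge's official scoring code; the function names, the integer label encoding, and the choice to score an absent class as 0 are assumptions.

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores.

    Assumption: a class with no true or predicted samples scores 0.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

def mean_per_gender_macro_f1(y_true, y_pred, gender, n_classes):
    """Average of the macro F1 computed separately for each gender value."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    gender = np.asarray(gender)
    per_gender = {}
    for g in np.unique(gender):
        mask = gender == g
        per_gender[int(g)] = macro_f1(y_true[mask], y_pred[mask], n_classes)
    return float(np.mean(list(per_gender.values()))), per_gender

# Toy example with two classes and two gender codes (0 and 1).
mean_f1, by_gender = mean_per_gender_macro_f1(
    y_true=[0, 1, 0, 1], y_pred=[0, 1, 1, 1], gender=[0, 0, 1, 1], n_classes=2
)
print(mean_f1, by_gender)
```

Averaging the subgroup scores gives each gender equal influence on the final number regardless of how many scans it contributes, which is why a gap between the male and female curves directly lowers the reported mean.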