
A Heterogeneous Ensemble for Multi-Center COVID-19 Classification from Chest CT Scans

Aadit Nilay, Bhavesh Thapar, Anant Agrawal, Mohammad Nayeem Teli

Abstract

The COVID-19 pandemic exposed critical limitations in diagnostic workflows: RT-PCR tests suffer from slow turnaround times and high false-negative rates, while CT-based screening offers faster, complementary diagnosis but requires expert radiological interpretation. Deploying automated CT analysis across multiple hospital centres introduces further challenges, as differences in scanner hardware, acquisition protocols, and patient populations cause substantial domain shift that degrades single-model performance. To address these challenges, we present a heterogeneous ensemble of nine models spanning three inference paradigms: (1) a self-supervised DINOv2 Vision Transformer with slice-level sigmoid aggregation, (2) a RadImageNet-pretrained DenseNet-121 with slice-level sigmoid averaging, and (3) seven Gated Attention Multiple Instance Learning (MIL) models using EfficientNet-B3, ConvNeXt-Tiny, and EfficientNetV2-S backbones with scan-level softmax classification. Random-seed variation and Stochastic Weight Averaging (SWA) further increase ensemble diversity. We mitigate severe overfitting, reducing the validation-to-training loss ratio from 35× to below 3×, through a combination of Focal Loss, embedding-level Mixup, and domain-aware augmentation. Model outputs are fused via score-weighted probability averaging and calibrated with per-source threshold optimization. The final ensemble achieves an average macro F1 of 0.9280 across four hospital centres, outperforming the best single model (F1 = 0.8969) by 0.031 and demonstrating that heterogeneous architectures combined with source-aware calibration are essential for robust multi-site medical image classification.
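The fusion and calibration steps described above can be sketched in a few lines. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. It assumes that each model's output has already been reduced to one P(COVID) per scan (sigmoid-averaged for the slice-level models, the COVID softmax entry for the MIL models), that each model's weight is its validation score (taken here to be macro F1), and that each centre's threshold is chosen by a grid search over 0.05 to 0.95 that maximizes macro F1 on that centre's validation scans. The function names `fuse_scores`, `macro_f1`, and `per_source_thresholds` and the toy numbers are hypothetical.

```python
from collections import defaultdict


def fuse_scores(model_probs, model_scores):
    """Score-weighted average of per-model P(COVID) values.

    model_probs: model name -> list of P(COVID), one per scan, same scan order.
    model_scores: model name -> weight (assumed here: validation macro F1).
    Weights are normalized to sum to 1, so fused values stay in [0, 1].
    """
    total = sum(model_scores[m] for m in model_probs)
    n_scans = len(next(iter(model_probs.values())))
    fused = [0.0] * n_scans
    for name, probs in model_probs.items():
        weight = model_scores[name] / total
        for i, p in enumerate(probs):
            fused[i] += weight * p
    return fused


def macro_f1(y_true, y_pred):
    """Unweighted mean of the binary F1 for class 0 and class 1 (0.0 if undefined)."""
    f1s = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def per_source_thresholds(probs, labels, sources, grid=None):
    """Pick, for each source (centre), the grid threshold maximizing macro F1."""
    if grid is None:
        grid = [i / 100 for i in range(5, 96)]  # 0.05, 0.06, ..., 0.95
    by_source = defaultdict(list)
    for p, y, s in zip(probs, labels, sources):
        by_source[s].append((p, y))
    thresholds = {}
    for source, items in by_source.items():
        ps = [p for p, _ in items]
        ys = [y for _, y in items]
        best_t, best_f1 = grid[0], -1.0
        for t in grid:
            f1 = macro_f1(ys, [int(p >= t) for p in ps])
            if f1 > best_f1:  # strict: ties keep the lowest threshold
                best_t, best_f1 = t, f1
        thresholds[source] = best_t
    return thresholds


# Toy usage: two models, four validation scans from two centres (0 and 1).
model_probs = {"dinov2": [0.9, 0.2, 0.6, 0.4], "densenet": [0.8, 0.3, 0.7, 0.35]}
model_scores = {"dinov2": 0.89, "densenet": 0.86}
labels = [1, 0, 1, 0]
sources = [0, 0, 1, 1]

fused = fuse_scores(model_probs, model_scores)
th = per_source_thresholds(fused, labels, sources)
preds = [int(p >= th[s]) for p, s in zip(fused, sources)]
```

Tuning a threshold per centre lets the ensemble absorb scanner-specific shifts in the probability scale, such as the Centre 1 false positives illustrated in Figure 5, without retraining any model. The tuned thresholds would then be applied unchanged to unseen scans from each centre, because F1 measured on the same scans used for tuning is optimistic.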

Paper Structure (23 sections, 11 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Representative COVID-positive CT slices from three different data centers in the validation set, each showing ground-glass opacities and multi-focal involvement with differing scanner characteristics and acquisition protocols.
  • Figure 2: Ensemble approach vs. baseline ResNet50 on a similar dataset.
  • Figure 3: Phase-by-phase F1 progression of the different models.
  • Figure 4: COVID-positive CT slices with subtle ground-glass opacities (GGO). These faint, poorly contrasted regions are easily overlooked during rapid human screening but are consistently detected by the attention mechanism across multiple slices.
  • Figure 5: Non-COVID CT slices from Centre 1 that the model tends to misclassify as COVID-positive. Scanner-specific artifacts, contrast variations, and protocol differences create patterns that confuse the model but are easily recognized as non-pathological by trained radiologists.
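The attention mechanism mentioned in the Figure 4 caption belongs to the Gated Attention MIL models named in the abstract. The sketch below shows the widely used gated-attention pooling formulation, a_k = softmax_k(wᵀ[tanh(V h_k) ⊙ σ(U h_k)]), followed by a two-class softmax head. It is an illustrative assumption about how such a model weights slices, not the authors' code: `V`, `U`, `w`, `W_cls`, and `b_cls` stand in for learned parameters, the toy slice embeddings replace backbone features (EfficientNet-B3, ConvNeXt-Tiny, or EfficientNetV2-S in the paper), and all helper names are hypothetical. Plain Python is used so the example runs without a deep-learning framework.

```python
import math


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_attention_pool(H, V, U, w):
    """Pool K slice embeddings H (each of length D) into one scan embedding.

    V and U are L x D projection matrices and w is a length-L vector.
    Returns the pooled embedding z and the per-slice attention weights.
    """
    logits = []
    for h in H:
        content = [math.tanh(v) for v in matvec(V, h)]  # tanh(V h_k)
        gate = [sigmoid(u) for u in matvec(U, h)]       # sigmoid(U h_k)
        gated = [c * g for c, g in zip(content, gate)]  # element-wise product
        logits.append(sum(wi * gi for wi, gi in zip(w, gated)))
    attn = softmax(logits)  # one weight per slice, summing to 1
    dim = len(H[0])
    z = [sum(a * h[d] for a, h in zip(attn, H)) for d in range(dim)]
    return z, attn


def scan_probabilities(z, W_cls, b_cls):
    """Two-class softmax head on the pooled embedding: [P(non-COVID), P(COVID)]."""
    return softmax([s + b for s, b in zip(matvec(W_cls, z), b_cls)])


# Toy example: 3 slices with 2-dimensional embeddings and identity projections.
H = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.5]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
z, attn = gated_attention_pool(H, V=I2, U=I2, w=[1.0, 1.0])
probs = scan_probabilities(z, W_cls=[[-1.0, -1.0], [1.0, 1.0]], b_cls=[0.0, 0.0])
```

The sigmoid gate scales each tanh feature between zero and its full value, so slices the learned parameters treat as uninformative contribute little to the pooled embedding. With trained weights, this can let a few slices showing subtle opacities drive the scan-level softmax prediction.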