High-confidence pseudo-labels for domain adaptation in COVID-19 detection

Robert Turnbull, Simon Mutch

TL;DR

The paper tackles COVID-19 detection from CT scans under domain shift using a lung-segmentation preprocessing pipeline and two 3D architectures (ResNet and Swin Transformer). It combines cross-validated training on the labeled Challenge 1 dataset with a pseudo-labeling strategy that exploits unlabeled scans for the domain adaptation challenge. The approach achieves high mean F1 scores on both tasks, with strong ensemble performance on Challenge 1 and notable gains from high-confidence pseudo-labels on Challenge 2. The results suggest that accurate COVID-19 classification across distributions is achievable with only a small number of labeled scans from the new distribution, supplemented by pseudo-labels for the unlabeled ones.

Abstract

This paper outlines our submission for the 4th COV19D competition as part of the 'Domain adaptation, Explainability, Fairness in AI for Medical Image Analysis' (DEF-AI-MIA) workshop at the Computer Vision and Pattern Recognition Conference (CVPR). The competition consists of two challenges. The first is to train a classifier to detect the presence of COVID-19 from over one thousand CT scans from the COV19-CT-DB database. The second challenge is to perform domain adaptation by taking the dataset from Challenge 1 and adding a small number of scans (some annotated and others not) from a different distribution. We preprocessed the CT scans to segment the lungs, and output volumes with the lungs individually and together. We then trained 3D ResNet and Swin Transformer models on these inputs. We annotated the unlabeled CT scans using an ensemble of these models and chose the high-confidence predictions as pseudo-labels for fine-tuning. This resulted in a best cross-validation mean F1 score of 93.39% for Challenge 1 and a mean F1 score of 92.15% for Challenge 2.
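
The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `select_pseudo_labels`, the 0.9 confidence threshold, and the input layout (one list of COVID probabilities per model) are assumptions made for illustration, since the abstract does not state how "high-confidence" was defined.

```python
from statistics import mean


def select_pseudo_labels(model_probs, threshold=0.9):
    """Pick confident ensemble predictions to use as pseudo-labels.

    model_probs: one list per model, each holding P(COVID) for every
                 unlabeled scan, in the same scan order.
    threshold:   hypothetical confidence cut-off (an assumption, not a
                 value taken from the paper).
    Returns a list of (scan_index, pseudo_label) pairs.
    """
    n_scans = len(model_probs[0])
    selected = []
    for i in range(n_scans):
        # Ensemble prediction: average the probability across models.
        p = mean(probs[i] for probs in model_probs)
        if p >= threshold:
            selected.append((i, 1))  # confidently COVID
        elif p <= 1.0 - threshold:
            selected.append((i, 0))  # confidently non-COVID
        # Scans in the uncertain middle band stay unlabeled.
    return selected


# Two hypothetical models scoring four unlabeled scans.
resnet_probs = [0.98, 0.55, 0.03, 0.80]
swin_probs = [0.94, 0.40, 0.07, 0.95]
pseudo = select_pseudo_labels([resnet_probs, swin_probs])
```

Only the selected scans would be added to the fine-tuning set; a stricter threshold trades fewer pseudo-labels for a lower risk of training on wrong ones.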

Paper Structure (12 sections, 3 figures, 2 tables)

Figures (3)

  • Figure 1: The cross-validation results for Challenge 1. Models joined with a '+' are ensembles with prediction probabilities averaged.
  • Figure 2: The cross-validation results for Challenge 2 before adding in pseudo-labels.
  • Figure 3: The cross-validation results for Challenge 2 after adding in pseudo-labels.
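
The ensembling mentioned in the Figure 1 caption (averaging prediction probabilities) and the mean F1 metric can be sketched as follows. This is an illustrative example, not the authors' evaluation code: it assumes "mean F1" is the unweighted average of the F1 scores for the COVID and non-COVID classes, and it assumes a 0.5 decision threshold. The function names are hypothetical.

```python
def ensemble_average(model_probs):
    """Average P(COVID) across models, scan by scan."""
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_f1(y_true, y_pred):
    """Unweighted mean of the non-COVID (0) and COVID (1) F1 scores."""
    return (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2


# Two hypothetical models scoring four scans with known labels.
resnet_probs = [0.9, 0.2, 0.6, 0.4]
swin_probs = [0.7, 0.4, 0.2, 0.8]
y_true = [1, 0, 1, 1]

avg_probs = ensemble_average([resnet_probs, swin_probs])
y_pred = [1 if p >= 0.5 else 0 for p in avg_probs]  # assumed 0.5 cut-off
score = mean_f1(y_true, y_pred)
```

In this toy case the ensemble misclassifies one COVID scan, giving an F1 of 0.8 for the COVID class and 2/3 for the non-COVID class.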