Self-Supervised Learning for Building Robust Pediatric Chest X-ray Classification Models

Sheng Cheng, Zbigniew A. Starosolski, Devika Subramanian

TL;DR

This work proposes SCC, a novel approach that combines transfer learning with self-supervised contrastive learning, augmented by an unsupervised contrast enhancement technique; using 10 times fewer labeled images, SCC matches the performance of regular transfer learning trained on the entire labeled dataset.

Abstract

Recent advancements in deep learning for medical artificial intelligence have demonstrated that models can match the diagnostic performance of clinical experts in adult chest X-ray (CXR) interpretation. However, their application in the pediatric context remains limited due to the scarcity of large annotated pediatric image datasets. Additional challenges arise from the substantial variability of pediatric CXR images across hospitals and the wide age range of patients, from 0 to 18 years. To address these challenges, we propose SCC, a novel approach that combines transfer learning with self-supervised contrastive learning, augmented by an unsupervised contrast enhancement technique. Transfer learning from a well-trained adult CXR model mitigates the scarcity of pediatric training data. Contrastive learning with contrast enhancement focuses on the lungs, reducing the impact of image variations and producing high-quality embeddings across diverse pediatric CXR images. We train SCC on one pediatric CXR dataset and evaluate it on two other pediatric datasets from different sources. Our results show that SCC's out-of-distribution (zero-shot) AUC exceeds that of regular transfer learning by 13.6% and 34.6% on the two test datasets. Moreover, in few-shot learning with 10 times fewer labeled images, SCC matches the performance of regular transfer learning trained on the entire labeled dataset. To test the generality of the framework, we evaluate it on three benchmark breast cancer datasets. Starting from a model trained on natural images and fine-tuned on one breast dataset, SCC outperforms the fully supervised baseline on the other two datasets by 3.6% and 5.5% in zero-shot AUC.
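The abstract does not spell out SCC's contrastive objective. As a hedged illustration only, the sketch below implements the NT-Xent loss from SimCLR (one of the baselines in Figure 5), assuming SCC's contrastive stage uses a loss of this general form: two augmented views of the same pediatric CXR should map to nearby embeddings, while views of different images are pushed apart. The function name `nt_xent_loss` and the default temperature of 0.5 are illustrative choices, not values taken from the paper.

```python
import numpy as np


def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """SimCLR-style NT-Xent loss for a batch of N positive pairs.

    Assumption: this mirrors SimCLR's objective and is not confirmed to be SCC's loss.
    z1[i] and z2[i] are embeddings of two augmented views of image i, shape (N, d).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)                # (2N, d): all views in one batch
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # L2-normalize so dot = cosine
    sim = z @ z.T / temperature                           # temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)                        # a view is never its own candidate
    # Row i's positive partner is the other view of the same image.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row (exp(-inf) contributes 0).
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    # Average negative log-probability of picking the true partner.
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

In training, `z1` and `z2` would be the encoder's projections of two augmentations of the same batch; the loss is lowest when every view is most similar to its own partner among all 2N views.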

Paper Structure (18 sections, 2 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Dataset description. (a) Image examples. CXR images from different sources exhibit different attributes, reflecting the two aforementioned domain gaps: the adult-pediatric (AP) and pediatric-pediatric (PP) domain gaps. (b) Age distribution of the P1 and P2 datasets. While P2 mainly comprises patients under 2 years old, P1 has a relatively uniform age distribution over 0-12 years. (c) Domain gaps among pediatric and adult datasets. The X and Y axes are the two embedding dimensions obtained by applying multidimensional scaling (MDS) to the matrix of distribution distances between datasets. While P1 and P3 are relatively close to each other, the remaining datasets all show large domain gaps from one another. Notably, P1 is farther from P2 than from A1, which emphasizes that we should consider not only the AP domain gap but also the PP domain gap.
  • Figure 2: Architecture of the SCC framework. (a) Traditional transfer learning, which directly retrains the pre-trained adult model on pediatric images. This approach can overfit to hospital- or population-specific biases, resulting in poor generalization and unsuitability for clinical settings. (b) The proposed SCC framework, which integrates two self-supervised approaches, (c) making images more similar and (d) adapting the feature encoder to pediatric domains, to overcome the AP and PP domain gaps. Consequently, SCC can build generalizable pediatric models with high out-of-distribution (OOD) performance.
  • Figure 3: Overview of the DCE. (a) The first stage: text removal. (b) Lung enhancement curve. Operating pixel-wise, a larger value of the learnable parameter $\alpha$ maps the original pixels to a wider dynamic range, while a smaller $\alpha$ yields a narrower range. With a suitable $\alpha$, the details of the lung area are highlighted and other regions are suppressed, producing a clearer image for the subsequent classification model. (c) DCE architecture and examples. As the images and pixel histograms show, pixels of the original images are concentrated in the middle intensity range, which blurs lesion pixels into surrounding structures such as ribs or the lung background. The enhanced image, by contrast, has a more uniform intensity distribution, which highlights lesion pixels in the lung area and suppresses other regions such as the abdomen.
  • Figure 4: OOD performance. Overview of the OOD performance of SCC and strong baselines, demonstrating SCC's high generalization ability. Panels (a) and (c) show zero-shot performance: SCC achieves superior OOD classification even without access to retraining data from a new clinical setting. Panels (b) and (d) show few-shot learning performance with varying training ratios of the P2 and P3 datasets, respectively. SCC matches the best transfer learning performance of the supervised baseline using less than 10% of the labeled images in P2 and P3, which indicates that the proposed framework can reach accuracy comparable to specialized baseline models with 10 times less labeled data.
  • Figure 5: Localization performance. Panel (a) shows a positive example and the corresponding attention maps. The red bounding box marks the lesion, drawn by an expert radiologist. Notably, both Xrv and SimCLR appear to be influenced by text and noise, hindering their generalization ability. In contrast, DCE and SCC yield more precise attention maps concentrated inside the lung area. Panel (b) shows the quantitative hit-rate scores. The positive images of the P1 dataset fall into three categories: Typical Appearance (TA), Indeterminate Appearance (IA), and Atypical Appearance (AA); "All" denotes the average score over all images. Both DCE and SCC show significant improvement over the strong supervised baseline, which suggests the robustness and high generalization ability of the proposed framework. Although SimCLR achieves relatively high scores on the TA type, it scores lower on the other types, potentially due to biases inherited from the pretrained adult model.
  • ...and 1 more figure
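Figure 1(c) places the datasets on a 2-D map by applying multidimensional scaling to a matrix of distribution distances. The summary names neither the distance measure nor the MDS variant, so the sketch below assumes classical (Torgerson) MDS, one standard variant, and takes an already-computed symmetric distance matrix as input. `classical_mds` is a hypothetical helper name.

```python
import numpy as np


def classical_mds(dist: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed n items from an (n, n) symmetric distance matrix.

    Assumption: the paper's exact MDS variant is not stated; this is one standard choice.
    """
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues (numerical noise or non-Euclidean distances) to 0.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale             # one (x, y) row per item
```

Given an n-by-n matrix `dist` of pairwise distances between datasets such as P1, P2, P3 and A1, `classical_mds(dist)` returns one 2-D coordinate per dataset, so datasets separated by a large domain gap land far apart on the plot.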
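Figure 3(b) describes a pixel-wise curve whose parameter $\alpha$ controls how widely mid-range intensities are spread. The exact DCE curve is not given in this summary, and in DCE $\alpha$ is learned rather than fixed. The sketch below is therefore a hypothetical stand-in (a normalized tanh contrast stretch) that only illustrates the stated property: a larger $\alpha$ maps concentrated mid-gray pixels to a wider output range. The names `stretch_curve` and `enhance` are illustrative.

```python
import math


def stretch_curve(x: float, alpha: float) -> float:
    """Hypothetical contrast curve on [0, 1]; NOT the paper's DCE curve.

    Endpoints are fixed (0 -> 0, 1 -> 1). As alpha -> 0 the curve approaches the
    identity; larger alpha makes it steeper around 0.5, spreading mid-gray pixels.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.5 + 0.5 * math.tanh(alpha * (x - 0.5)) / math.tanh(alpha / 2)


def enhance(pixels, alpha):
    """Apply the curve independently to every normalized pixel value."""
    return [stretch_curve(p, alpha) for p in pixels]
```

For example, inputs 0.45 and 0.55 stay about 0.11 apart with `alpha=1` but end up about 0.38 apart with `alpha=8`, which mirrors the histogram flattening described for Figure 3(c).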
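Figure 5(b) reports hit-rate scores without defining them here. One common definition, the "pointing game", counts a positive image as a hit when the peak of its attention map falls inside an expert-drawn lesion box. The sketch below assumes that definition, which may differ from the paper's. The function names and the inclusive (row_min, col_min, row_max, col_max) box format are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (row_min, col_min, row_max, col_max), inclusive pixel indices.
Box = Tuple[int, int, int, int]


def peak_location(attn: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return (row, col) of the maximum attention value (first one on ties)."""
    coords = ((r, c) for r, row in enumerate(attn) for c in range(len(row)))
    return max(coords, key=lambda rc: attn[rc[0]][rc[1]])


def is_hit(attn: Sequence[Sequence[float]], boxes: List[Box]) -> bool:
    """Assumed pointing-game rule: a hit if the attention peak lies in any lesion box."""
    r, c = peak_location(attn)
    return any(r0 <= r <= r1 and c0 <= c <= c1 for r0, c0, r1, c1 in boxes)


def hit_rate(attn_maps: List[Sequence[Sequence[float]]], box_lists: List[List[Box]]) -> float:
    """Fraction of positive images whose attention peak hits a lesion box."""
    if not attn_maps or len(attn_maps) != len(box_lists):
        raise ValueError("attn_maps must be non-empty and match box_lists in length")
    hits = sum(is_hit(a, b) for a, b in zip(attn_maps, box_lists))
    return hits / len(attn_maps)
```

Computing `hit_rate` separately over the TA, IA and AA subsets, and over all positive images, would give per-category and "All" scores of the kind shown in Figure 5(b).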