Domain Shift Analysis in Chest Radiographs Classification in a Veterans Healthcare Administration Population

Mayanka Chandrashekar, Ian Goethert, Md Inzamam Ul Haque, Benjamin McMahon, Sayera Dhaubhadel, Kathryn Knight, Joseph Erdos, Donna Reagan, Caroline Taylor, Peter Kuzmak, John Michael Gaziano, Eileen McAllister, Lauren Costa, Yuk-Lam Ho, Kelly Cho, Suzanne Tamang, Samah Fodeh-Jarad, Olga S. Ovchinnikova, Amy C. Justice, Jacob Hinkle, Ioana Danciu

TL;DR

This study quantifies domain shift in chest X-ray multilabel classification by comparing a public inpatient dataset (MIMIC-CXR) with a private outpatient dataset (VA-CXR). Using a DenseNet-121 model pretrained on MIMIC-CXR, with 14 labels extracted from radiology reports by CheXbert, the authors assess ground-truth quality, correlate findings with ICD diagnosis codes, and compare performance across NLP labeling tools. They show that ground-truth disagreement is generally lower in VA-CXR, but that domain shift appears across study years, demographic groups, and view positions, with Enlarged Cardiomediastinum often showing the largest cross-dataset performance drop. The work highlights the importance of transfer learning, high-quality annotations, and demographic-aware model development for improving generalizability and equity in chest X-ray classification, which matters when deploying AI-assisted radiology tools across diverse healthcare settings.
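The subgroup comparison described above can be illustrated with a small sketch. It assumes, hypothetically, that each image has already been scored by the classifier for one finding and tagged with a subgroup key such as study year, sex, or view position. `roc_auc` and `auc_by_subgroup` are illustrative names, not the authors' code, and the AUC here is the rank-based (Mann-Whitney) estimator computed by brute force.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined unless both classes are present
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(records: Sequence[Tuple[Hashable, int, float]]) -> Dict[Hashable, float]:
    """records: (subgroup key, ground-truth label, model score) for a single finding."""
    groups: Dict[Hashable, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for key, label, score in records:
        groups[key][0].append(label)
        groups[key][1].append(score)
    return {key: roc_auc(ys, ss) for key, (ys, ss) in groups.items()}


# Hypothetical scores for one finding, grouped by study year
records = [
    (2010, 1, 0.9), (2010, 0, 0.2), (2010, 1, 0.4), (2010, 0, 0.5),
    (2020, 1, 0.8), (2020, 0, 0.3),
]
print(auc_by_subgroup(records))  # {2010: 0.75, 2020: 1.0}
```

A drop in a subgroup's AUC relative to the source-domain AUC is one simple signal of domain shift. The pairwise loop is quadratic, so at the scale of VA-CXR a sorting-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.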

Abstract

Objectives: This study aims to assess the impact of domain shift on chest X-ray classification accuracy and to analyze the influence of ground-truth label quality and demographic factors such as age group, sex, and study year.

Materials and Methods: We used a DenseNet121 model pretrained on the MIMIC-CXR dataset for deep learning-based multilabel classification, with ground-truth labels extracted from radiology reports by the CheXpert and CheXbert labelers. We compared performance on the 14 chest X-ray labels between MIMIC-CXR and the Veterans Healthcare Administration chest X-ray dataset (VA-CXR). VA-CXR comprises over 259k chest X-ray images acquired between 2010 and 2022.

Results: Validating the ground truth and assessing multilabel classification performance across NLP extraction tools showed that the VA-CXR dataset had lower disagreement rates than MIMIC-CXR. AUC scores also differed notably between models using CheXpert and CheXbert labels. Across datasets, only minimal domain shift was observed on unseen data, except for the label "Enlarged Cardiomediastinum." Subgroup analyses by study year showed the largest variations in model performance. These findings underscore the importance of accounting for domain shift in chest X-ray classification, particularly across study years.

Conclusion: Our study reveals the significant impact of domain shift and demographic factors on chest X-ray classification, emphasizing the need for improved transfer learning and equitable model development. Addressing these challenges is crucial for advancing medical imaging and enhancing patient care.
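The abstract does not define how disagreement between labelers is measured, so the following is only a sketch under an assumed definition: for each finding, the fraction of reports on which CheXpert and CheXbert assign different values. The encoding (1 positive, 0 negative, -1 uncertain, `None` for not mentioned) follows the CheXpert convention; the function names and example data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Assumed CheXpert-style encoding: 1 positive, 0 negative, -1 uncertain, None not mentioned.
Label = Optional[int]


def disagreement_rate(a: Sequence[Label], b: Sequence[Label]) -> float:
    """Fraction of reports on which two labelers give different values for one finding."""
    if len(a) != len(b):
        raise ValueError("both labelers must cover the same reports, in the same order")
    if not a:
        return float("nan")  # undefined with no reports
    return sum(x != y for x, y in zip(a, b)) / len(a)


def per_label_disagreement(
    chexpert: Dict[str, List[Label]], chexbert: Dict[str, List[Label]]
) -> Dict[str, float]:
    """Disagreement rate for every finding that both labelers report."""
    return {
        name: disagreement_rate(chexpert[name], chexbert[name])
        for name in chexpert
        if name in chexbert
    }


# Hypothetical labeler outputs for four reports
chexpert = {"Edema": [1, 0, None, -1], "Pneumothorax": [0, 0, 1, None]}
chexbert = {"Edema": [1, 1, None, -1], "Pneumothorax": [0, 0, 1, None]}
print(per_label_disagreement(chexpert, chexbert))  # {'Edema': 0.25, 'Pneumothorax': 0.0}
```

Under this definition, uncertain-versus-positive counts as a disagreement; an analysis that first maps uncertain labels to positive or negative would report different rates.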

Paper Structure (24 sections, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Domain Shift Analysis based on Image Classification Model
  • Figure 2: Characterization of the concurrence of positive findings on chest X-rays with associated diagnoses for our VA dataset. The top panel shows how often a related diagnosis is given within a week before or after a positive X-ray finding, which we labeled 'Sensitivity'. The bottom panel shows the factor by which the ratio of positive to negative X-ray findings increases when a diagnosis code is present. Patients with a diagnosis code not within one week of the assessed X-ray are excluded from the calculation for both plots. Bert: CheXbert model; Pert: CheXpert model.
  • Figure 3: Accuracy and Prevalence of Labels by Study Year. The blue line indicates AUC across years; the orange line indicates prevalence across years.
  • Figure 4: Comparison of Label-wise AUC and Prevalence across the Sexes for MIMIC-CXR and VA-CXR. Orange represents MIMIC-CXR and green represents VA-CXR.
  • Figure 5: Comparison of AUC and Prevalence across View Positions
  • ...and 2 more figures
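Figure 2's two quantities can be written out under one reading of its caption. The sketch assumes, hypothetically, that each study is a tuple of X-ray date, whether the finding was positive, and the dates of the related diagnosis code. 'Sensitivity' is the share of positive findings with a code within seven days, and the bottom-panel factor is taken to be the odds of a positive finding with a code in the window divided by the odds without any code, i.e. an odds ratio. Exclusion is applied per study rather than per patient, a simplification of the caption's wording; `code_status` and `figure2_metrics` are illustrative names.

```python
from datetime import date
from typing import List, Tuple

WINDOW_DAYS = 7  # "within a week before or after", per the Figure 2 caption

# Hypothetical record layout: (X-ray date, finding positive?, dates of the related diagnosis code)
Study = Tuple[date, bool, List[date]]


def code_status(xray_date: date, dx_dates: List[date]) -> str:
    """Return 'within' (a code within +/-7 days), 'none' (no code at all), or 'outside'."""
    if not dx_dates:
        return "none"
    if any(abs((d - xray_date).days) <= WINDOW_DAYS for d in dx_dates):
        return "within"
    return "outside"


def figure2_metrics(studies: List[Study]) -> Tuple[float, float]:
    """Return (sensitivity, odds-ratio factor); NaN when a needed count is zero."""
    counts = {(s, p): 0 for s in ("within", "none") for p in (True, False)}
    for xray_date, positive, dx_dates in studies:
        status = code_status(xray_date, dx_dates)
        if status == "outside":
            continue  # codes outside the window: excluded from both metrics
        counts[(status, positive)] += 1

    pos_with, neg_with = counts[("within", True)], counts[("within", False)]
    pos_without, neg_without = counts[("none", True)], counts[("none", False)]

    positives = pos_with + pos_without
    sensitivity = pos_with / positives if positives else float("nan")

    # (pos/neg with code) / (pos/neg without code), rearranged into a single division
    denom = neg_with * pos_without
    factor = (pos_with * neg_without) / denom if denom else float("nan")
    return sensitivity, factor


d = date(2020, 1, 10)
studies = [
    (d, True, [date(2020, 1, 12)]),   # positive, code 2 days later -> within
    (d, True, [date(2020, 1, 5)]),    # positive, code 5 days earlier -> within
    (d, False, [date(2020, 1, 15)]),  # negative, code 5 days later -> within
    (d, True, []),                    # positive, no code
    (d, False, []),                   # negative, no code
    (d, False, []),                   # negative, no code
    (d, True, [date(2020, 3, 1)]),    # code 51 days later -> excluded
]
print(figure2_metrics(studies))  # (0.6666666666666666, 4.0)
```

Writing the factor as an odds ratio makes explicit that it compares positive-to-negative ratios between studies with and without a nearby code, rather than comparing raw counts.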