Learning to Generalize towards Unseen Domains via a Content-Aware Style Invariant Model for Disease Detection from Chest X-rays

Mohammad Zunaed, Md. Aynal Haque, Taufiq Hasan

TL;DR

This work tackles domain shift in chest X-ray disease detection. Starting from the observation that CNNs are biased toward style (e.g., texture) rather than content, it proposes a content-biased, style-invariant domain-generalization framework. The framework introduces two on-the-fly style randomization modules: SRM-IL at the image level, which samples style statistics from the possible pixel value range, and SRM-FL at the feature level, which uses learnable pixel-wise style embeddings. Both are coupled with consistency regularizations on global semantic features and predictive distributions. Trained on CheXpert and MIMIC-CXR and evaluated on the unseen BRAX, VinDr-CXR, and NIH Chest X-ray14 datasets, the method reports AUCs of 77.32%, 88.38%, and 82.63%, respectively, against 75.56%, 87.57%, and 82.07% for prior state-of-the-art models, with statistically significant improvements. Training costs more because the regularizations require two inputs per CXR, but inference remains efficient, and the approach offers a practical path toward reliable thoracic disease detection across diverse clinical settings. Future work may explore anatomy-aware features, patch-based statistics, and integration with existing radiomics for further content regularization.

Abstract

Performance degradation due to distribution discrepancy is a longstanding challenge in intelligent imaging, particularly for chest X-rays (CXRs). Recent studies have demonstrated that CNNs are biased toward styles (e.g., uninformative textures) rather than content (e.g., shape), in stark contrast to the human vision system. Radiologists tend to learn visual cues from CXRs and thus perform well across multiple domains. Motivated by this, we employ novel on-the-fly style randomization modules at both the image (SRM-IL) and feature (SRM-FL) levels to create rich style-perturbed features while keeping the content intact for robust cross-domain performance. Previous methods simulate unseen domains by constructing new styles via interpolation or by swapping styles from existing data, which limits them to the source domains available during training. In contrast, SRM-IL samples the style statistics from the possible value range of a CXR image instead of from the training data to achieve more diversified augmentations. Moreover, SRM-FL uses pixel-wise learnable parameters as style embeddings, rather than pre-defined channel-wise means and standard deviations, to capture more representative style features. Additionally, we leverage consistency regularizations on global semantic features and predictive distributions from versions of the same CXR with and without style perturbation to tweak the model's sensitivity toward content markers for accurate predictions. Our proposed method, trained on the CheXpert and MIMIC-CXR datasets, achieves AUCs (%) of 77.32$\pm$0.35, 88.38$\pm$0.19, and 82.63$\pm$0.13 on the unseen domain test datasets, i.e., BRAX, VinDr-CXR, and NIH chest X-ray14, respectively, compared to 75.56$\pm$0.80, 87.57$\pm$0.46, and 82.07$\pm$0.19 from state-of-the-art models on five-fold cross-validation, with statistically significant results in thoracic disease classification.
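
The image-level idea described above (replacing a CXR's own mean and standard deviation with values drawn from the possible value range) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `randomize_image_style`, the uniform sampling, the [0, 1] intensity scaling, and the default ranges are all assumptions made for illustration.

```python
import numpy as np


def randomize_image_style(image, rng, mean_range=(0.0, 1.0), std_range=(0.0, 1.0), eps=1e-6):
    """Swap the global mean/std of `image` for randomly sampled values.

    Assumes `image` is a float array scaled to [0, 1]. The uniform sampling
    and the default ranges are illustrative, not values taken from the paper.
    """
    mu, sigma = image.mean(), image.std()
    # Remove the image's own style statistics; the spatial structure is untouched.
    normalized = (image - mu) / (sigma + eps)
    # Draw new style statistics from the value range rather than from training data.
    new_mu = rng.uniform(*mean_range)
    new_sigma = rng.uniform(*std_range)
    return normalized * new_sigma + new_mu, new_mu, new_sigma


rng = np.random.default_rng(0)
cxr = rng.random((8, 8))  # random stand-in for a normalized CXR
stylized, new_mu, new_sigma = randomize_image_style(cxr, rng)
```

Because the transform is affine with a positive scale, the stylized image keeps the spatial pattern of the original while its first- and second-order statistics change. The output is deliberately not clipped here, so values may leave [0, 1]; whether and how to clip is a design choice this sketch leaves open.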

Paper Structure (20 sections, 12 equations, 4 figures, 8 tables)

Figures (4)

  • Figure 1: (a) Illustration of image-level distribution gap of three thoracic disease datasets using image-level style embeddings, i.e., mean and standard deviation. (b) 2D t-SNE visualization of style statistics (concatenation of means and standard deviations) computed from the feature maps from the first dense block of the DenseNet-121, trained on the thoracic datasets. We can observe that the feature statistics capture dataset-specific styles reflected by the separable clusters.
  • Figure 2: Overview of the proposed framework. The style statistics of the input CXR are randomized with a mean and standard deviation sampled at random from a set of values constructed from prior knowledge of the possible maximum and minimum values. Both CXRs, with and without style randomization, are passed to the shared feature extractor, DenseIBN-121, to generate two global feature spaces. For the stylized CXR, another round of style randomization is applied to the feature space after dense block-2, using the feature space of a randomly selected stylized CXR from the training mini-batch. The feature-level style randomization block has learnable parameters and is trained alongside the backbone model. The image-level style randomization block has no learnable parameters; it achieves diversity by sampling different style statistics per CXR image in each iteration of every epoch. The content consistency loss is employed between the two global feature spaces to increase the model's bias toward disease-specific content. In addition, a Kullback-Leibler divergence-based regularization loss is applied between the predicted probability distributions for CXRs with and without perturbed style statistics. The global feature space from the stylized CXR is pooled and passed to the classifier for pathology prediction.
  • Figure 3: Illustration of the sensitivity of the hyperparameter $\eta$ on (a) BRAX, (b) NIH chest X-ray14, and (c) VinDr-CXR datasets.
  • Figure 4: Impact of the hyperparameter $\eta$ on (a) Lung Lesion and (b) Cardiomegaly pathologies (BRAX dataset).
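
The two regularizers named in the Figure 2 caption (a content consistency loss between the two global feature spaces and a Kullback-Leibler term between the two predicted distributions) can be sketched as follows. The exact formulations are not given in this excerpt, so several choices here are assumptions: mean squared error for the content loss, a per-label Bernoulli KL for multi-label predictions, the direction of the KL term, and the function names themselves.

```python
import numpy as np


def content_consistency_loss(feat_clean, feat_styled):
    """Mean squared error between two feature arrays (assumed form of the loss)."""
    return float(np.mean((feat_clean - feat_styled) ** 2))


def bernoulli_kl(p, q, eps=1e-7):
    """Mean KL(p || q) over labels, treating each label as a Bernoulli variable.

    `p` and `q` hold per-pathology probabilities for the same CXR without and
    with style perturbation; which one acts as the reference is an assumption.
    """
    p = np.clip(p, eps, 1.0 - eps)
    q = np.clip(q, eps, 1.0 - eps)
    kl = p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))
    return float(np.mean(kl))


# Toy values standing in for model outputs on one CXR.
probs_clean = np.array([0.9, 0.2, 0.5])
probs_styled = np.array([0.6, 0.3, 0.5])
kl_term = bernoulli_kl(probs_clean, probs_styled)
```

Both terms are zero when the clean and stylized branches agree and grow as they diverge, which is the behavior a consistency regularizer needs. How the terms are weighted against the classification loss is not specified in this excerpt and is left out of the sketch.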