BarlowTwins-CXR : Enhancing Chest X-Ray abnormality localization in heterogeneous data with cross-domain self-supervised learning

Haoyue Sheng; Linrui Ma; Jean-Francois Samson; Dianbo Liu

TL;DR

The paper tackles domain inconsistency in cross-domain transfer learning for chest X-ray (CXR) abnormality localization by introducing BarlowTwins-CXR, a two-phase training strategy that first performs self-supervised pretraining on the NIH-CXR dataset and then fine-tunes on VinDr-CXR using Faster R-CNN with FPN. The approach improves localization performance, raising $mAP_{50}$ by roughly 3 percentage points over ImageNet pretraining and producing heatmaps that align more closely with ground-truth lesions, with the advantage most pronounced in low-data regimes, as shown by linear evaluation $AUC$ results. The findings demonstrate that self-supervised pretraining on domain-relevant unlabeled medical images enhances generalizability across heterogeneous CXR data and can reduce radiologist workload by improving automated localization. The method has practical implications for deploying robust CXR analysis tools in diverse clinical settings with limited labeled data; the authors acknowledge its computational cost and the need for larger bounding-box datasets for broader generalization.

Abstract

Background: Chest X-ray imaging-based abnormality localization, essential in diagnosing various diseases, faces significant clinical challenges due to complex interpretations and the growing workload of radiologists. While recent advances in deep learning offer promising solutions, there is still a critical issue of domain inconsistency in cross-domain transfer learning, which hampers the efficiency and accuracy of diagnostic processes. This study aims to address the domain inconsistency problem and improve automatic abnormality localization performance in heterogeneous chest X-ray image analysis by developing a self-supervised learning strategy called "BarlowTwins-CXR". Methods: We utilized two publicly available datasets: the NIH Chest X-ray Dataset and the VinDr-CXR. The BarlowTwins-CXR approach was conducted in a two-stage training process. Initially, self-supervised pre-training was performed using an adjusted Barlow Twins algorithm on the NIH dataset with a ResNet50 backbone pre-trained on ImageNet. This was followed by supervised fine-tuning on the VinDr-CXR dataset using Faster R-CNN with Feature Pyramid Network (FPN). Results: Our experiments showed a significant improvement in model performance with BarlowTwins-CXR. The approach achieved a 3% increase in mAP50 accuracy compared to traditional ImageNet pre-trained models. In addition, the Ablation CAM method revealed enhanced precision in localizing chest abnormalities. Conclusion: BarlowTwins-CXR significantly enhances the efficiency and accuracy of chest X-ray image-based abnormality localization, outperforming traditional transfer learning methods and effectively overcoming domain inconsistency in cross-domain scenarios. Our experimental results demonstrate the potential of using self-supervised learning to improve the generalizability of models in medical settings with limited amounts of heterogeneous data.
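
The mAP50 figure reported above depends on how predicted boxes are matched to ground-truth boxes. As a rough illustration, the sketch below computes intersection over union (IoU) and applies the 0.5 threshold that mAP50 conventionally uses to count a prediction as a hit. The box format `(x_min, y_min, x_max, y_max)` and the helper names `iou` and `is_hit_at_50` are assumptions made for this example; the paper's actual evaluation code is not shown in this text.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max).

    The coordinate convention is an assumption for this sketch.
    """
    # Corners of the overlap rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give zero intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_hit_at_50(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: True when the prediction overlaps the ground truth at IoU >= 0.5."""
    return iou(pred_box, gt_box) >= threshold


# Usage: a prediction covering 80% of the ground-truth box counts as a hit,
# while one shifted by half its width does not.
gt = (0, 0, 10, 10)
print(is_hit_at_50((0, 0, 10, 8), gt))
print(is_hit_at_50((5, 0, 15, 10), gt))
```

A full mAP50 computation additionally ranks predictions by confidence, builds a precision-recall curve per class, and averages the resulting average precision over classes; only the matching criterion is sketched here.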

Paper Structure (24 sections, 7 figures, 3 tables)

Introduction
Related Work
Methods
Dataset Selection
Dual-Phase Training Process
Self-Supervised Pre-training
Fine-tuning Phase
Results Analysis Process
Results
Transfer Learning on VinDr Abnormality Localization
Linear Evaluation Protocol
End-to-End Finetuning
Discussion
Future Work
Conclusions
...and 9 more sections

Figures (7)

Figure 1: Image-level label distribution of the NIH-CXR dataset.
Figure 2: Instance-level annotation distribution of the VinDr-CXR dataset before (a) and after (b) WBF preprocessing.
Figure 3: Schematic Overview of the Dual-phase Training Framework. The upper panel illustrates the Barlow Twins method in Phase One, where pairs of distorted images are processed through a shared ResNet50 network to produce embeddings. These are then compared using an empirical cross-correlation matrix C, striving for the identity matrix I to minimize redundancy in feature dimensions, and optimizing the loss function $L_{BT}$. In Phase Two (lower panel), the pre-trained ResNet50 backbone from Phase One is integrated into a Faster R-CNN architecture. It starts with multi-scale feature extraction through the Feature Pyramid Network (FPN), followed by the Region Proposal Network (RPN) that generates object region proposals. The features are then pooled and processed by fully connected (FC) layers to output the final class labels and bounding box coordinates for object detection tasks.
Figure 4: Evolution of mAP50 across epochs for different ResNet50 backbones on the VinDr-CXR dataset at 224×224 (left) and 640×640 (right) resolution. The darker lines represent the average mAP50 of four (left) and five (right) trials with different random seeds, with shaded areas indicating the range between the lowest and highest values.
Figure 5: Heatmaps generated from the initial images of the training set (left) and test set (right), indicating successful bounding-box predictions by the BarlowTwins-CXR model. Each heatmap corresponds to one accurately predicted bounding box, although each CXR image contains multiple bounding boxes. Serial numbers below the heatmaps refer to the image numbers in the dataset.
...and 2 more figures
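
The Phase One objective described in the Figure 3 caption (an empirical cross-correlation matrix C pushed toward the identity matrix I) can be sketched in a few lines. This is a minimal pure-Python illustration of the standard Barlow Twins loss, not the adjusted variant the authors used, whose details are not given in this text. The function names, the trade-off weight `lam`, and the toy embeddings are assumptions introduced for the example.

```python
import math


def standardize(z):
    """Center each embedding dimension and scale it to unit variance over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for b in range(n):
            # Small epsilon guards against division by zero for constant features.
            out[b][j] = (col[b] - mean) / (std + 1e-12)
    return out


def cross_correlation(z_a, z_b):
    """Empirical cross-correlation matrix C between two batches of embeddings."""
    za, zb = standardize(z_a), standardize(z_b)
    n, d = len(za), len(za[0])
    return [
        [sum(za[b][i] * zb[b][j] for b in range(n)) / n for j in range(d)]
        for i in range(d)
    ]


def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Invariance term (diagonal toward 1) plus weighted redundancy term (off-diagonal toward 0).

    `lam` is an assumed value for illustration, not one taken from this text.
    """
    c = cross_correlation(z_a, z_b)
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lam * off_diag


# Toy usage: two "views" with identical, decorrelated features give C close to I,
# while duplicated features are penalized by the redundancy term.
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
redundant = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]
print(barlow_twins_loss(decorrelated, decorrelated))
print(barlow_twins_loss(redundant, redundant))
```

In practice the two inputs would be the embeddings of two differently distorted views of the same batch of images, produced by the shared ResNet50 network, and the loss would be computed with a tensor library rather than nested Python lists.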
