
Efficient Chest X-ray Representation Learning via Semantic-Partitioned Contrastive Learning

Wangyu Feng, Shawn Young, Lijian Xu

TL;DR

Semantic-Partitioned Contrastive Learning (S-PCL) is an efficient pre-training framework tailored for CXR representation learning; it achieves competitive or superior accuracy relative to existing SSL approaches while requiring the lowest GFLOPs among them.

Abstract

Self-supervised learning (SSL) has emerged as a powerful paradigm for Chest X-ray (CXR) analysis under limited annotations. Yet existing SSL strategies remain suboptimal for medical imaging. Masked image modeling allocates substantial computation to reconstructing high-frequency background details of limited diagnostic value. Contrastive learning, on the other hand, often depends on aggressive augmentations that risk altering clinically meaningful structures. We introduce Semantic-Partitioned Contrastive Learning (S-PCL), an efficient pre-training framework tailored for CXR representation learning. Instead of reconstructing pixels or relying on heavy augmentations, S-PCL randomly partitions the patch tokens of a single CXR into two non-overlapping semantic subsets. Each subset provides a complementary but incomplete view. The encoder must maximize agreement between these partitions, implicitly inferring the global anatomical layout and local pathological cues from partial evidence. This semantic partitioning forms an internal bottleneck that enforces long-range dependency modeling and structural coherence. S-PCL eliminates the need for hand-crafted augmentations, auxiliary decoders, and momentum encoders. The resulting architecture is streamlined, computationally efficient, and easy to scale. Extensive experiments on large-scale CXR benchmarks, including ChestX-ray14, CheXpert, RSNA Pneumonia, and SIIM-ACR Pneumothorax, show that S-PCL achieves competitive or superior accuracy relative to existing SSL approaches while requiring the lowest GFLOPs among them.
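The partitioning step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a ViT-style grid of N patch tokens and an even 50/50 random split, and the helper names `partition_patch_indices` and `gather_tokens` are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def partition_patch_indices(
    num_patches: int, seed: Optional[int] = None
) -> Tuple[List[int], List[int]]:
    """Randomly split patch indices 0..num_patches-1 into two disjoint subsets.

    Hypothetical helper: assumes an even 50/50 split; the paper's exact
    ratio and sampling scheme may differ.
    """
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)              # random permutation of patch positions
    half = num_patches // 2
    view_a = sorted(indices[:half])   # first partition
    view_b = sorted(indices[half:])   # complementary partition
    return view_a, view_b


def gather_tokens(tokens: List[List[float]], idx: List[int]) -> List[List[float]]:
    """Select the patch-token embeddings at the given grid positions."""
    return [tokens[i] for i in idx]


if __name__ == "__main__":
    # Example: a 224x224 CXR with 16x16 patches gives a 14x14 = 196-token grid.
    a, b = partition_patch_indices(196, seed=0)
    assert set(a).isdisjoint(b) and len(a) + len(b) == 196
    print(len(a), len(b))  # 98 98
```

Because the two index sets are disjoint and together cover the whole grid, each view sees a complementary but incomplete part of the image. Under this sketch's assumptions, an encoder would then process each subset's tokens separately before the agreement objective is applied.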

Paper Structure

This paper contains 10 sections, 2 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Our occluded-image contrastive learning: through non-overlapping occlusion, distinct tokens within the same image are treated as intra-class (positives), while tokens from different images in a batch are treated as inter-class (negatives).
  • Figure 2: Efficiency and scaling comparison of MRM [zhou2023advancing], Medical MAE [xiao2023delving], M3AE [chen2022multi], and our method on CheXpert fine-tuning, measured by mAUC (%).
  • Figure 3: t-SNE visualization of the learned global representations on the CheXpert benchmark.
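The pairing rule in Figure 1 (two partitions of the same image are positives, partitions from other images in the batch are negatives) maps naturally onto an InfoNCE-style loss. The sketch below is an assumption, not the paper's stated objective: it pools each partition into one embedding, uses cosine similarity with a temperature `tau`, and averages the cross-entropy in both directions. The function name `info_nce` is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(z_a: List[Vector], z_b: List[Vector], tau: float = 0.1) -> float:
    """Symmetric InfoNCE over a batch of paired partition embeddings.

    Assumption: z_a[i] and z_b[i] are pooled embeddings of the two disjoint
    partitions of image i (positive pair); z_b[j] with j != i comes from a
    different image in the batch (negative).
    """
    za = [l2_normalize(v) for v in z_a]
    zb = [l2_normalize(v) for v in z_b]
    n = len(za)
    # logits[i][j] = cosine(za[i], zb[j]) / tau
    logits = [
        [sum(p * q for p, q in zip(za[i], zb[j])) / tau for j in range(n)]
        for i in range(n)
    ]

    def mean_cross_entropy(mat: List[List[float]]) -> float:
        total = 0.0
        for i, row in enumerate(mat):
            m = max(row)  # subtract the max for numerical stability
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]  # -log softmax at the diagonal positive
        return total / len(mat)

    # Rows: partition A -> all B; columns (transposed): partition B -> all A.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(transposed))


if __name__ == "__main__":
    aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    assert aligned < swapped  # matched partitions give a lower loss
```

Minimizing this loss pulls the two partitions of each image together while pushing them away from other images' partitions, which is one way to realize the intra-class/inter-class split the caption describes.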