Self-Supervised Radiograph Anatomical Region Classification -- How Clean Is Your Real-World Data?

Simon Langer, Jessica Ritter, Rickmer Braren, Daniel Rueckert, Paul Hager

TL;DR

The paper addresses missing or erroneous anatomical region labels for radiographs in large clinical datasets by using self-supervised contrastive learning (SimCLR, BYOL) and supervised contrastive learning to classify radiographs into $14$ anatomical regions. It builds a pipeline on $48{,}434$ images with noisy PACS labels, using targeted image cleaning and a gauge augmentation to keep the model from relying on non-anatomical cues, and evaluates a linear classifier on frozen backbones, reaching $96.6\%$ accuracy with a single model and $97.7\%$ with an ensemble. Performance stays strong in low-label scenarios (e.g., $1\%$ of the training labels yields $92.2\%$). An expert radiologist's review of the best single model's test errors found that $35\%$ were in fact incorrect PACS labels and $11\%$ were out-of-domain images; accounting for these raises the theoretical accuracy to $98.0\%$ for the single model and $98.8\%$ for the ensemble. The work demonstrates practical benefits for improving PACS metadata, expanding usable datasets, and enabling reliable, low-resource anatomical region labeling in real-world clinical settings.

Abstract

Modern deep learning-based clinical imaging workflows rely on accurate labels of the examined anatomical region. Knowing the anatomical region is required to select applicable downstream models and to effectively generate cohorts of high quality data for future medical and machine learning research efforts. However, this information may not be available in externally sourced data, or it may contain data entry errors. To address this problem, we show the effectiveness of self-supervised methods such as SimCLR and BYOL as well as supervised contrastive deep learning methods in assigning one of 14 anatomical region classes in our in-house dataset of 48,434 skeletal radiographs. We achieve a strong linear evaluation accuracy of 96.6% with a single model and 97.7% using an ensemble approach. Furthermore, only a few labeled instances (1% of the training set) suffice to achieve an accuracy of 92.2%, enabling usage in low-label and thus low-resource scenarios. Our model can be used to correct data entry mistakes: a follow-up analysis of the test set errors of our best-performing single model by an expert radiologist identified 35% incorrect labels and 11% out-of-domain images. When accounted for, the radiograph anatomical region labelling performance increased -- without and with an ensemble, respectively -- to a theoretical accuracy of 98.0% and 98.8%.
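
As a rough illustration of how ensemble predictions could be used to surface suspicious PACS labels for expert review, the sketch below averages per-class probabilities from several models and flags images where a confident ensemble prediction disagrees with the stored label. Everything here is an assumption made for illustration: the abstract does not state how the ensemble combines its members, and the class names, probability values, `min_confidence` threshold, and function names are hypothetical.

```python
# Hypothetical subset of the 14 anatomical region classes (names are assumed).
CLASS_NAMES = ["hand", "knee", "foot"]


def average_probabilities(per_model_probs):
    """Average class probabilities over models, image by image.

    per_model_probs: list (models) of lists (images) of lists (class probabilities).
    Probability averaging is an assumed ensembling rule, not taken from the paper.
    """
    n_models = len(per_model_probs)
    n_images = len(per_model_probs[0])
    averaged = []
    for i in range(n_images):
        rows = [model[i] for model in per_model_probs]
        averaged.append([sum(col) / n_models for col in zip(*rows)])
    return averaged


def flag_suspect_labels(avg_probs, pacs_labels, class_names, min_confidence=0.9):
    """List images whose confident ensemble prediction disagrees with the PACS label.

    Returns tuples of (image index, PACS label, predicted label, confidence).
    """
    flagged = []
    for i, (probs, label) in enumerate(zip(avg_probs, pacs_labels)):
        best = max(range(len(probs)), key=probs.__getitem__)
        if class_names[best] != label and probs[best] >= min_confidence:
            flagged.append((i, label, class_names[best], probs[best]))
    return flagged


# Made-up linear-head outputs for three images from three models.
model_a = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10], [0.05, 0.05, 0.90]]
model_b = [[0.80, 0.10, 0.10], [0.20, 0.70, 0.10], [0.02, 0.03, 0.95]]
model_c = [[0.95, 0.03, 0.02], [0.30, 0.60, 0.10], [0.03, 0.02, 0.95]]
pacs_labels = ["hand", "knee", "knee"]  # the third label is deliberately wrong

avg_probs = average_probabilities([model_a, model_b, model_c])
flagged = flag_suspect_labels(avg_probs, pacs_labels, CLASS_NAMES)
for index, stored, predicted, confidence in flagged:
    print(f"image {index}: PACS says {stored!r}, ensemble says {predicted!r} ({confidence:.2f})")
```

In this sketch a flagged image is only a candidate for review: as the radiologist's analysis in the abstract indicates, a disagreement can be a genuine model error, an incorrect label, or an out-of-domain image.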

Paper Structure

This paper contains 16 sections, 1 equation, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Overview -- we pre-train our backbone using self-supervision, then train a fully connected head, and finally use its high quality predictions to correct noisy PACS labels.
  • Figure 1: Evolution of an image -- the strengths of color jitter, random affine and random resized crop augmentations are noticeably higher during the self-supervised pretraining in order to ensure the extraction of relevant features.
  • Figure 2: Impact of the amount of labeled data on final performance. This shows the importance of using self-supervised methods in low-label (i.e. low-resource) settings, as the supervised baseline is outperformed by a large margin by our self-supervised approaches, which already perform very well at $\geq 1\%$ of all training data. Note that the x-axis is scaled logarithmically.
  • Figure 2: t-SNE visualization of the test images' features, generated by our SimCLR pre-training -- PACS labels later identified as incorrect have a star marker.
  • Figure 3: Guided Grad-CAM [Selvaraju2016GradCAMVE, Kokhlikyan2020CaptumAU] visualization of our SimCLR model trained without (top) or with (bottom) our custom data cleaning and augmentations enabled. Note the much stronger focus of the bottom model on medically relevant image regions, rather than the border and gauge.
  • ...and 1 more figure