Self-Supervised Radiograph Anatomical Region Classification -- How Clean Is Your Real-World Data?
Simon Langer, Jessica Ritter, Rickmer Braren, Daniel Rueckert, Paul Hager
TL;DR
The paper addresses missing or erroneous anatomical region labels in large clinical radiograph datasets. It applies self-supervised contrastive learning (SimCLR, BYOL) and supervised contrastive learning to classify radiographs into $14$ anatomical regions. The pipeline uses $48{,}434$ images with noisy PACS labels, applies targeted image cleaning and a gauge augmentation intended to keep models from relying on non-anatomical cues, and evaluates a linear classifier on frozen backbones. This yields $96.6\%$ accuracy with a single model and $97.7\%$ with an ensemble. Performance remains strong in low-label scenarios: $1\%$ of the labeled training data yields $92.2\%$. An expert radiologist's review of the best single model's test errors attributes $35\%$ of them to incorrect labels and $11\%$ to out-of-domain images; accounting for these raises the theoretical accuracy to $98.0\%$ for a single model and $98.8\%$ for the ensemble. The work shows practical benefits for correcting PACS metadata, expanding usable datasets, and labeling anatomical regions reliably in low-resource clinical settings.
Abstract
Modern deep learning-based clinical imaging workflows rely on accurate labels of the examined anatomical region. Knowing the anatomical region is required to select applicable downstream models and to effectively generate cohorts of high-quality data for future medical and machine learning research. However, this information may be unavailable in externally sourced data, or it may contain data entry errors. To address this problem, we show the effectiveness of self-supervised methods such as SimCLR and BYOL, as well as supervised contrastive deep learning methods, in assigning one of 14 anatomical region classes in our in-house dataset of 48,434 skeletal radiographs. We achieve a strong linear evaluation accuracy of 96.6% with a single model and 97.7% using an ensemble approach. Furthermore, only a few labeled instances (1% of the training set) suffice to achieve an accuracy of 92.2%, enabling usage in low-label and thus low-resource scenarios. Our model can also be used to correct data entry mistakes: a follow-up analysis by an expert radiologist of the test set errors of our best-performing single model found that 35% of these errors were due to incorrect labels and 11% to out-of-domain images. When these are accounted for, the radiograph anatomical region labelling performance increases to a theoretical accuracy of 98.0% without an ensemble and 98.8% with one.
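The linear evaluation protocol mentioned in the abstract can be illustrated with a minimal sketch: the backbone's features are held fixed and only a single linear layer is trained on top of them. Everything specific below is an assumption for illustration and not the authors' implementation: the feature vectors are synthetic stand-ins for SimCLR or BYOL embeddings (the hypothetical `synthetic_backbone_features` helper), the probe is plain softmax regression trained by gradient descent, and the ensemble averages class probabilities, which the abstract does not specify.

```python
import numpy as np

NUM_CLASSES = 14  # number of anatomical region classes stated in the abstract


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fit_linear_probe(features, labels, num_classes=NUM_CLASSES, lr=0.1, epochs=300):
    """Fit one linear layer (softmax regression) on fixed features.

    The features are constants here: only `weights` and `bias` are updated,
    which is what a frozen backbone means in linear evaluation.
    """
    n_samples, dim = features.shape
    weights = np.zeros((dim, num_classes))
    bias = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        probs = softmax(features @ weights + bias)
        grad_logits = (probs - one_hot) / n_samples  # gradient of mean cross-entropy
        weights -= lr * (features.T @ grad_logits)
        bias -= lr * grad_logits.sum(axis=0)
    return weights, bias


def predict_proba(features, weights, bias):
    """Class probabilities of the linear probe for each feature row."""
    return softmax(features @ weights + bias)


def ensemble_predict(prob_list):
    """Assumed ensembling rule: average probabilities, then take the argmax."""
    return np.mean(prob_list, axis=0).argmax(axis=1)


def synthetic_backbone_features(labels, dim, seed):
    """Hypothetical stand-in for a frozen encoder: class centroid plus noise."""
    rng = np.random.default_rng(seed)
    centroids = rng.normal(size=(NUM_CLASSES, dim))
    return centroids[labels] + rng.normal(size=(labels.size, dim))


def demo():
    """Train one probe per simulated backbone and compare single vs ensemble."""
    rng = np.random.default_rng(0)
    labels = rng.integers(0, NUM_CLASSES, size=1400)
    train, test = slice(0, 1000), slice(1000, 1400)
    test_probs = []
    for seed in (1, 2, 3):  # three simulated frozen backbones
        feats = synthetic_backbone_features(labels, dim=32, seed=seed)
        weights, bias = fit_linear_probe(feats[train], labels[train])
        test_probs.append(predict_proba(feats[test], weights, bias))
    single_acc = (test_probs[0].argmax(axis=1) == labels[test]).mean()
    ensemble_acc = (ensemble_predict(test_probs) == labels[test]).mean()
    return single_acc, ensemble_acc


if __name__ == "__main__":
    single, ensemble = demo()
    print(f"single-probe accuracy: {single:.3f}, ensemble accuracy: {ensemble:.3f}")
```

In a real setting, the synthetic features would be replaced by embeddings computed once from the pretrained encoder, and the low-label scenario would correspond to fitting the probe on a small labeled subset of the training features. The accuracies printed here describe the toy data only and say nothing about the reported results.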
