Self-supervised pre-training with diffusion model for few-shot landmark detection in x-ray images

Roberto Di Via; Francesca Odone; Vito Paolo Pastore


TL;DR

This paper tackles the data scarcity challenge in X-ray landmark detection by introducing a denoising diffusion probabilistic model (DDPM) for self-supervised pre-training. The authors pre-train a DDPM U-Net on unlabeled X-ray data to learn rich, multi-scale anatomical representations, then fine-tune for landmark heatmap prediction using Gaussian or contour-based losses. Across Chest, Cephalometric, and Hand datasets, the DDPM-based pre-training consistently outperforms ImageNet supervision and other SSL baselines, with the largest gains in ultra-low-data settings, and proves robust to different in-domain pre-training sources. The work demonstrates a practical, label-efficient pre-training strategy that can accelerate medical image analysis where annotations are scarce, and provides public code and models to enable broader adoption and replication.
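The heatmap-regression step can be illustrated with a minimal sketch. Each landmark is encoded as a 2D Gaussian target, coordinates are decoded by argmax, and predictions are scored with a pixel-wise MSE. This sketch assumes a plain Gaussian target and an MSE loss. The paper's exact Gaussian or contour-based loss, the value of `sigma`, and the helper names (`gaussian_heatmap`, `make_targets`, `decode_argmax`, `heatmap_mse`) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def gaussian_heatmap(height, width, center_xy, sigma=3.0):
    """Return a (height, width) map with a 2D Gaussian peak of 1.0 at center_xy.

    center_xy is (x, y) in pixels; the sigma default is a hypothetical choice.
    """
    ys, xs = np.mgrid[0:height, 0:width]  # row (y) and column (x) index grids
    cx, cy = center_xy
    sq_dist = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def make_targets(height, width, landmarks, sigma=3.0):
    """Stack one heatmap channel per landmark: shape (num_landmarks, H, W)."""
    return np.stack([gaussian_heatmap(height, width, p, sigma) for p in landmarks])

def decode_argmax(heatmaps):
    """Recover integer (x, y) per channel from the location of its maximum."""
    num, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(num, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)

def heatmap_mse(pred, target):
    """Pixel-wise mean squared error between predicted and target heatmaps."""
    return float(np.mean((pred - target) ** 2))

landmarks = [(10, 20), (40, 5)]  # hypothetical (x, y) landmark positions
targets = make_targets(64, 64, landmarks, sigma=2.0)
print(targets.shape)                    # (2, 64, 64)
print(decode_argmax(targets).tolist())  # [[10, 20], [40, 5]]
print(heatmap_mse(targets, targets))    # 0.0
```

During fine-tuning, a network would output one channel per landmark and be trained to match these targets. Argmax decoding is the simplest readout; sub-pixel alternatives such as soft-argmax are a common design choice.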

Abstract

Deep neural networks have been extensively applied in the medical domain for various tasks, including image classification, segmentation, and landmark detection. However, their application is often hindered by data scarcity, in terms of both available annotations and images. This study introduces a novel application of denoising diffusion probabilistic models (DDPMs) to the landmark detection task, specifically addressing the challenge of limited annotated data in x-ray imaging. Our key innovation lies in leveraging DDPMs for self-supervised pre-training in landmark detection, a previously unexplored approach in this domain. This method enables accurate landmark detection with minimal annotated training data (as few as 50 images), surpassing both ImageNet supervised pre-training and traditional self-supervised techniques across three popular x-ray benchmark datasets. To our knowledge, this work represents the first application of diffusion models for self-supervised learning in landmark detection, and it may offer a valuable pre-training strategy for mitigating data scarcity in few-shot regimes.
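To illustrate what DDPM pre-training optimizes, the sketch below implements the standard noise-prediction objective. An image x_0 is corrupted in closed form to x_t at a random timestep t, and the network is trained to predict the added noise. The linear beta schedule (1e-4 to 0.02 over 1000 steps) follows the original DDPM defaults and is assumed here rather than taken from this paper. `zero_denoiser` is a hypothetical stand-in for the U-Net, and the random array stands in for a normalized x-ray image.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed defaults from the original DDPM formulation)."""
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, noise):
    """Closed-form forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def ddpm_loss(denoiser, x0, alpha_bar, rng):
    """Sample a timestep and Gaussian noise, then score the noise prediction with MSE."""
    t = int(rng.integers(0, len(alpha_bar)))
    noise = rng.standard_normal(x0.shape)
    xt = q_sample(x0, t, alpha_bar, noise)
    pred = denoiser(xt, t)
    return float(np.mean((pred - noise) ** 2))

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(1, 64, 64))  # stand-in for a normalized x-ray crop

# Hypothetical placeholder: a real model would be a U-Net predicting the noise.
zero_denoiser = lambda xt, t: np.zeros_like(xt)
print(ddpm_loss(zero_denoiser, x0, alpha_bar, rng))  # roughly 1.0 for a zero predictor
```

A predictor that always outputs zero scores about the variance of the noise (around 1.0). Training drives this loss down. The intuition behind using DDPMs for pre-training is that denoising at many noise levels forces the U-Net to encode image structure at multiple scales, and those weights can then initialize the landmark-detection network.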


Paper Structure (12 sections, 6 equations, 3 figures, 5 tables)

Introduction
Related Works
Approach
Experimental Setup
Datasets and Evaluation Metrics
Implementation Details
Experimental Results
Tuning the DDPM pre-training iterations
Downstream task performance evaluation
Ablation study on the downstream task setting
Impact of different pre-training datasets
Conclusions

Figures (3)

Figure 1: Schematic representation of our DDPM self-supervised landmark detection pipeline.
Figure 2: Landmark detection performance on Chest, Cephalometric, and Hand validation sets across DDPM pre-training iterations.
Figure 3: Comparison of landmark detection performance between our DDPM pre-training method and alternative self-supervised and ImageNet pre-training approaches on Chest, Cephalometric, and Hand x-ray test sets, using only 10 labeled training samples.
