Salt & Pepper Heatmaps: Diffusion-informed Landmark Detection Strategy

Julian Wyatt, Irina Voiculescu

TL;DR

This work tackles automatic Anatomical Landmark Detection by casting it as a diffusion-based generative task that yields multi-channel heatmaps capturing uncertainty. The authors introduce a single-step diffusion formulation with a multi-step backbone, a time-encoded U-Net, and a loss that combines a supervised term with a spatially normalized cross-entropy, together with a gradually increasing Gaussian blur applied during reverse diffusion. On a cephalometric dataset, the proposed multi-step diffusion approach achieves state-of-the-art mean radial error (MRE) and a successful detection rate (SDR) competitive with prior methods, while single-step estimates lag behind, highlighting the benefit of iterative refinement. The method offers accurate, uncertainty-aware landmark localization with potential efficiency gains, and it motivates future extensions to region-based predictions and informed priors to further improve speed and accuracy in clinical workflows.

Abstract

Anatomical Landmark Detection is the process of identifying key areas of an image for clinical measurements. Each landmark is a single ground truth point labelled by a clinician. A machine learning model predicts the locus of a landmark as a probability region represented by a heatmap. Diffusion models have increased in popularity for generative modelling due to their high-quality sampling and mode coverage, leading to their adoption in medical image processing for semantic segmentation. Diffusion modelling can be further adapted to learn a distribution over landmarks. The stochastic nature of diffusion models captures fluctuations in the landmark prediction, which we leverage by blurring them into meaningful probability regions. In this paper, we reformulate automatic Anatomical Landmark Detection as a precise generative modelling task, producing a few-hot pixel heatmap. Our method achieves state-of-the-art MRE and SDR performance comparable to existing work.
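To make the two reported metrics concrete, the following is a minimal sketch of how MRE (mean radial error) and SDR (successful detection rate) are commonly computed from per-landmark heatmaps. It is not taken from the paper: the function names, the `mm_per_pixel` spacing, and the radius thresholds are illustrative assumptions, and the predicted landmark is taken to be the hottest pixel of each channel.

```python
import numpy as np

def hottest_points(heatmaps):
    """Return the (row, col) of the maximum of each channel.

    heatmaps: array of shape (N, H, W), one channel per landmark.
    """
    n, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(n, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (h, w))
    return np.stack([rows, cols], axis=1).astype(float)

def radial_errors_mm(pred, gt, mm_per_pixel):
    """Euclidean distance between predictions and ground truth, in mm."""
    return np.linalg.norm(pred - gt, axis=1) * mm_per_pixel

def mean_radial_error(pred, gt, mm_per_pixel):
    """MRE: average radial error over all landmarks."""
    return radial_errors_mm(pred, gt, mm_per_pixel).mean()

def successful_detection_rate(pred, gt, radius_mm, mm_per_pixel):
    """SDR: fraction of landmarks predicted within radius_mm of the ground truth."""
    return (radial_errors_mm(pred, gt, mm_per_pixel) <= radius_mm).mean()

# Toy example: two landmark channels on an 8x8 grid.
# The pixel spacing of 0.5 mm is an assumption chosen for easy arithmetic.
heat = np.zeros((2, 8, 8))
heat[0, 2, 3] = 1.0
heat[1, 5, 5] = 1.0
gt = np.array([[2.0, 3.0], [5.0, 1.0]])

pred = hottest_points(heat)
mre = mean_radial_error(pred, gt, mm_per_pixel=0.5)
sdr_1mm = successful_detection_rate(pred, gt, radius_mm=1.0, mm_per_pixel=0.5)
sdr_2mm = successful_detection_rate(pred, gt, radius_mm=2.0, mm_per_pixel=0.5)
```

In this toy case the first landmark is hit exactly and the second is 4 pixels (2 mm) off, so the MRE is 1 mm and the SDR rises from 0.5 at a 1 mm radius to 1.0 at a 2 mm radius.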

Paper Structure

This paper contains 12 sections, 11 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Landmark heatmaps: the hottest point lies on average 0.72 mm from the green ground truth (GT).
  • Figure 2: Reverse heatmap generation procedure: starting from Gaussian noise at $x_T$, noise is removed incrementally through the Markov chain for $t=T,\dots,0$. The parameterised model $x_\theta$ takes the positional encoding and reference image as context to form the denoising approximation $p_\theta(x_{t-1}|x_t)$. (Exaggerated landmark size; only one of the $\mathbf{N}$ channels shown.)
  • Figure 3: Comparative analysis of heatmaps with green ground truths overlaid on the context image: (a) high-quality heatmap, (b) novel Salt and Pepper heatmap generated with an $\mathcal{N}(0,\mathbf{I})$ prior, (c) medium-quality heatmap with a missed prediction around the anterior nasal spine landmark.
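
The reverse procedure described in the Figure 2 caption can be illustrated with a generic DDPM-style ancestral sampler. This is only a sketch under stated assumptions, not the authors' implementation: the linear noise schedule, the number of steps, and the `oracle` stand-in for the trained network $x_\theta$ are all hypothetical, and the blur and loss described in the summary are not modelled. The model is assumed to predict the clean heatmap $x_0$ from $x_t$, the timestep, and the context image.

```python
import numpy as np

def reverse_diffusion(x_theta, context, shape, T=50, seed=0):
    """Sample a heatmap by walking the Markov chain from t = T down to t = 1.

    x_theta(x_t, t, context) is assumed to return an estimate of the clean
    heatmap x_0. The linear beta schedule is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    alpha_bar_prev = np.concatenate([[1.0], alpha_bar[:-1]])

    x = rng.standard_normal(shape)            # x_T ~ N(0, I)
    for i in reversed(range(T)):              # i = t - 1
        x0_hat = x_theta(x, i + 1, context)
        # Mean and variance of the Gaussian posterior q(x_{t-1} | x_t, x_0),
        # evaluated at the predicted x_0.
        coef_x0 = np.sqrt(alpha_bar_prev[i]) * betas[i] / (1.0 - alpha_bar[i])
        coef_xt = np.sqrt(alphas[i]) * (1.0 - alpha_bar_prev[i]) / (1.0 - alpha_bar[i])
        mean = coef_x0 * x0_hat + coef_xt * x
        var = betas[i] * (1.0 - alpha_bar_prev[i]) / (1.0 - alpha_bar[i])
        # No noise is added on the final step, which produces x_0.
        noise = rng.standard_normal(shape) if i > 0 else np.zeros(shape)
        x = mean + np.sqrt(var) * noise
    return x

# Hypothetical stand-in for the trained network: it always returns a fixed
# one-hot heatmap, which lets the sampler be checked in isolation.
target = np.zeros((1, 16, 16))
target[0, 4, 9] = 1.0

def oracle(x_t, t, context):
    return target

sample = reverse_diffusion(oracle, context=None, shape=target.shape)
peak = np.unravel_index(sample[0].argmax(), sample[0].shape)
```

With a perfect predictor the final step places all weight on the predicted $x_0$, so the sample collapses onto the target heatmap. With a real network, repeated sampling under different seeds would instead give slightly different hot pixels, which is the fluctuation the abstract describes blurring into a probability region.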