Framing image registration as a landmark detection problem for label-noise-aware task representation (HitR)

Diana Waldmannstetter, Ivan Ezhov, Benedikt Wiestler, Francesco Campi, Ivan Kukuljan, Stefan Ehrlich, Shankeeth Vinayahalingam, Bhakti Baheti, Satrajit Chakrabarty, Ujjwal Baid, Spyridon Bakas, Julian Schwarting, Marie Metz, Jan S. Kirschke, Daniel Rueckert, Rolf A. Heckemann, Marie Piraud, Bjoern H. Menze, Florian Kofler

TL;DR

HitR reframes image-registration evaluation by introducing a landmark-based, label-noise-aware metric that measures whether predicted landmarks fall within confidence ROIs derived from inter-rater variation. The method aggregates multiple annotations, computes radii from annotator-distance distributions, and traces HitR curves across ROI sizes to reflect task-specific accuracy requirements. Experiments on BraTS-Reg with simulated annotation noise show HitR can reveal robustness and differences among algorithms beyond what TRE captures, underscoring its clinical relevance. This approach enables more realistic, application-aligned validation of registration methods and can extend to other biomedical imaging contexts.

Abstract

Accurate image registration is pivotal in biomedical image analysis, where selecting suitable registration algorithms demands careful consideration. While numerous algorithms are available, the evaluation metrics to assess their performance have remained relatively static. This study addresses this challenge by introducing a novel evaluation metric termed Landmark Hit Rate (HitR), which focuses on the clinical relevance of image registration accuracy. Unlike traditional metrics such as Target Registration Error, which emphasize subresolution differences, HitR considers whether registration algorithms successfully position landmarks within defined confidence zones. This paradigm shift acknowledges the inherent annotation noise in medical images, allowing for more meaningful assessments. To equip HitR with label-noise-awareness, we propose defining these confidence zones based on an Inter-rater Variance analysis. Consequently, hit rate curves are computed for varying landmark zone sizes, enabling performance measurement for a task-specific level of accuracy. Our approach offers a more realistic and meaningful assessment of image registration algorithms, reflecting their suitability for clinical and biomedical applications.

Paper Structure (14 sections, 11 equations, 5 figures)

Figures (5)

  • Figure 1: Hit or Miss. Reference annotation (yellow) and predicted landmark (magenta). The red circle with radius $r$ indicates the tolerated ROI around a landmark, where the radius corresponds to a specified threshold derived from the distances between annotators. A hit is classified according to \ref{eq:hits}, and the hit rate (HitR) is calculated with \ref{eq:metric}, which gives the ratio of hits to the total number of landmarks.
  • Figure 2: Distribution of distances between annotators on a set of 399 re-annotated landmarks in BraTS-Reg, see \ref{sec:bratsreg}. Most landmarks are re-annotated within a distance of around $5\,\mathrm{mm}$, while a few are re-annotated at distances of up to almost $24\,\mathrm{mm}$.
  • Figure 3: Boxplot illustrating HitR per registration algorithm and annotator. HitR is computed according to \ref{eq:metric}. For the computation, we take the distance of each rater's annotation to the average landmark as the radius of the ROI. For each algorithm, HitR is calculated based on all registered landmarks and the defined ROI. The evaluated algorithms are submissions to the BraTS-Reg challenge, see \ref{sec:bratsreg}. The performance of the algorithms varies significantly and overlaps depending on the underlying annotation, which supports the challenge organizers' decision to award the challenge contributions based on performance tiers.
  • Figure 4: Line chart comparing HitR as a function of the radius $r$ for various registration algorithms. The evaluated algorithms are submissions to the BraTS-Reg challenge, see \ref{sec:bratsreg}. HitR is computed at the points derived from \ref{eq:sampling}, while the lines are interpolated. Some algorithms show greatly improved performance once the ROI size is increased (indicated by crossing lines).
  • Figure 5: Boxplot illustrating TRE with respect to all annotators' landmark annotations, per registration algorithm. For each algorithm, the TRE between a registered landmark and each annotator's corresponding landmark is computed. The evaluated algorithms are submissions to the BraTS-Reg challenge, see \ref{sec:bratsreg}. With a Pearson $r$ of $-0.49$, TRE correlates only moderately with HitR.
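
The hit classification and hit rate described in the captions of Figures 1 and 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `hit_rate` and `hit_rate_curve`, the use of Euclidean distance, the inclusive comparison (`<=`), and the toy coordinates are all assumptions made for illustration, since the equations themselves are not reproduced in this summary.

```python
import numpy as np


def hit_rate(predicted, reference, radius):
    """Fraction of predicted landmarks that lie within `radius` of their reference.

    Assumption: a "hit" means the Euclidean distance between a predicted
    landmark and its reference landmark is at most `radius`. `radius` may be
    a scalar or an array holding one radius per landmark.
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Per-landmark Euclidean distance (same units as the coordinates, e.g. mm).
    distances = np.linalg.norm(predicted - reference, axis=1)
    hits = distances <= radius
    # HitR: number of hits divided by the number of landmarks.
    return float(hits.mean())


def hit_rate_curve(predicted, reference, radii):
    """HitR evaluated at each ROI radius in `radii`."""
    return [hit_rate(predicted, reference, r) for r in radii]


# Hypothetical toy data: three reference landmarks and their registered positions.
reference = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
predicted = np.array([[1.0, 0.0, 0.0], [10.0, 3.0, 0.0], [0.0, 10.0, 6.0]])

# The registration errors are 1, 3 and 6, so the curve rises as the ROI grows.
curve = hit_rate_curve(predicted, reference, radii=[2.0, 4.0, 8.0])
print(curve)
```

In this toy case the hit rate increases from one third to one as the radius grows, which is the behaviour a HitR curve is meant to expose: two algorithms with similar mean error can still rank differently at a given, task-specific radius.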