Pitfalls of topology-aware image segmentation
Alexander H. Berger, Laurin Lux, Alexander Weers, Martin Menten, Daniel Rueckert, Johannes C. Paetzold
TL;DR
The paper investigates why topology-aware segmentation methods are difficult to compare fairly and identifies three key benchmarking pitfalls: connectivity definitions, ground-truth artifacts, and misaligned evaluation metrics. Through empirical analyses on standard datasets such as DRIVE, CREMI, and Roads, it shows that connectivity choices can dominate or invert method rankings and that label artifacts bias topological measures. It argues that evaluation metrics should disentangle topological information from volumetric accuracy, favoring Betti-number-based and Betti-matching approaches with dimension-wise reporting. The authors offer concrete recommendations for robust, fair evaluation of topology-aware medical image segmentation methods: choose connectivity per dataset, curate ground truth with artifacts in mind, and report problem-specific, disentangled metrics.
Abstract
Topological correctness, i.e., the preservation of structural integrity and specific characteristics of shape, is a fundamental requirement for medical imaging tasks, such as neuron or vessel segmentation. Despite the recent surge in topology-aware methods addressing this challenge, their real-world applicability is hindered by flawed benchmarking practices. In this paper, we identify critical pitfalls in model evaluation that include inadequate connectivity choices, overlooked topological artifacts in ground truth annotations, and inappropriate use of evaluation metrics. Through detailed empirical analysis, we uncover these issues' profound impact on the evaluation and ranking of segmentation methods. Drawing from our findings, we propose a set of actionable recommendations to establish fair and robust evaluation standards for topology-aware medical image segmentation methods.
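As a minimal illustration of the connectivity pitfall, the sketch below counts connected foreground components (the zeroth Betti number) of a toy binary mask under 4- and 8-connectivity. The function name `count_components`, the offset lists, and the toy mask are assumptions made for this example and are not taken from the paper; the breadth-first search is a generic component counter, not the authors' evaluation code.

```python
from collections import deque

# Neighbor offsets for the two common 2D foreground connectivities.
OFFSETS_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
OFFSETS_8 = OFFSETS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(mask, offsets):
    """Count connected foreground components (Betti-0) via breadth-first search."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Unvisited foreground pixel: start a new component.
            components += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return components


# Hypothetical toy "vessel": a one-pixel-wide diagonal line.
diagonal = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

b0_four = count_components(diagonal, OFFSETS_4)
b0_eight = count_components(diagonal, OFFSETS_8)
print(b0_four, b0_eight)  # prints "4 1"
```

The same mask is one connected structure under 8-connectivity and four isolated pixels under 4-connectivity, so any metric built on component counts depends on which convention the evaluation uses.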
