
Pitfalls of topology-aware image segmentation

Alexander H. Berger, Laurin Lux, Alexander Weers, Martin Menten, Daniel Rueckert, Johannes C. Paetzold

TL;DR

The paper investigates why topology-aware segmentation methods are difficult to compare fairly, identifying three key benchmarking pitfalls: connectivity definitions, ground-truth artifacts, and misaligned evaluation metrics. Through empirical analyses on standard datasets such as DRIVE, CREMI, and Roads, it shows that connectivity choices can dominate or invert method rankings and that label artifacts bias topological measures. It argues that evaluation metrics should disentangle topological information from volumetric accuracy, favoring Betti-number-based and Betti-matching approaches with dimension-wise reporting. The authors offer concrete recommendations to establish dataset-aware connectivity, artifact-aware ground truth curation, and problem-specific, disentangled metrics to enable robust, fair evaluation of topology-aware medical image segmentation methods.
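The TL;DR claims that connectivity choices can flip topological scores and recommends dimension-wise Betti-number reporting. The minimal pure-Python sketch below makes both points concrete for 2D binary masks. It is not the authors' evaluation code, and all function names and toy masks are hypothetical. It assumes the common digital-topology convention of pairing the foreground connectivity with the complementary background connectivity (8/4 or 4/8). β0 is the number of foreground components, and β1 is the number of background components enclosed by foreground.

```python
from collections import deque

# Neighbour offsets for the two standard 2D pixel connectivities.
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def count_components(grid, value, offsets):
    """Count connected components of cells equal to `value` via BFS flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1  # found a component not visited yet
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def betti_numbers(mask, fg_connectivity=8):
    """Return (b0, b1) of a 2D binary mask given as lists of 0/1.

    The background uses the complementary connectivity (FG 8 -> BG 4,
    FG 4 -> BG 8). Padding with a background border leaves exactly one
    "outer" background component; every other one is an enclosed hole.
    """
    if fg_connectivity not in (4, 8):
        raise ValueError("fg_connectivity must be 4 or 8")
    fg_off, bg_off = (N8, N4) if fg_connectivity == 8 else (N4, N8)
    width = len(mask[0]) + 2
    padded = ([[0] * width]
              + [[0] + list(row) + [0] for row in mask]
              + [[0] * width])
    b0 = count_components(padded, 1, fg_off)
    b1 = count_components(padded, 0, bg_off) - 1  # drop the outer background
    return b0, b1


def betti_errors(pred, gt, fg_connectivity=8):
    """Dimension-wise absolute Betti-number errors (|d b0|, |d b1|)."""
    p = betti_numbers(pred, fg_connectivity)
    g = betti_numbers(gt, fg_connectivity)
    return tuple(abs(a - b) for a, b in zip(p, g))


# Hypothetical toy masks.
diagonal_vessel = [[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1]]
diamond_ring = [[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]]
ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
broken_ring = [[1, 1, 1],
               [1, 0, 1],
               [1, 0, 1]]

for conn in (8, 4):
    print(conn, betti_numbers(diagonal_vessel, conn), betti_numbers(diamond_ring, conn))
print(betti_errors(broken_ring, ring))
```

With 8-connected foreground, the one-pixel-wide diagonal vessel is a single component. With 4-connected foreground, it falls apart into four. The diamond-shaped ring encloses a hole only when the background is 4-connected. The same mask can therefore receive very different β0 and β1 values depending solely on the connectivity convention. Reporting the errors per dimension, as `betti_errors` does, keeps a missing hole separate from a broken component.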

Abstract

Topological correctness, i.e., the preservation of structural integrity and specific characteristics of shape, is a fundamental requirement for medical imaging tasks, such as neuron or vessel segmentation. Despite the recent surge in topology-aware methods addressing this challenge, their real-world applicability is hindered by flawed benchmarking practices. In this paper, we identify critical pitfalls in model evaluation that include inadequate connectivity choices, overlooked topological artifacts in ground truth annotations, and inappropriate use of evaluation metrics. Through detailed empirical analysis, we uncover these issues' profound impact on the evaluation and ranking of segmentation methods. Drawing from our findings, we propose a set of actionable recommendations to establish fair and robust evaluation standards for topology-aware medical image segmentation methods.

Paper Structure

This paper contains 13 sections, 2 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Topological errors are present across distinct medical image segmentation tasks, e.g., in neuron, Circle of Willis, and retinal segmentation (top). We identify three critical pitfalls (bottom) in the evaluation of topology-aware segmentation methods. These include inadequate connectivity choices that misrepresent a dataset's semantics (left, e.g., representing a single vessel as multiple components), overlooked topological artifacts that skew evaluation results (center), and misaligned use of evaluation metrics that lack expressive power (right, e.g., VOI entangles volumetric and topological information).
  • Figure 2: Examples illustrating the importance of choosing the correct connectivity for the DRIVE and CREMI datasets. In the DRIVE dataset, small vessels become disconnected when 4-connectivity is used for the foreground (FG). In the CREMI dataset, synaptic clefts can become disconnected when 4-connectivity is used for the background (BG).
  • Figure 3: Examples of topological artifacts. The two left columns show connectivity artifacts and label noise in the DRIVE dataset. The right column shows resolution artifacts in the Roads dataset. Arrows and circles indicate topological artifacts.
  • Figure 4: Examples of the impact of inappropriate reporting practices on the DRIVE dataset (top) and of unsuitable evaluation metrics on the Roads dataset (bottom). Left: image with an overlay of the segmentation label. Middle: unfavorable predictions, with detached vessels ($\mathrm{BM} = 2$, top) and disconnected residential blocks ($\mathrm{VOI} = 0.01$, bottom). Right: favorable predictions, with missing background components that carry no semantic meaning ($\mathrm{BM} = 10$, top) and additional segmentation of parking areas ($\mathrm{VOI} = 0.15$, bottom).
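Figures 1 and 4 point out that VOI entangles volumetric and topological information. The sketch below illustrates this with the variation of information, VOI(A, B) = H(A|B) + H(B|A), measured in bits and computed between two labelings of the same pixels. It assumes the labels are already instance (connected-component) labels with the background as its own label. This is not necessarily how the paper computes VOI, and the 16-pixel labelings are hypothetical toy data, not results from the paper.

```python
import math
from collections import Counter


def variation_of_information(a, b):
    """VOI(A, B) = H(A|B) + H(B|A) in bits, for two labelings of the same pixels.

    `a` and `b` are flat sequences of hashable labels (assumed here to be
    connected-component ids, with background as its own label).
    """
    if len(a) != len(b) or not a:
        raise ValueError("labelings must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    voi = 0.0
    for (x, y), n_xy in joint.items():
        p_xy = n_xy / n
        # -p(x,y) * log2 p(x|y) accumulates H(A|B);
        # -p(x,y) * log2 p(y|x) accumulates H(B|A).
        voi -= p_xy * (math.log2(n_xy / count_b[y]) + math.log2(n_xy / count_a[x]))
    return voi


# Hypothetical hand-labeled pixels: 0 = background, 1 = object, 2 = spurious component.
gt = [0] * 8 + [1] * 8
pred_small = [0] * 7 + [2] * 1 + [1] * 8  # one extra 1-pixel component
pred_large = [0] * 4 + [2] * 4 + [1] * 8  # one extra 4-pixel component

print(variation_of_information(gt, gt))
print(variation_of_information(gt, pred_small))
print(variation_of_information(gt, pred_large))
```

Both predictions add exactly one spurious component, so both have a Betti-0 error of 1. VOI is nonetheless about 0.27 bits for the 1-pixel component and exactly 0.5 bits for the 4-pixel one. The score tracks the size of the error as well as its topology, which is why a single VOI value cannot tell the two kinds of mistakes apart.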