RadioActive: 3D Radiological Interactive Segmentation Benchmark
Constantin Ulrich, Tassilo Wald, Emily Tempus, Maximilian Rokuss, Paul F. Jaeger, Klaus Maier-Hein
TL;DR
RadioActive addresses the lack of rigorous evaluation in 3D radiological interactive segmentation by providing an open, extensible benchmark that evaluates both 2D and 3D prompting strategies across ten diverse datasets under a standardized protocol. The framework introduces realistic prompting schemes (interpolation and propagation) that reduce human effort, along with scribble-based prompts for iterative refinement. Across seven models, it shows that SAM2 can outperform specialized medical models under realistic prompting, that simple interpolation strategies can match slice-by-slice prompting, and that iterative refinement further improves accuracy. Bounding-box prompts generally outperform point prompts, and 2D models can rival 3D models when paired with an effective prompting strategy; however, 3D models still struggle on some MRI tasks and large structures. By open-sourcing RadioActive, the authors provide a reproducible, community-driven platform to accelerate progress in interactive 3D medical image segmentation and its clinical adoption.
Abstract
Precise segmentation with minimal clinician effort could greatly streamline clinical workflows. Recent interactive segmentation models, inspired by Meta's Segment Anything, have made significant progress but face critical limitations in 3D radiology. These include impractical human interaction requirements, such as slice-by-slice operations for 2D models on 3D data, and a lack of iterative refinement. Prior studies have been hindered by inadequate evaluation protocols, resulting in unreliable performance assessments and inconsistent findings across studies. The RadioActive benchmark addresses these challenges by providing a rigorous and reproducible evaluation framework for interactive segmentation methods in clinically relevant scenarios. It features diverse datasets, a wide range of target structures, and the most impactful 2D and 3D interactive segmentation methods, all within a flexible and extensible codebase. We also introduce advanced prompting techniques that reduce interaction steps, enabling fair comparisons between 2D and 3D models. Surprisingly, SAM2 outperforms all specialized medical 2D and 3D models in a setting requiring only a few interactions to generate prompts for a 3D volume. This challenges prevailing assumptions and demonstrates that general-purpose models can surpass specialized medical approaches in this setting. By open-sourcing RadioActive, we invite researchers to integrate their models and prompting techniques, ensuring continuous and transparent evaluation of 3D medical interactive models.
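As a rough illustration of the interpolation-style prompting mentioned in the TL;DR, the following Python sketch linearly interpolates 2D bounding boxes between a few annotated slices so that every intermediate slice receives a prompt. It is an assumption about how such a scheme could look, not the benchmark's actual implementation; the function name `interpolate_boxes` and the `(x_min, y_min, x_max, y_max)` box format are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def interpolate_boxes(annotated: Dict[int, Box]) -> Dict[int, Box]:
    """Fill in boxes for all slices lying between user-annotated slices.

    `annotated` maps a slice index to the box drawn by the user on that slice.
    Slices outside the annotated range receive no prompt.
    """
    if not annotated:
        return {}
    keys = sorted(annotated)
    result: Dict[int, Box] = {keys[0]: annotated[keys[0]]}
    # Walk over consecutive pairs of annotated slices.
    for lo, hi in zip(keys, keys[1:]):
        box_lo, box_hi = annotated[lo], annotated[hi]
        for z in range(lo + 1, hi + 1):
            t = (z - lo) / (hi - lo)  # 0 at slice lo, 1 at slice hi
            result[z] = tuple(a + t * (b - a) for a, b in zip(box_lo, box_hi))
    return result


# Two user interactions yield prompts for five slices.
prompts = interpolate_boxes({
    10: (20.0, 30.0, 60.0, 80.0),
    14: (28.0, 30.0, 68.0, 88.0),
})
```

With two annotated slices, the three slices in between receive boxes without further interaction; the resulting per-slice boxes could then be passed to a 2D model one slice at a time.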
