nnInteractive: Redefining 3D Promptable Segmentation

Fabian Isensee, Maximilian Rokuss, Lars Krämer, Stefan Dinkelacker, Ashis Ravindran, Florian Stritzke, Benjamin Hamm, Tassilo Wald, Moritz Langenberg, Constantin Ulrich, Jonathan Deissler, Ralf Floca, Klaus Maier-Hein

TL;DR

3D medical image segmentation requires maintaining volumetric consistency across diverse modalities and structures. nnInteractive delivers a UNet-based 3D interactive open-set segmentation framework that converts intuitive 2D prompts (points, scribbles, boxes, and a novel lasso) into full 3D masks, with AutoZoom and diverse user interactions to handle large and ambiguous targets. Trained on 120+ multimodal datasets and enhanced by SuperVoxels, it achieves state-of-the-art accuracy while remaining VRAM-efficient and usable within Napari and MITK; extensive benchmarking across 14 datasets and out-of-distribution (OOD) scenarios demonstrates robust generalization. The approach substantially reduces annotation effort and accelerates clinical workflows, enabling real-world adoption of AI-assisted 3D segmentation in radiology and research.

Abstract

Accurate and efficient 3D segmentation is essential for both clinical and research applications. While foundation models like SAM have revolutionized interactive segmentation, their 2D design and domain shift limitations make them ill-suited for 3D medical images. Current adaptations address some of these challenges but remain limited, either lacking volumetric awareness, offering restricted interactivity, or supporting only a small set of structures and modalities. Usability also remains a challenge, as current tools are rarely integrated into established imaging platforms and often rely on cumbersome web-based interfaces with restricted functionality. We introduce nnInteractive, the first comprehensive 3D interactive open-set segmentation method. It supports diverse prompts, including points, scribbles, boxes, and a novel lasso prompt, while leveraging intuitive 2D interactions to generate full 3D segmentations. Trained on 120+ diverse volumetric 3D datasets (CT, MRI, PET, 3D Microscopy, etc.), nnInteractive sets a new state of the art in accuracy, adaptability, and usability. Crucially, it is the first method integrated into widely used image viewers (e.g., Napari, MITK), ensuring broad accessibility for real-world clinical and research applications. Extensive benchmarking demonstrates that nnInteractive far surpasses existing methods, setting a new standard for AI-driven interactive 3D segmentation. nnInteractive is publicly available: https://github.com/MIC-DKFZ/napari-nninteractive (Napari plugin), https://www.mitk.org/MITK-nnInteractive (MITK integration), https://github.com/MIC-DKFZ/nnInteractive (Python backend).

Paper Structure

This paper contains 45 sections, 1 equation, 14 figures, 4 tables.

Figures (14)

  • Figure 1: nnInteractive fully unlocks the potential of 3D interactive segmentation. Supporting a diverse set of prompting styles, it generates full 3D segmentations from intuitive 2D interactions. Prompts can be arbitrarily mixed and placed on any axis. nnInteractive is open set and supports all modalities. It quickly adapts to user input to accurately segment any target structure.
  • Figure 2: Overview of the nnInteractive Training Pipeline. The model first receives an input image and an initial prompt. The network then generates a prediction, which is used to compute the loss and identify false positive/negative areas. Based on the interaction agent simulating user input, a new prompt is sampled and added to the network input along with the current prediction.
  • Figure 3: Bounding Box vs. Lasso. A bounding box interaction often requires additional refinement, whereas a fine-grained lasso interaction enables precise segmentation in a single step.
  • Figure 4: Auto Zoom. nnInteractive adaptively zooms out to ensure complete segmentation of large structures. By detecting border changes and dynamically querying additional regions, it preserves global context while refining local details, mitigating the constraints of patch-wise processing.
  • Figure 5: SuperVoxels enhancing training label diversity. Classical algorithms like SLIC and Felzenszwalb produce parcellations or fuzzy boundaries, even when using image embeddings (Vista3D), while our approach yields precise, variable-sized objects.
  • ...and 9 more figures
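
The refinement loop described in the Figure 2 caption (predict, locate false positive/negative areas, sample a new prompt, feed it back together with the current prediction) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the rule of correcting whichever error region is larger, the use of a single random point as the simulated prompt, and the four-channel input layout are all assumptions made for illustration.

```python
import numpy as np


def error_regions(pred, target):
    """Return boolean masks of false-positive and false-negative voxels."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    return pred & ~target, ~pred & target


def sample_corrective_point(pred, target, rng):
    """Simulate one user click (hypothetical rule, not from the paper).

    Picks a random voxel from the larger error region. Returns
    ((z, y, x), is_positive) or None if the prediction is already perfect.
    """
    false_pos, false_neg = error_regions(pred, target)
    n_fp, n_fn = int(false_pos.sum()), int(false_neg.sum())
    if n_fp == 0 and n_fn == 0:
        return None
    # A missed region calls for a positive click, an overshoot for a negative one.
    is_positive = n_fn >= n_fp
    region = false_neg if is_positive else false_pos
    coords = np.argwhere(region)
    voxel = coords[rng.integers(len(coords))]
    return tuple(int(c) for c in voxel), is_positive


def build_network_input(image, pred, pos_prompts, neg_prompts):
    """Stack image, current prediction and prompt maps (assumed channel layout)."""
    channels = [image, pred, pos_prompts, neg_prompts]
    return np.stack([np.asarray(c, dtype=np.float32) for c in channels])


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 8)).astype(np.float32)
target = np.zeros((8, 8, 8), dtype=bool)
target[2:6, 2:6, 2:6] = True
pred = np.zeros_like(target)  # stand-in for an empty first prediction

pos = np.zeros(target.shape, dtype=np.float32)
neg = np.zeros(target.shape, dtype=np.float32)
point, is_positive = sample_corrective_point(pred, target, rng)
(pos if is_positive else neg)[point] = 1.0  # write the click into its prompt map

net_input = build_network_input(image, pred, pos, neg)
print(net_input.shape)  # (4, 8, 8, 8)
```

In a real training loop the stacked array would be passed to the network, the new prediction would replace `pred`, and the step would repeat; scribbles, boxes, and lasso prompts would need their own simulation rules, which this sketch does not attempt.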