Lightweight Method for Interactive 3D Medical Image Segmentation with Multi-Round Result Fusion

Bingzhi Shen, Lufan Chang, Siqi Chen, Shuxiang Guo, Hao Liu

TL;DR

LIM-Net addresses the challenge of annotating large 3D medical volumes by introducing a lightweight CNN-based framework that starts from 2D user hints and propagates masks through the volume via a memory-augmented sequence. A Multi-Round Result Fusion (MRF) module selects high-quality masks across interaction rounds, while a memory-augmented propagator enables bidirectional, long-range segmentation with efficient memory management. The approach demonstrates strong generalization to unseen data, competitive accuracy with fewer interactions, and real-time performance on resource-limited hardware, offering a practical baseline that complements SAM-based methods. Overall, LIM-Net achieves robust interactive segmentation for diverse modalities and anatomical regions, with significant improvements in consistency and efficiency for 3D medical image annotation.
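The bidirectional propagation described above can be pictured with a minimal sketch. It is not LIM-Net's actual Memory Module: `step_fn` (a stand-in for the network that segments one slice given stored slices and masks), `memory_size`, and the first-in-first-out eviction rule are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

import numpy as np

# One memory entry pairs a slice with the mask predicted (or given) for it.
MemoryEntry = Tuple[np.ndarray, np.ndarray]


def propagate_bidirectional(
    volume: np.ndarray,            # shape (D, H, W)
    prompt_index: int,             # slice the user interacted with
    prompt_mask: np.ndarray,       # shape (H, W), mask for that slice
    step_fn: Callable[[np.ndarray, List[MemoryEntry]], np.ndarray],
    memory_size: int = 4,          # hypothetical cap on stored entries
) -> Dict[int, np.ndarray]:
    """Spread a single prompt mask to every slice, forward then backward."""
    masks: Dict[int, np.ndarray] = {prompt_index: prompt_mask}
    for direction in (+1, -1):
        # Each direction starts from the prompt slice; the deque drops the
        # oldest entry once memory_size is reached (an assumed policy).
        memory = deque([(volume[prompt_index], prompt_mask)], maxlen=memory_size)
        i = prompt_index + direction
        while 0 <= i < len(volume):
            mask = step_fn(volume[i], list(memory))
            masks[i] = mask
            memory.append((volume[i], mask))
            i += direction
    return masks
```

Bounding the memory keeps the per-slice cost constant regardless of volume depth, which is one plausible way to read "efficient memory management"; the paper's own policy may differ.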

Abstract

In medical imaging, precise annotation of lesions or organs is often required. However, 3D volumetric images typically consist of hundreds or thousands of slices, making the annotation process extremely time-consuming and laborious. Recently, the Segment Anything Model (SAM) has drawn widespread attention due to its remarkable zero-shot generalization capabilities in interactive segmentation. While researchers have explored adapting SAM for medical applications, such as using SAM adapters or constructing 3D SAM models, a key question remains: Can traditional CNNs achieve the same strong zero-shot generalization in this task? In this paper, we propose the Lightweight Interactive Network for 3D Medical Image Segmentation (LIM-Net), a novel approach demonstrating the potential of compact CNN-based models. Built upon a 2D CNN backbone, LIM-Net initiates segmentation by generating a 2D prompt mask from user hints. This mask is then propagated through the 3D sequence via the Memory Module. To refine and stabilize results during interaction, the Multi-Round Result Fusion (MRF) Module selects and merges optimal masks from multiple rounds. Our extensive experiments across multiple datasets and modalities demonstrate LIM-Net's competitive performance. It generalizes better to unseen data than SAM-based models and reaches competitive accuracy with fewer interactions. Notably, LIM-Net's lightweight design offers significant advantages in deployment and inference efficiency, with low GPU memory consumption suitable for resource-constrained environments. These promising results demonstrate that LIM-Net can serve as a strong baseline, complementing and contrasting with popular SAM models to further advance interactive medical image segmentation. The code will be released at https://github.com/goodtime-123/LIM-Net.
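To make the idea of selecting masks across interaction rounds concrete, here is a rough sketch of one possible per-slice fusion rule. It is an assumption-laden illustration rather than the MRF Module itself: the per-slice quality scores (`round_scores`) are hypothetical inputs, and the abstract does not say how the module actually ranks or merges masks.

```python
from typing import List, Tuple

import numpy as np


def fuse_rounds(
    round_masks: List[np.ndarray],   # one (D, H, W) mask volume per round
    round_scores: List[np.ndarray],  # one (D,) score vector per round (hypothetical)
) -> Tuple[np.ndarray, np.ndarray]:
    """For every slice, keep the mask from the round with the highest score."""
    masks = np.stack(round_masks)        # (R, D, H, W)
    scores = np.stack(round_scores)      # (R, D)
    best_round = scores.argmax(axis=0)   # (D,); ties go to the earliest round
    slice_idx = np.arange(masks.shape[1])
    fused = masks[best_round, slice_idx]  # (D, H, W)
    return fused, best_round
```

With a rule of this kind, a later round cannot overwrite a slice unless its score is strictly higher, which is one way a fusion step could keep quality from regressing between rounds.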

Paper Structure

This paper contains 16 sections, 12 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: The proposed LIM-Net consists of two parts. (Part A) A 2D Interaction Module first generates initial masks from user clicks. A Memory Model then extends the prompt mask to the entire sequence. Finally, the MRF module selects masks across multiple interaction rounds for consistent segmentation quality. (Part B) The Memory Model incorporates different types of memory to retrieve relevant information and generate masks from a single prompt mask.
  • Figure 2: The overall framework of the MRF module.
  • Figure 3: Change in average Dice across interaction rounds on the MSD-lung, MSD-colon, KiTS-Organ, and KiTS-Tumor datasets.
  • Figure 4: Segmentation results from the 2D Interaction Module on different datasets. The results serve as prompt frames.
  • Figure 5: Segmentation results on the BTCV dataset.
  • ...and 1 more figures
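
Figure 3 reports average Dice per interaction round. For reference, the Dice coefficient between a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it with NumPy; returning 1.0 when both masks are empty is an assumed convention, and the paper's exact averaging protocol is not given here.

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks are empty; treating this as perfect agreement is an assumption.
        return 1.0
    return float(2.0 * intersection / denom)
```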