MIQ-SAM3D: From Single-Point Prompt to Multi-Instance Segmentation via Competitive Query Refinement
Jierui Qu, Jianchun Zhao
TL;DR
This work addresses the need for efficient multi-lesion segmentation in 3D medical images, a setting in which existing SAM-based methods struggle because they map a single point prompt to a single object. It introduces MIQ-SAM3D, which converts a single user click into multiple instance queries through a prompt-conditioned instance-query generator (PC-IQG) and refines them competitively in a joint competitive query refinement decoder (CQRD). Both components are built on a hybrid CNN-Transformer encoder that preserves local boundary detail while modeling global context. Experiments on LiTS17 and KiTS21 show competitive Dice and NSD scores and strong robustness to prompt variations, and ablations confirm the critical roles of PC-IQG, CQRD, and the dual-branch encoder. The approach offers a practical, end-to-end solution for annotating clinically relevant multi-lesion cases and moves promptable 3D medical image segmentation closer to real-world deployment.
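The single-click-to-multi-instance idea summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' PC-IQG or CQRD. It assumes that each instance query is a shared prompt embedding (the feature at the clicked voxel, passed through a projection) plus a learnable per-query offset, and that inter-query competition is a softmax over queries with one extra fixed background logit, so every voxel receives exactly one label (background or one instance). The names `generate_instance_queries`, `competitive_masks`, `base_queries`, `proj`, and `bg_logit` are hypothetical.

```python
import numpy as np


def generate_instance_queries(voxel_feats, click_index, base_queries, proj):
    """Turn one clicked voxel into K prompt-conditioned queries (hypothetical scheme).

    voxel_feats:  (N, C) flattened 3D feature map
    click_index:  flat index of the clicked voxel
    base_queries: (K, C) learnable per-instance offsets (assumed)
    proj:         (C, C) projection applied to the prompt feature (assumed)
    """
    prompt = voxel_feats[click_index] @ proj      # (C,) prompt embedding from the click
    return base_queries + prompt[None, :]         # (K, C): shared prompt + per-query offset


def competitive_masks(voxel_feats, queries, bg_logit=0.0):
    """Let queries compete for voxels via a softmax across queries.

    A fixed background logit is prepended so a voxel can belong to no instance.
    Returns per-voxel probabilities (N, K+1) and labels (0 = background, k = query k).
    """
    logits = voxel_feats @ queries.T                                   # (N, K) query-voxel affinity
    n = logits.shape[0]
    logits = np.concatenate([np.full((n, 1), bg_logit), logits], axis=1)
    logits -= logits.max(axis=1, keepdims=True)                        # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)                          # queries share each voxel's mass
    labels = probs.argmax(axis=1)                                      # one winner per voxel
    return probs, labels
```

On a real volume, the flat `labels` array would be reshaped back to the (D, H, W) grid, yielding a label map in which value k marks voxels claimed by query k. Because the softmax couples the queries, raising one query's affinity for a voxel lowers every other query's probability there, which is one way to discourage two queries from segmenting the same lesion.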
Abstract
Accurate segmentation of medical images is fundamental to tumor diagnosis and treatment planning. SAM-based interactive segmentation has attracted attention for its strong generalization, but most methods follow a single-point-to-single-object paradigm, which limits their use for multi-lesion segmentation. Moreover, ViT backbones capture global context but often miss high-fidelity local details. We propose MIQ-SAM3D, a multi-instance 3D segmentation framework with a competitive query optimization strategy that shifts the paradigm from single-point-to-single-mask to single-point-to-multi-instance. A prompt-conditioned instance-query generator transforms a single point prompt into multiple specialized queries, enabling retrieval of all semantically similar lesions across the 3D volume from a single exemplar. A hybrid CNN-Transformer encoder injects CNN-derived boundary saliency into ViT self-attention via spatial gating. A competitively optimized query decoder then performs end-to-end, parallel multi-instance prediction through inter-query competition. On the LiTS17 and KiTS21 datasets, MIQ-SAM3D achieves competitive performance and exhibits strong robustness to prompt variations, providing a practical solution for efficient annotation of clinically relevant multi-lesion cases.
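One way to read "injects CNN-derived boundary saliency into ViT self-attention via spatial gating" is as a per-key gate on the attention logits. The sketch below assumes one saliency value in [0, 1] per patch token (for example, a CNN boundary map pooled to the token grid) and adds `beta * log(saliency)` to the logits, which multiplies each attention weight by `saliency**beta` before renormalization. This is a hypothetical single-head formulation with illustrative names (`saliency_gated_attention`, `beta`, `eps`), not the paper's encoder.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def saliency_gated_attention(tokens, saliency, wq, wk, wv, beta=1.0, eps=1e-6):
    """Single-head self-attention whose key weights are gated by a CNN saliency map.

    tokens:     (T, C) ViT patch tokens (flattened 3D patch grid)
    saliency:   (T,) boundary saliency per token, values in [0, 1] (assumed input)
    wq, wk, wv: (C, D) query/key/value projections
    beta:       gate strength; beta = 0 recovers plain attention
    """
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    logits = q @ k.T / np.sqrt(q.shape[1])          # (T, T) scaled dot-product scores
    gate = beta * np.log(saliency + eps)            # log-gate: weight_ij scales by saliency_j**beta
    attn = softmax(logits + gate[None, :], axis=-1) # gate applied per key (column); rows sum to 1
    return attn @ v, attn
```

With `beta = 0`, or with uniform saliency, the gate shifts every logit in a row by the same constant and the result reduces to standard scaled dot-product attention; lowering a token's saliency suppresses how strongly the other tokens attend to it.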
