A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation
Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Daguang Xu, Wenqi Li
TL;DR
This paper evaluates Segment Anything 2 (SAM2) for zero-shot 3D CT segmentation and addresses the variability in prior benchmarks caused by different evaluation pipelines. It reproduces SAM2's eight-iteration interactive protocol on multiple 3D CT datasets and compares against established baselines such as VISTA3D, providing a standardized benchmarking framework and code. The study finds that SAM2 in zero-shot mode generates many false positives when foreground objects disappear and that adding more slices yields limited gains, with strong performance only for small, single-connected structures when background slices are removed. The authors conclude that zero-shot SAM2 is not yet satisfactory for 3D medical imaging and advocate finetuning or new 3D-aware approaches, delivering a reproducible protocol to guide future research.
Abstract
Since the release of Segment Anything 2 (SAM2), the medical imaging community has been actively evaluating its performance for 3D medical image segmentation. However, different studies have employed different evaluation pipelines, producing conflicting results that obscure SAM2's actual capabilities and potential applications. We briefly review existing benchmarks and point out that the SAM2 paper clearly outlines a zero-shot evaluation pipeline, which simulates user clicks iteratively for up to eight iterations. We reproduced this interactive annotation simulation on 3D CT datasets and provide the results and code at https://github.com/Project-MONAI/VISTA. Our findings reveal that directly applying SAM2 to 3D medical imaging in a zero-shot manner is far from satisfactory. It is prone to generating false positives when foreground objects disappear, and annotating more slices cannot fully offset this tendency. For smaller single-connected objects such as the kidney and aorta, SAM2 performs reasonably well, but for most organs it is still far behind state-of-the-art 3D annotation methods. More research and innovation are needed for the 3D medical imaging community to use SAM2 correctly.
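To make the iterative click simulation mentioned above more concrete, the following is a minimal sketch on a single 2D slice. It is an illustration under stated assumptions, not the code from the linked repository: the function names, the uniform random sampling of a click from the error region, and the `toy_predict` stand-in (which simply paints disks around clicks instead of calling SAM2) are all hypothetical.

```python
import numpy as np


def sample_corrective_click(pred, gt, rng):
    """Pick one pixel from the error region between prediction and ground truth.

    Returns (y, x, is_positive), or None when the prediction already matches.
    Uniform random sampling is an assumption made for this sketch.
    """
    error = pred != gt
    if not error.any():
        return None
    ys, xs = np.nonzero(error)
    i = rng.integers(len(ys))
    y, x = int(ys[i]), int(xs[i])
    # A missed foreground pixel yields a positive click, a false positive a negative one.
    return y, x, bool(gt[y, x])


def toy_predict(clicks, shape, radius=3):
    """Hypothetical stand-in for a promptable model: paint a disk per click."""
    mask = np.zeros(shape, dtype=bool)
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    for y, x, positive in clicks:
        disk = (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
        mask[disk] = positive  # positive clicks add, negative clicks erase
    return mask


def simulate_interaction(gt, predict_fn, max_iters=8, seed=0):
    """Run up to max_iters rounds of click -> prediction on one slice."""
    rng = np.random.default_rng(seed)
    pred = np.zeros_like(gt, dtype=bool)
    clicks = []
    for _ in range(max_iters):
        click = sample_corrective_click(pred, gt, rng)
        if click is None:  # nothing left to correct
            break
        clicks.append(click)
        pred = predict_fn(clicks, gt.shape)
    return pred, clicks


if __name__ == "__main__":
    gt = np.zeros((32, 32), dtype=bool)
    gt[10:20, 10:20] = True
    pred, clicks = simulate_interaction(gt, toy_predict)
    print(len(clicks), "clicks used")
```

On a slice with no foreground, the loop in this sketch exits immediately because the empty initial prediction already matches the ground truth. A real promptable model gives no such guarantee, which is where the false positives discussed above can arise.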
