SeqSAM: Autoregressive Multiple Hypothesis Prediction for Medical Image Segmentation using SAM
Benjamin Towle, Xin Chen, Ke Zhou
TL;DR
Medical image segmentation is often ambiguous: a single image can admit multiple plausible annotations. SeqSAM introduces an autoregressive extension of the Segment Anything Model that generates a sequence of masks $\{\hat{\mathbf{y}}^{(m)}\}_{m=1}^M$, each conditioned on the previous outputs, and is trained with a set-based Hungarian loss that aligns each prediction to one of the $K$ ground-truth masks. On the LIDC-IDRI and QUBIQ Kidney datasets, SeqSAM achieves state-of-the-art performance in $D_{avg}$ and $GED$, while supporting an arbitrary output count $M$ without retraining. The result is a set of clinically relevant segmentation hypotheses, enabling more robust decision making in practice.
Abstract
Pre-trained segmentation models are a powerful and flexible tool for segmenting images, and this trend has recently extended to medical imaging. However, these methods often produce only a single prediction for a given image, neglecting the inherent uncertainty in medical images that arises from unclear object boundaries and errors caused by the annotation tool. Multiple Choice Learning is a technique for generating multiple masks through multiple learned prediction heads. However, it cannot readily be extended to produce more outputs than its initial pre-training hyperparameters allow, and its sparse, winner-takes-all loss function makes it easy for one prediction head to become overly dominant, so the clinical relevance of each mask produced is not guaranteed. We introduce SeqSAM, a sequential, RNN-inspired approach to generating multiple masks, which uses a bipartite matching loss to ensure the clinical relevance of each mask and can produce an arbitrary number of masks. We show notable improvements in the quality of each mask produced across two publicly available datasets. Our code is available at https://github.com/BenjaminTowle/SeqSAM.
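The contrast between a winner-takes-all loss and a bipartite matching loss can be illustrated with a minimal sketch. It is not taken from the SeqSAM code: the Dice-based pairwise cost, the function names, and the toy masks are all assumptions made for illustration, and an exhaustive search over assignments stands in for the Hungarian algorithm (in practice a solver such as `scipy.optimize.linear_sum_assignment` would typically be used). The sketch also assumes there are at least as many predictions as ground-truth masks.

```python
from itertools import permutations

def dice_loss(pred, target, eps=1e-6):
    """1 - Dice overlap between two flat binary masks (assumed pairwise cost)."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def cost_matrix(preds, targets):
    """cost[m][k] = loss of prediction m against ground truth k."""
    return [[dice_loss(p, t) for t in targets] for p in preds]

def winner_takes_all(cost):
    """Each ground truth picks its best prediction; a prediction may be reused."""
    n_preds, n_targets = len(cost), len(cost[0])
    assignment = [min(range(n_preds), key=lambda m: cost[m][k])
                  for k in range(n_targets)]
    loss = sum(cost[m][k] for k, m in enumerate(assignment))
    return loss, assignment

def bipartite_matching(cost):
    """One-to-one assignment with minimum total cost.

    Exhaustive search replaces the Hungarian algorithm here for brevity;
    it is only practical for a handful of masks.
    """
    n_preds, n_targets = len(cost), len(cost[0])
    assert n_preds >= n_targets, "sketch assumes at least as many predictions as targets"

    def total(perm):
        return sum(cost[m][k] for k, m in enumerate(perm))

    best = min(permutations(range(n_preds), n_targets), key=total)
    return total(best), list(best)

# Toy 4-pixel masks (hypothetical data): prediction 0 is close to every annotation.
preds = [[1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
targets = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]

cost = cost_matrix(preds, targets)
wta_loss, wta_assignment = winner_takes_all(cost)
match_loss, match_assignment = bipartite_matching(cost)
print("winner-takes-all assignment:", wta_assignment)
print("bipartite assignment:", match_assignment)
```

In this toy case the winner-takes-all rule routes every annotation to the same prediction, so the other two predictions receive no training signal, whereas the one-to-one matching forces each prediction to account for a distinct annotation.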
