Does SAM dream of EIG? Characterizing Interactive Segmenter Performance using Expected Information Gain
Kuan-I Chung, Daniel Moyer
TL;DR
This work evaluates interactive segmentation models by measuring how well a model understands user prompts, rather than relying solely on Oracle Dice. It formulates the interaction as Bayesian Experimental Design, modeling the segmentation as a belief map $\theta$ and prompts as observations, and introduces a practical nested Monte Carlo method to estimate the per-pixel Expected Information Gain $EIG(d)$. Across three SAM-based models and both natural and medical image datasets, the authors show that $EIG$-guided prompting distinguishes models by how well they understand prompts, whereas Oracle Dice can mask fundamental differences because of prompt-encoder flexibility. The results point to a need for $EIG$-based metrics when assessing interactive segmentation performance in-domain and out-of-domain, with implications for model design and evaluation in medical imaging.
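To make the nested Monte Carlo idea concrete, here is a minimal sketch of the generic estimator $EIG(d) \approx \frac{1}{N}\sum_{n}\left[\log p(y_n \mid \theta_n, d) - \log \frac{1}{M}\sum_{m} p(y_n \mid \theta_m, d)\right]$ on a toy problem. Everything specific in it is an assumption made for illustration and is not taken from the paper: the belief map is treated as independent per-pixel Bernoulli probabilities, a prompt at pixel $d$ is modeled as a click label that agrees with $\theta_d$ except with probability `noise`, and the names `likelihood` and `nested_mc_eig` are hypothetical. In the actual procedure the belief and likelihood would presumably come from the segmentation model itself.

```python
import numpy as np


def likelihood(y, theta, noise):
    """Toy p(y | theta, d): the click label y matches theta at the clicked
    pixel with probability 1 - noise (an assumed observation model)."""
    return np.where(y == theta, 1.0 - noise, noise)


def nested_mc_eig(belief, noise=0.1, n_outer=512, n_inner=512, seed=0):
    """Nested Monte Carlo estimate of EIG(d) for a click at every pixel d.

    belief: (H, W) array of assumed per-pixel probabilities p(theta_d = 1).
    Returns an (H, W) array of EIG estimates in nats.
    """
    rng = np.random.default_rng(seed)
    shape = belief.shape

    # Outer loop: theta_n ~ p(theta), then y_n ~ p(y | theta_n, d).
    theta_outer = rng.random((n_outer, *shape)) < belief
    flip = rng.random((n_outer, *shape)) < noise
    y = theta_outer ^ flip
    log_lik = np.log(likelihood(y, theta_outer, noise))

    # Inner loop: fresh theta_m ~ p(theta), shared across outer samples,
    # used to approximate the marginal p(y_n | d) = E_theta[p(y_n | theta, d)].
    theta_inner = rng.random((n_inner, *shape)) < belief
    frac_one = theta_inner.mean(axis=0)
    # Inner average of the likelihood for y = 1 over the inner samples.
    p_y1 = frac_one * (1.0 - noise) + (1.0 - frac_one) * noise
    marginal = np.where(y, p_y1, 1.0 - p_y1)

    # Average of log p(y_n | theta_n, d) - log p_hat(y_n | d) over outer samples.
    return (log_lik - np.log(marginal)).mean(axis=0)


if __name__ == "__main__":
    belief = np.array([[0.5, 0.99], [0.01, 0.5]])
    print(nested_mc_eig(belief, n_outer=2000, n_inner=2000))
```

Under these toy assumptions the estimate is largest where the belief is closest to 0.5 and close to zero where the belief is already confident, which matches the intuition that a prompt is most informative where the model is least certain.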
Abstract
We introduce an assessment procedure for interactive segmentation models. Based on concepts from Bayesian Experimental Design, the procedure measures a model's understanding of point prompts and their correspondence with the desired segmentation mask. We show that Oracle Dice index measurements are insensitive to this property and can even be misleading. We demonstrate the proposed procedure on three interactive segmentation models and subsets of two large image segmentation datasets.
