
Does SAM dream of EIG? Characterizing Interactive Segmenter Performance using Expected Information Gain

Kuan-I Chung, Daniel Moyer

TL;DR

This work evaluates interactive segmentation models by measuring how well a model understands user prompts, rather than relying solely on Oracle Dice. It formulates the interaction as a Bayesian Experimental Design problem, treating the segmentation as a belief map $\theta$ and prompts as observations, and introduces a practical nested Monte Carlo method for estimating the per-pixel Expected Information Gain, $EIG(d)$. Across three SAM-based models and both natural and medical image datasets, the authors show that $EIG$-guided prompting discriminates between models by their prompt understanding, whereas Oracle Dice can mask fundamental differences because of prompt-encoder flexibility. The results motivate $EIG$-driven metrics for assessing interactive segmentation performance both in-domain and out-of-domain, with implications for model design and evaluation in medical imaging.
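To make the nested Monte Carlo idea concrete, the following is a minimal sketch of the standard nested estimator of $EIG(d) = E[\log p(y \mid \theta, d) - \log p(y \mid d)]$ on a toy problem. It is not the authors' implementation. Everything specific here is an assumption for illustration: the belief map is a flat list of independent per-pixel foreground probabilities, $d$ indexes a candidate prompt pixel, the simulated click label $y$ agrees with the sampled mask with probability `1 - eps`, and the function names (`sample_mask`, `likelihood`, `nested_mc_eig`) are hypothetical.

```python
import math
import random

def sample_mask(belief, rng):
    """Draw one binary mask from a per-pixel Bernoulli belief map.
    Assumption: pixels are independent (toy model only)."""
    return [1 if rng.random() < p else 0 for p in belief]

def likelihood(y, mask, d, eps):
    """p(y | theta, d): the click label y matches mask[d] with probability 1 - eps.
    Assumption: a simple symmetric label-noise model."""
    return 1.0 - eps if y == mask[d] else eps

def nested_mc_eig(belief, d, n_outer=500, n_inner=200, eps=0.1, seed=0):
    """Nested Monte Carlo estimate, in nats, of
    EIG(d) = E[log p(y | theta, d) - log p(y | d)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_outer):
        # Outer sample: a mask from the belief, then a simulated click label.
        theta = sample_mask(belief, rng)
        y = theta[d] if rng.random() >= eps else 1 - theta[d]
        # Inner samples: estimate the marginal p(y | d) with fresh masks.
        marginal = sum(
            likelihood(y, sample_mask(belief, rng), d, eps)
            for _ in range(n_inner)
        ) / n_inner
        total += math.log(likelihood(y, theta, d, eps)) - math.log(marginal)
    return total / n_outer

# Toy belief map: pixel 0 is uncertain, pixels 1 and 2 are nearly decided.
belief = [0.5, 0.99, 0.01]
eig = [nested_mc_eig(belief, d) for d in range(len(belief))]

# EIG-guided prompting in this toy: click the pixel with the largest estimate.
next_prompt = max(range(len(belief)), key=eig.__getitem__)
print(eig, next_prompt)
```

In this toy setting the uncertain pixel receives the largest estimate, so an EIG-guided prompter would click it next. With a real segmenter, the belief samples would presumably come from the model's predictions rather than from independent Bernoulli draws.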

Abstract

We introduce an assessment procedure for interactive segmentation models. Based on concepts from Bayesian Experimental Design, the procedure measures a model's understanding of point prompts and their correspondence with the desired segmentation mask. We show that Oracle Dice index measurements are insensitive or even misleading in measuring this property. We demonstrate the use of the proposed procedure on three interactive segmentation models and subsets of two large image segmentation datasets.
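For reference, the Dice index mentioned above compares a predicted mask $A$ with a target mask $B$ as $2|A \cap B| / (|A| + |B|)$. A minimal sketch follows, assuming masks are flat lists of 0/1 values; the function name `dice` and the convention of returning 1.0 when both masks are empty are illustrative choices, not taken from the paper.

```python
def dice(pred, target):
    """Dice index 2|A n B| / (|A| + |B|) for two equal-length binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count pixels that are foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred, target))
    size_sum = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum

# One shared foreground pixel; mask sizes 2 and 1 give 2 * 1 / 3.
score = dice([1, 1, 0, 0], [1, 0, 0, 0])
print(score)
```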

Paper Structure (12 sections, 7 equations, 2 figures)

Figures (2)

  • Figure 1: We display the Oracle Dice index, the EIG-guided Dice index, and the step-wise maximum EIG (in nats) as a function of prompting step (x-axis) for different SAM variants (columns) and segmentation categories. All quantities share the same y-axis scale; the step-wise maximum EIG is plotted for context.
  • Figure 2: Simulation. The heat maps show the EIG computed before each new prompt (star) is clicked; hence there is no EIG map for step 0. Each star is placed at the location of highest EIG. We also show the maximum EIG and the Dice score of the prediction on the images.