MiSuRe is all you need to explain your image segmentation
Syed Nouman Hasany, Fabrice Mériaudeau, Caroline Petitjean
TL;DR
MiSuRe introduces a two-stage, model-agnostic saliency framework for image segmentation. It first identifies a sufficient region via masked dilation and Dice-guided expansion, then refines it to a minimally sufficient region by optimizing a composite Dice-based objective with L1 and total-variation regularization. The approach yields a coarse sufficient region $X_{SR}$ and a fine minimally sufficient region $X_{MSR}$, enabling both localization-based explanations and deeper insights into the segmentation process, with potential use as a post-hoc reliability proxy. Across Triangle, Synapse, and COCO-2017, MiSuRe demonstrates favorable trade-offs compared with Seg-Grad-CAM and RISE in terms of localization, map fineness, and computation, and shows promise for transformer-based models as well. The work also introduces a post-hoc reliability classifier built on saliency-map features, offering a practical path toward automatic confidence assessment when ground truth is unavailable. The authors acknowledge sensitivity to hyperparameters and reliance on iterative optimization as current limitations.
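The second-stage objective described above can be illustrated with a small sketch. This is not the authors' implementation: it assumes the composite loss takes the simple form $(1 - \mathrm{Dice}) + \lambda_{1}\,\lVert m \rVert_1 + \lambda_{TV}\,\mathrm{TV}(m)$, that a mask $m \in [0,1]^{H \times W}$ is applied to the image by element-wise multiplication, and that the L1 term is normalized by the number of pixels. The function names, the weights `lam_l1` and `lam_tv`, and the thresholding toy model are all hypothetical.

```python
from typing import Callable

import numpy as np


def soft_dice(a: np.ndarray, b: np.ndarray, eps: float = 1e-6) -> float:
    """Soft Dice coefficient between two maps with values in [0, 1]."""
    inter = np.sum(a * b)
    return float((2.0 * inter + eps) / (np.sum(a) + np.sum(b) + eps))


def total_variation(m: np.ndarray) -> float:
    """Anisotropic TV: sum of absolute differences between vertical and horizontal neighbours."""
    return float(np.abs(np.diff(m, axis=0)).sum() + np.abs(np.diff(m, axis=1)).sum())


def msr_objective(
    mask: np.ndarray,
    image: np.ndarray,
    model: Callable[[np.ndarray], np.ndarray],
    reference_seg: np.ndarray,
    lam_l1: float = 1.0,
    lam_tv: float = 0.01,
) -> float:
    """Hypothetical composite loss for a candidate mask (assumed form, see lead-in).

    - (1 - Dice): the segmentation of the masked image should match the original one.
    - L1 (mean |m|): the retained region should be small.
    - TV: the retained region should be spatially smooth / compact.
    """
    masked_input = image * mask        # keep only the pixels selected by the mask
    pred = model(masked_input)         # segmentation produced from the masked image
    dice_term = 1.0 - soft_dice(pred, reference_seg)
    return dice_term + lam_l1 * float(np.abs(mask).mean()) + lam_tv * total_variation(mask)


if __name__ == "__main__":
    # Toy "segmentation model" (hypothetical): a pixel is foreground if its intensity exceeds 0.5.
    model = lambda x: (x > 0.5).astype(float)

    image = np.zeros((8, 8))
    image[2:5, 2:5] = 1.0              # a bright 3x3 object
    reference = model(image)           # segmentation of the unmasked image

    tight = (image > 0).astype(float)  # mask covering exactly the object
    for name, m in [("tight", tight), ("full", np.ones_like(image)), ("empty", np.zeros_like(image))]:
        print(name, round(msr_objective(m, image, model, reference), 4))
```

With these weights, the tight mask keeps the segmentation intact (Dice term of zero) while paying only a small size and boundary cost, so it scores lower than both the full mask (large L1 cost) and the empty mask (segmentation lost). Actually searching for such a mask, as MiSuRe's iterative optimization does, would additionally require a differentiable model and a gradient-based optimizer, which this sketch omits.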
Abstract
The last decade of computer vision has been dominated by deep learning architectures, thanks to their unparalleled success. Their performance, however, often comes at the cost of explainability owing to their highly non-linear nature. Consequently, a parallel field of eXplainable Artificial Intelligence (XAI) has developed with the aim of generating insights into the decision-making process of deep learning models. An important problem in XAI is the generation of saliency maps, which highlight the regions of an input image that contributed most to the model's final decision. Most work in this regard, however, has focused on image classification, while image segmentation, despite being a ubiquitous task, has not received the same attention. In the present work, we propose MiSuRe (Minimally Sufficient Region), an algorithm for generating saliency maps for image segmentation. The goal of the saliency maps generated by MiSuRe is to discard irrelevant regions and highlight only those regions of the input image that are crucial to the segmentation decision. We perform our analysis on three datasets: Triangle (artificially constructed), COCO-2017 (natural images), and Synapse multi-organ (medical images). Additionally, we identify a potential use case for these post-hoc saliency maps: assessing the reliability of the segmentation model post hoc.
