MiSuRe is all you need to explain your image segmentation

Syed Nouman Hasany, Fabrice Mériaudeau, Caroline Petitjean

TL;DR

MiSuRe introduces a two-stage, model-agnostic saliency framework for image segmentation that first identifies a sufficient region via masked dilation and Dice-guided expansion, then refines to a minimally sufficient region by optimizing a composite Dice-based objective with L1 and total-variation regularization. The approach yields a coarse sufficient region $X_{SR}$ and a fine minimally sufficient region $X_{MSR}$, enabling both localization-based explanations and deeper insights into the segmentation process, with potential use as a post-hoc reliability proxy. Across Triangle, Synapse, and COCO-2017, MiSuRe demonstrates favorable trade-offs compared with Seg-Grad-CAM and RISE in terms of localization, map fineness, and computation, and shows promise for transformer-based models as well. The work also introduces a post-hoc reliability classifier leveraging saliency features, highlighting a practical path toward automatic confidence assessment when ground truth is unavailable, while acknowledging hyperparameter sensitivity and iterative optimization as current limitations.
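The two stages summarized above can be illustrated with a small, model-agnostic sketch. The summary does not give the exact objective, dilation kernel, baseline, stopping rule, or optimizer, so everything below is an assumption. It uses binary masks, a 3x3 square structuring element, removed pixels replaced by a zero baseline, and a Dice threshold `tau` for stage 1. For stage 2 it assumes the objective $\mathcal{L}(M) = 1 - \mathrm{Dice}(f(X \odot M), f(X)) + \lambda_1 \lVert M \rVert_1 + \lambda_2 \mathrm{TV}(M)$, minimized by greedy pixel removal rather than by the paper's iterative optimization. The segmenter `toy_model` and all function names (`sufficient_region`, `minimal_region`, etc.) are hypothetical.

```python
# Minimal sketch of a two-stage, MiSuRe-style pipeline. The kernel, baseline,
# threshold, objective weights, and greedy optimizer are all assumptions.
from typing import Callable, List, Tuple

Grid = List[List[float]]
Mask = List[List[int]]
Model = Callable[[Grid], Mask]  # hypothetical black-box segmenter

def dice(a: Mask, b: Mask, eps: float = 1e-8) -> float:
    """Dice coefficient between two binary masks of the same shape."""
    inter = sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return (2.0 * inter + eps) / (total + eps)

def apply_mask(x: Grid, m: Mask, baseline: float = 0.0) -> Grid:
    """Keep pixels where m == 1; replace the rest with a constant baseline."""
    return [[v if k else baseline for v, k in zip(rx, rm)] for rx, rm in zip(x, m)]

def dilate(m: Mask) -> Mask:
    """Binary dilation with a 3x3 square structuring element (assumed kernel)."""
    h, w = len(m), len(m[0])
    return [[int(any(m[i + di][j + dj]
                     for di in (-1, 0, 1) for dj in (-1, 0, 1)
                     if 0 <= i + di < h and 0 <= j + dj < w))
             for j in range(w)] for i in range(h)]

def total_variation(m: Mask) -> int:
    """Anisotropic TV: number of 4-neighbour pairs with different labels."""
    h, w = len(m), len(m[0])
    horiz = sum(abs(m[i][j] - m[i][j + 1]) for i in range(h) for j in range(w - 1))
    vert = sum(abs(m[i][j] - m[i + 1][j]) for i in range(h - 1) for j in range(w))
    return horiz + vert

def sufficient_region(model: Model, x: Grid, tau: float = 0.95,
                      max_dilations: int = 50) -> Tuple[Mask, int]:
    """Stage 1 sketch: start from the prediction and dilate the mask until
    the masked input reproduces the prediction with Dice >= tau."""
    y = model(x)
    m = [row[:] for row in y]
    n = 0
    while dice(model(apply_mask(x, m)), y) < tau and n < max_dilations:
        m = dilate(m)
        n += 1
    return m, n

def objective(model: Model, x: Grid, y: Mask, m: Mask,
              lam_l1: float, lam_tv: float) -> float:
    """Assumed composite loss: Dice loss + L1 (mask area) + TV (mask boundary)."""
    dice_loss = 1.0 - dice(model(apply_mask(x, m)), y)
    return dice_loss + lam_l1 * sum(map(sum, m)) + lam_tv * total_variation(m)

def minimal_region(model: Model, x: Grid, m_sr: Mask, lam_l1: float = 0.01,
                   lam_tv: float = 0.005) -> Tuple[Mask, float]:
    """Stage 2 sketch: greedily drop pixels of the sufficient region while the
    composite objective keeps decreasing (a stand-in for iterative optimization)."""
    y = model(x)
    m = [row[:] for row in m_sr]
    best = objective(model, x, y, m, lam_l1, lam_tv)
    improved = True
    while improved:
        improved = False
        candidates = [(r, c) for r in range(len(m)) for c in range(len(m[0])) if m[r][c]]
        for r, c in candidates:
            m[r][c] = 0  # tentatively remove the pixel
            val = objective(model, x, y, m, lam_l1, lam_tv)
            if val < best:
                best, improved = val, True
            else:
                m[r][c] = 1  # removal did not help, so restore it
    return m, best

def toy_model(x: Grid) -> Mask:
    """Hypothetical segmenter: a pixel is foreground if it is bright and both of
    its horizontal neighbours (when present) are not blanked out."""
    h, w = len(x), len(x[0])
    return [[int(x[i][j] > 0.7
                 and (j == 0 or x[i][j - 1] > 0.2)
                 and (j == w - 1 or x[i][j + 1] > 0.2))
             for j in range(w)] for i in range(h)]

# 8x8 gray background (0.4) with a bright 4x4 object at rows/cols 2..5.
image = [[1.0 if 2 <= i <= 5 and 2 <= j <= 5 else 0.4 for j in range(8)] for i in range(8)]
m_sr, n_dil = sufficient_region(toy_model, image)
m_msr, loss = minimal_region(toy_model, image, m_sr)
print(n_dil, sum(map(sum, m_sr)), sum(map(sum, m_msr)))  # 1 36 24
```

In this toy example, stage 1 needs one dilation because the model relies on horizontal context around the object, and stage 2 then drops the rows above and below the object, which the model never looks at. Greedy removal is used only because `toy_model` is a non-differentiable black box; with a differentiable network, a natural alternative is to relax $M$ to values in $[0, 1]$ and minimize the same loss by gradient descent.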

Abstract

The last decade of computer vision has been dominated by Deep Learning architectures, thanks to their unparalleled success. Their performance, however, often comes at the cost of explainability owing to their highly non-linear nature. Consequently, a parallel field of eXplainable Artificial Intelligence (XAI) has developed with the aim of generating insights into the decision-making process of deep learning models. An important problem in XAI is the generation of saliency maps, which highlight the regions of an input image that contributed most to the model's final decision. Most work in this regard, however, has focused on image classification; image segmentation, despite being a ubiquitous task, has not received the same attention. In the present work, we propose MiSuRe (Minimally Sufficient Region) as an algorithm to generate saliency maps for image segmentation. The saliency maps generated by MiSuRe aim to discard irrelevant regions and highlight only those regions of the input image that are crucial to the segmentation decision. We perform our analysis on 3 datasets: Triangle (artificially constructed), COCO-2017 (natural images), and Synapse multi-organ (medical images). Additionally, we identify a potential use case for these post-hoc saliency maps: assessing the post-hoc reliability of the segmentation model.

Paper Structure

This paper contains 26 sections, 2 equations, 9 figures, 7 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Sample results from the Triangle dataset, one per row. The column 'Dilation' refers to $X_{SR}$, whereas the column 'MSR' (saliency map) refers to $M_{MSR}$. SG-CAM: Seg-Grad-CAM. Results are best viewed zoomed-in.
  • Figure 2: Sample results from the Synapse multi-organ CT dataset, from U-Net and TransUNet. Each pair of rows (top-to-bottom) represents a class to be explained: Aorta, Left Kidney, Gall Bladder and Liver. Dilation refers to $X_{SR}$ whereas MSR (saliency map) refers to $M_{MSR}$. Results are best viewed zoomed-in.
  • Figure 3: Sample results from the COCO-2017 dataset. The row number is given below each image. Each row (top-to-bottom) represents a class to be explained: Bike, Bus, Car, Cat, Dog, Cow, Train and Person. Dilation refers to $X_{SR}$, whereas MSR (saliency map) refers to $M_{MSR}$. Results are best viewed zoomed-in.
  • Figure 4: Impact of Prediction Size on the Synapse multi-organ CT dataset for U-Net. Plots of the No. of Dilations against the Prediction size (left), and the Perturbation Ratio against the Prediction size (right). The plots, from top-to-bottom, are for the categories: Aorta, Gall Bladder, Left Kidney, Right Kidney.
  • Figure 5: Impact of Prediction Size on the Synapse multi-organ CT dataset for TransUNet. Plots of the No. of Dilations against the Prediction size (left), and the Perturbation Ratio against the Prediction size (right). The plots, from top-to-bottom, are for the categories: Aorta, Gall Bladder, Left Kidney, Right Kidney.