SSplain: Sparse and Smooth Explainer for Retinopathy of Prematurity Classification

Elifnur Sunger, Tales Imbiriba, Peter Campbell, Deniz Erdogmus, Stratis Ioannidis, Jennifer Dy

TL;DR

SSplain addresses the need for interpretable explanations in Retinopathy of Prematurity (ROP) classification by learning pixel-wise masks that are simultaneously sparse and smooth, preserving the intrinsic vessel structure. It casts explanation generation as a constrained optimization problem, solved with ADMM to accommodate hard sparsity ($\ell_0$/$\ell_1$) and range constraints along with total variation regularization. Empirical results on ROP, MNIST, and FMNIST show that SSplain outperforms nine baselines in post-hoc accuracy and aligns with domain-relevant cues such as vessel tortuosity and dilation, while maintaining robustness under sanity checks. The approach offers a practical pathway to domain-aware, reliable explanations that can generalize across image domains via adaptable sparsity and smoothness constraints.

Abstract

Neural networks are frequently used in medical diagnosis. However, due to their black-box nature, model explainers are used to help clinicians better understand and trust model outputs. This paper introduces an explainer method for classifying Retinopathy of Prematurity (ROP) from fundus images. Previous methods fail to generate explanations that preserve input image structures such as smoothness and sparsity. We introduce Sparse and Smooth Explainer (SSplain), a method that generates pixel-wise explanations while preserving image structures by enforcing smoothness and sparsity. This results in realistic explanations that enhance the understanding of the given black-box model. To achieve this goal, we define an optimization problem with combinatorial constraints and solve it using the Alternating Direction Method of Multipliers (ADMM). Experimental results show that SSplain outperforms commonly used explainers in terms of both post-hoc accuracy and smoothness analyses. Additionally, SSplain identifies features that are consistent with the domain-understandable features that clinicians consider discriminative factors for ROP. We also show SSplain's generalization by applying it to additional publicly available datasets. Code is available at https://github.com/neu-spiral/SSplain.
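
To make the ingredients named in the abstract concrete, the following is a minimal NumPy sketch of three generic building blocks that a sparse-and-smooth mask optimizer of this kind could use: a projection onto an $\ell_0$ constraint, a projection onto a range (box) constraint, and an anisotropic total variation penalty. This is not taken from the SSplain repository. The function names, the $[0, 1]$ range, and the anisotropic form of total variation are assumptions made for illustration, and how ADMM combines these pieces in the paper is not reproduced here.

```python
import numpy as np


def project_l0(mask, k):
    """Keep the k largest-magnitude entries of `mask` and zero the rest.

    This is the Euclidean projection onto {m : ||m||_0 <= k}.
    (Hypothetical helper, not the authors' code.)
    """
    flat = mask.ravel()
    if k >= flat.size:
        return mask.copy()
    out = np.zeros_like(flat)
    if k > 0:
        # Indices of the k entries with the largest absolute value.
        idx = np.argpartition(np.abs(flat), -k)[-k:]
        out[idx] = flat[idx]
    return out.reshape(mask.shape)


def project_box(mask, lo=0.0, hi=1.0):
    """Clip mask values into [lo, hi]; the [0, 1] range is an assumption."""
    return np.clip(mask, lo, hi)


def total_variation(mask):
    """Anisotropic total variation of a 2-D mask.

    Sums absolute differences between vertically and horizontally
    adjacent pixels; smaller values mean a smoother mask.
    """
    dv = np.abs(np.diff(mask, axis=0)).sum()
    dh = np.abs(np.diff(mask, axis=1)).sum()
    return float(dv + dh)
```

For example, `project_l0(m, k)` applied to a 2×2 mask with `k=2` zeroes the two smallest-magnitude entries, and `total_variation` returns 0 for a constant mask. A splitting method such as ADMM typically alternates between a smooth update of the objective and projections like these.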

Paper Structure

This paper contains 51 sections, 9 equations, 16 figures, 1 table, 1 algorithm.

Figures (16)

  • Figure 1: Explanation maps of (a) a sample image with ROP using (b) SSplain, (c) Input$\times$Gradient shrikumar2016not, (d) Occlusion zeiler2014visualizing and (e) Extremal Perturbation fong2019understanding methods. We visualize only the top 5% of pixels, as ranked by the corresponding explanation maps, coloring the remaining pixels purple. SSplain generates sparse and contiguous regions of importance that are also more dilated and tortuous than obscured vessels. The sparse method Input$\times$Gradient focuses on similar areas, but its map is highly fragmented. Occlusion and Extremal Perturbation select discontinuous regions: tortuous vessels and dilated vessels, respectively. Both, however, fail to preserve vessel structure.
  • Figure 2: Comparison of explainers for the ROP dataset: SSplain-0 ($S_1$ with $\ell_0$ constraint), SSplain-1 ($S_1$ with $\ell_1$ constraint), Saliency simonyan2013deep, Input$\times$Gradient shrikumar2016not, Guided Grad-CAM selvaraju2017grad, Integrated Gradients sundararajan2017axiomatic, DeepSHAP lundberg2017unified, KernelSHAP lundberg2017unified, LIME ribeiro2016should, Occlusion zeiler2014visualizing and Extremal Perturbation fong2019understanding. (a) We report the average of: Post-hoc balanced accuracy with deletion (lower is better) and insertion (higher is better) of pixels with the highest attribution scores, connected components ratio during the insertion process, and sparsity $s$ versus normalized sparsity $\kappa_s$ during the insertion process. (b) Curvature and dilation similarity with insertion process with respect to sparsity $s$ and normalized sparsity $\kappa_s$. In (a), both SSplain methods consistently outperform competitors and competitors give importance to the background. In (b), both at the same $s$ and $\kappa_s$ level, SSplain methods exhibit higher curvature and dilation similarity.
  • Figure 3: Comparison with other explainers for the ROP dataset. From left to right: (a) images, explanation maps generated using (b) SSplain-0, (c) Saliency simonyan2013deep, (d) Input$\times$Gradient shrikumar2016not, (e) Guided Grad-CAM selvaraju2017grad, (f) Occlusion zeiler2014visualizing and (g) Extremal Perturbation fong2019understanding. We visualize only the top 5% of pixels, as ranked by the corresponding explanation maps, coloring the remaining pixels purple. SSplain effectively preserves image structures and assigns more importance to tortuous and dilated regions. Except for Input$\times$Gradient, competitors treat images as a whole and assign importance to pixels that are not associated with vessels.
  • Figure 4: Sanity check adebayo2018sanity for an ROP image. Model weights are progressively randomized from the "fc" layer to the "inception3a.branch3.1.conv" layer in the Inception model szegedy2015going. The first row shows a sample image, original attribution scores, and attribution scores with only the "fc" layer randomized. The second row provides examples from the other layers during progressive randomization. SSplain is sensitive to model weights, i.e., attribution scores change when we randomize the weights as desired. $S_0$ ensures the mask applies only to the vessels, so the mask values for the background do not change.
  • Figure 5: Comparison of explainers: SSplain-0, Saliency simonyan2013deep, Input$\times$Gradient shrikumar2016not, Guided Grad-CAM selvaraju2017grad, Integrated Gradients sundararajan2017axiomatic, DeepSHAP lundberg2017unified, KernelSHAP lundberg2017unified, LIME ribeiro2016should, Occlusion zeiler2014visualizing and Extremal Perturbation fong2019understanding for (a) MNIST and (b) FMNIST datasets. From left to right, we report the average of: Post-hoc accuracy with deletion (lower is better) and insertion (higher is better) of pixels with the highest attribution scores, and sparsity $s$ vs normalized sparsity $\kappa_s$ during the insertion process. For MNIST, SSplain outperforms the competing methods in all analyses. It also stabilizes earlier than other methods, meaning that performance changes become minimal when inserting or deleting pixels in later attempts. For FMNIST, we observe that SSplain performs best in accuracy analysis with the insertion process and in normalized sparsity $\kappa_s$ analysis. Input$\times$Gradient and Occlusion outperform SSplain in deletion analyses. Despite their superior performance in deletion analyses, these methods significantly underperform compared to SSplain in insertion analysis, having lower insertion accuracy.
  • ...and 11 more figures
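
The captions above rely on two operations: selecting the top fraction of pixels by attribution score (the top 5% in Figures 1 and 3) and evaluating a model as the highest-ranked pixels are inserted or deleted (Figures 2 and 5). The sketch below illustrates both under stated assumptions. The names `top_fraction_mask` and `insertion_deletion_scores`, the constant `baseline` fill value, and the `predict` callable are hypothetical. The paper reports post-hoc accuracy averaged over a dataset, whereas this sketch scores a single image with whatever `predict` returns.

```python
import numpy as np


def top_fraction_mask(attribution, fraction):
    """Boolean mask marking the top `fraction` of pixels by attribution."""
    k = int(round(fraction * attribution.size))
    mask = np.zeros(attribution.size, dtype=bool)
    if k > 0:
        # Sort ascending, then reverse so the highest scores come first.
        order = np.argsort(attribution.ravel(), kind="stable")[::-1]
        mask[order[:k]] = True
    return mask.reshape(attribution.shape)


def insertion_deletion_scores(image, attribution, predict, fractions,
                              baseline=0.0):
    """Score the model while inserting or deleting top-ranked pixels.

    Insertion keeps only the top-ranked pixels and fills the rest with
    `baseline`; deletion replaces the top-ranked pixels with `baseline`.
    `predict` is any callable mapping an image array to a scalar score
    (an assumption standing in for the trained classifier).
    """
    insertion, deletion = [], []
    for f in fractions:
        keep = top_fraction_mask(attribution, f)
        insertion.append(predict(np.where(keep, image, baseline)))
        deletion.append(predict(np.where(keep, baseline, image)))
    return np.array(insertion), np.array(deletion)
```

Calling `top_fraction_mask(attribution, 0.05)` gives the kind of top-5% selection used for visualization in the figures. A good explainer should make the insertion scores rise quickly and the deletion scores fall quickly as the fraction grows.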