Fill in the blanks: Rethinking Interpretability in vision

Pathirage N. Deelaka, Tharindu Wickremasinghe, Devin Y. De Silva, Lisara N. Gajaweera

TL;DR

This work rethinks vision-model explainability from a new perspective: it probes the general input structure that a model has learnt during training by asking, "How would a vision model fill in a masked image?"

Abstract

Model interpretability remains a key challenge and has not kept pace with the advances in contemporary state-of-the-art deep learning models. In particular, deep-learning-aided vision tasks require interpretability before they can be adopted in more specialized domains such as medical imaging. Although the field of explainable AI (XAI) developed methods for interpreting vision models alongside early convolutional neural networks, recent XAI research has mainly focused on assigning attributions via saliency maps. As a result, these methods are restricted to sample-level explanations, and many of them adapt poorly across the wide range of vision models. In our work, we rethink vision-model explainability from a new perspective: we probe the general input structure that a model has learnt during its training. To this end, we ask the question: "How would a vision model fill in a masked image?" Experiments on standard vision datasets and pre-trained models reveal consistent patterns, and the approach could be integrated as an additional model-agnostic explainability tool in modern machine-learning platforms. The code will be available at https://github.com/BoTZ-TND/FillingTheBlanks.git

Paper Structure

This paper contains 15 sections, 5 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Evaluating a simple CNN classifier trained on MNIST [lecun2010mnist]. Mask $M$ is applied to image $\mathbf{x}$ of the digit 3. The positive (red) and negative (blue) components of the update term $\widetilde{M}(\mathbf{x})$ are shown superimposed on $M(\mathbf{x})$. The thresholded update $T( \Delta \mathbf{x} )$ 'fills in the blanks' of the masked input.
  • Figure 2: Visualising approximations of $\mathbb{E} [ \Delta \mathbf{x} ]$ for the classes of (a) MNIST [lecun2010mnist] and (b) Fashion MNIST [Xiao2017FashionMNISTAN] (section 4.1). Observe that the prototypical image of each class visually conveys the distinctive features of that class.
  • Figure 3: Visualising the predictions obtained by progressively masking different regions of the image (section 4.2). The sequence $\mathcal{M}$ in this experiment has 4 non-overlapping masks, each masking 25% of the pixels. The prediction for each mask is used in turn to generate an update for all pixels.
  • Figure 4: Visual ablation of the choice of mask ratio $\eta$ and the patch size of each mask.
  • Figure 5: Comparing how the model "fills in the blanks" using the first-order gradient update term $\Delta \mathbf{x}_1$ with its predictions using the second-order direction vector $\Delta \mathbf{x}_2$.
  • ...and 1 more figures
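
The mask sequence described in the captions of Figures 3 and 4 (non-overlapping masks built from patches, each covering a fixed share of the pixels) can be sketched as follows. This is only an illustrative construction, not the authors' implementation: the function name `make_mask_sequence`, the random assignment of patches to masks, and the choice of zero as the fill value for masked pixels are all assumptions made for this example.

```python
import numpy as np


def make_mask_sequence(height, width, patch=7, num_masks=4, seed=0):
    """Hypothetical helper: split an image grid into disjoint patch-wise masks.

    Returns a boolean array of shape (num_masks, height, width) in which
    True marks a masked pixel. Every pixel belongs to exactly one mask.
    """
    if height % patch or width % patch:
        raise ValueError("height and width must be multiples of patch")
    rows, cols = height // patch, width // patch
    if (rows * cols) % num_masks:
        raise ValueError("patch count must be divisible by num_masks")

    rng = np.random.default_rng(seed)
    # Assign each patch to exactly one mask, with equal shares per mask.
    labels = (rng.permutation(rows * cols) % num_masks).reshape(rows, cols)
    # Expand the patch-level labels to pixel resolution.
    pixel_labels = np.kron(labels, np.ones((patch, patch), dtype=int))
    return np.stack([pixel_labels == k for k in range(num_masks)])


# Four masks over a 28x28 image, each hiding 25% of the pixels.
masks = make_mask_sequence(28, 28, patch=7, num_masks=4)
image = np.random.default_rng(1).random((28, 28))  # stand-in for a real image
# Assumption: masked pixels are replaced by zero; result has shape (4, 28, 28).
masked_views = np.where(masks, 0.0, image)
```

Here the mask ratio is fixed by `num_masks` (each mask covers `1 / num_masks` of the pixels), while `patch` controls the granularity of the masked regions, which are the two quantities varied in the ablation of Figure 4.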