Peter Parker or Spiderman? Disambiguating Multiple Class Labels

Nuthan Mummani, Simran Ketha, Venkatakrishnan Ramaswamy

TL;DR

This work presents a framework and method to disambiguate a pair of predictions in the supervised classification setting, leveraging modern segmentation and input attribution techniques, and provides a simple counterfactual "proof" of each case.

Abstract

In the supervised classification setting, during inference, deep networks typically make multiple predictions. For a pair of such predictions (that are in the top-k predictions), two distinct possibilities might occur. On the one hand, each of the two predictions might be primarily driven by two distinct sets of entities in the input. On the other hand, it is possible that there is a single entity or set of entities that is driving the prediction for both the classes in question. This latter case, in effect, corresponds to the network making two separate guesses about the identity of a single entity type. Clearly, both the guesses cannot be true, i.e. both the labels cannot be present in the input. Current techniques in interpretability research do not readily disambiguate these two cases, since they typically consider input attributions for one class label at a time. Here, we present a framework and method to do so, leveraging modern segmentation and input attribution techniques. Notably, our framework also provides a simple counterfactual "proof" of each case, which can be verified for the input on the model (i.e. without running the method again). We demonstrate that the method performs well for a number of samples from the ImageNet validation set and on multiple models.

Paper Structure

This paper contains 15 sections, 8 figures.

Figures (8)

  • Figure 1: An illustration of rank-based redaction. A. An image from the ImageNet validation set, from the class vizsla, padded with zeros to match the input dimensions of the VGG-16 model; the corresponding top-3 predictions are listed. B. Pixel-wise attribution values for the label vizsla, obtained by applying integrated gradients to the image in A. C. The image in A segmented using the SAM model. D. Segment-wise attributions for the label vizsla: the pixel-wise attribution values from B are averaged over each segment, and the segments are ranked by this average. E. The top 25% of the ranked segments are redacted to obtain an $S$-redaction; the corresponding top-3 predictions for this $S$-redacted image are listed. The prediction for vizsla on the $S$-redacted image drops to 0.010. The same process on the same image with ResNet-50 and Inception-v3 is shown in Figure \ref{overview_resnet_inception}.
  • Figure 2: Example illustrating $\delta$-disjoint attributions. A. An image from the ImageNet validation set and its top-2 labels with their prediction values on the VGG-16 model. B. For $\delta=0.2$, the $\delta$-attribution for the label baseball (indicated in yellow), obtained using the algorithm discussed in Section \ref{type1}. C. The corresponding redacted image for the label baseball, with the resulting prediction values. D. For $\delta=0.2$, the $\delta$-attribution for the label ballplayer (indicated in yellow), obtained using the algorithm discussed in Section \ref{type1}. E. The corresponding redacted image for the label ballplayer, with the resulting prediction values. Panels F and G plot, for both labels, the softmax prediction as a percentage of its value on the original image while segments are redacted. F. For the algorithm discussed in Section \ref{type1}, the percentage change in prediction for the two labels when the segments ranked for the label baseball are successively redacted in order of their rank. Here, the $\delta=0.2$ attribution is obtained at the 19th redaction (red dotted line), where the prediction of baseball is at most $\delta p_1$ ($0.098 < 0.2 \times 0.513$) and the prediction of ballplayer is at least $(1-\delta) p_2$ ($0.797 > 0.8 \times 0.484$). G. The corresponding plot for ballplayer. Additional examples are provided in Appendix \ref{additional_examples}.
  • Figure 3: Example illustrating $\delta$-overlapping attributions. A. An image from the ImageNet validation set (whose correct label in the validation set is Rhodesian_ridgeback) and its top-2 labels with their prediction values on VGG-16. B. For $\delta=0.2$, the $\delta$-attribution for the label Rhodesian_ridgeback (indicated in yellow), obtained using the algorithm discussed in Section \ref{type2}. C. The corresponding redacted image for the label Rhodesian_ridgeback, with the resulting prediction values. D. For $\delta=0.2$, the $\delta$-attribution for the label Labrador_retriever (indicated in yellow), obtained using the algorithm discussed in Section \ref{type2}. E. The corresponding redacted image for the label Labrador_retriever, with the resulting prediction values. F. For $\delta=0.2$, $S_1 \cap S_2$ is verified to satisfy Definition \ref{type2def}. G. The corresponding $S_1 \cap S_2$-redacted image, with the resulting prediction values. Panels H and I plot, for both labels, the softmax prediction as a percentage of its value on the original image while segments are redacted. H. For the algorithm discussed in Section \ref{type2}, the percentage change in prediction for the two labels when the segments ranked for the label Rhodesian_ridgeback are successively redacted in order of their rank. Here, the $\delta=0.2$ attribution is obtained at the 9th redaction (red dotted line), where the prediction of Rhodesian_ridgeback is at most $\delta p_1$ ($0.0008 < 0.2 \times 0.6275$) and the prediction of Labrador_retriever is also at most $\delta p_2$ ($0.0135 < 0.2 \times 0.1596$). I. The corresponding plot for the label Labrador_retriever. Additional examples are provided in Appendix \ref{additional_examples}.
  • Figure 4: Illustration of rank-based redaction for different models (top row: ResNet-50; bottom row: Inception-v3), using the same image and following a similar pipeline as in Figure \ref{fig1}.
  • Figure 5: Example illustrating Algorithm 3 as a verifier for Definition \ref{type2def}. Here we take a sample image that satisfies the definition of $\delta$-disjoint attributions for a pair of classes. We then set $S=S_1 \cup S_2$ and challenge the verifier by offering $S$ as a certificate for $\delta$-overlapping attributions. We demonstrate that the verifier indeed flags $S$ as an incorrect certificate for $\delta$-overlapping attributions. A. An image with labels aircraft_carrier and projectile that satisfies Definition \ref{type1def} on the ResNet-50 model. B and C. $S=S_1 \cup S_2$ acting as a purported certificate for Definition \ref{type2def}, with $\delta=0.2$, for both the aircraft_carrier and projectile classes. D, E and F. Algorithm \ref{type1algo3} decomposes the set $S$ into $S_1$, $S_1 \cap S_2$ and $S_2$, shown in D, E and F respectively. G and H. The $S_1$- and $S_2$-redacted images formed using D and F respectively. Together, G and H satisfy Definition \ref{type1def} with $\delta=0.2$, and hence the verifier rejects the certificate $S$ for Definition \ref{type2def}.
  • ...and 3 more figures
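
The rank-based redaction pipeline described in the Figure 1 caption (average pixel-wise attributions over segments, rank the segments, redact the top fraction) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the attribution map and the segment map are toy arrays standing in for the outputs of integrated gradients and SAM, the function names `rank_segments` and `redact_top` are hypothetical, redacted pixels are filled with zeros, and the "top 25%" count is rounded up; none of these choices is confirmed by the captions.

```python
import numpy as np

def rank_segments(attr, seg):
    """Rank segment ids by mean pixel attribution, highest first.

    attr: (H, W) pixel-wise attribution values (assumed given).
    seg:  (H, W) integer segment id per pixel (assumed given).
    """
    ids = np.unique(seg)
    means = np.array([attr[seg == i].mean() for i in ids])
    order = np.argsort(-means, kind="stable")
    return ids[order]

def redact_top(image, seg, ranked_ids, frac=0.25, fill=0.0):
    """Redact the top `frac` of ranked segments.

    Assumption: redaction overwrites pixels with `fill` (zeros here),
    and the number of segments is rounded up.
    """
    k = int(np.ceil(frac * len(ranked_ids)))
    chosen = ranked_ids[:k]
    mask = np.isin(seg, chosen)      # (H, W) boolean mask of redacted pixels
    out = image.copy()
    out[mask] = fill                 # applies to every channel of masked pixels
    return out, chosen

# Toy example: a 4x4 image split into four 2x2 quadrant segments.
seg = np.repeat(np.repeat(np.array([[0, 1], [2, 3]]), 2, axis=0), 2, axis=1)
attr = np.array([0.1, 0.9, 0.4, 0.2])[seg]   # constant attribution per segment
image = np.ones((4, 4, 3))

ranked = rank_segments(attr, seg)
redacted, chosen = redact_top(image, seg, ranked, frac=0.25)
```

In a real setting, the redacted image would be passed back through the classifier and the new softmax values compared with the original ones, as in panel E of Figure 1.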

Theorems & Definitions (5)

  • Definition 1
  • Definition 2: $S$-redaction
  • Definition 3: $\delta$-attribution
  • Definition 4: $\delta$-disjoint label predictions
  • Definition 5: $\delta$-overlapping label predictions
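
The inequalities quoted in the captions of Figures 2 and 3 can be checked numerically. The sketch below encodes only those caption-level inequalities, using the values reported there; the formal Definitions 3 to 5 are not reproduced in this overview and may impose further conditions, and the function names `drops_only_target` and `drops_both` are hypothetical.

```python
def drops_only_target(p1, p2, q1, q2, delta):
    """Inequality pattern quoted for Figure 2 (disjoint case).

    p1, p2: predictions for the two labels on the original image.
    q1, q2: predictions after redaction.
    True if label 1 falls to at most delta*p1 while label 2 stays
    at or above (1 - delta)*p2.
    """
    return q1 <= delta * p1 and q2 >= (1 - delta) * p2

def drops_both(p1, p2, q1, q2, delta):
    """Inequality pattern quoted for Figure 3 (overlapping case).

    True if both labels fall to at most delta times their original value.
    """
    return q1 <= delta * p1 and q2 <= delta * p2

delta = 0.2

# Figure 2 values: baseball 0.513 -> 0.098, ballplayer 0.484 -> 0.797.
fig2 = drops_only_target(0.513, 0.484, 0.098, 0.797, delta)

# Figure 3 values: Rhodesian_ridgeback 0.6275 -> 0.0008,
# Labrador_retriever 0.1596 -> 0.0135.
fig3 = drops_both(0.6275, 0.1596, 0.0008, 0.0135, delta)

# The Figure 3 values do not fit the disjoint pattern, because the
# second label collapses as well.
fig3_as_disjoint = drops_only_target(0.6275, 0.1596, 0.0008, 0.0135, delta)

print(fig2, fig3, fig3_as_disjoint)
```

Because these checks need only the original and post-redaction predictions, they reflect the sense in which the abstract calls the certificate verifiable on the model without running the method again.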