Interpret the Predictions of Deep Networks via Re-Label Distillation

Yingying Hua, Shiming Ge, Daichi Zhang

TL;DR

This work addresses local interpretability of deep network predictions by learning a simple linear model that maps images directly to predictions, trained on self-supervised synthetic data generated by a VAE. By re-labeling these synthetic samples according to whether the teacher's predictions shift, and distilling both soft and hard information from the teacher, the approach yields a transparent model that highlights the salient features contributing to a given prediction. On ImageNet with standard architectures (ResNet50 and VGG16), the method produces more precise saliency explanations than several baselines, as supported by qualitative saliency maps and quantitative deletion/insertion metrics. Overall, re-label distillation offers a practical route to explainability through an interpretable, boundary-aware representation of the DNN's decision process.
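The combination of soft and hard teacher information mentioned above can be illustrated with a standard knowledge-distillation loss. The sketch below is a minimal NumPy version assuming a Hinton-style objective: cross-entropy against the hard two-class re-labels plus a temperature-scaled KL divergence to the teacher's soft logits. It also assumes the teacher's logits have already been reduced to two columns matching the student. The function names, the weight `alpha`, and the temperature `T` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def softmax(z, T=1.0):
    """Row-wise softmax with temperature T (numerically stabilized)."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def distill_loss(student_logits, teacher_logits, hard_labels, alpha=0.5, T=2.0):
    """Assumed Hinton-style objective (not necessarily the paper's):
    alpha * CE(student, hard re-labels)
    + (1 - alpha) * T^2 * KL(teacher_T || student_T).
    Both logit arrays are assumed to have shape (n, 2).
    """
    n = student_logits.shape[0]
    eps = 1e-12
    # Hard term: cross-entropy against binary re-labels (0 = shifted, 1 = unchanged).
    p_s = softmax(student_logits)
    ce = -np.mean(np.log(p_s[np.arange(n), hard_labels] + eps))
    # Soft term: KL divergence between temperature-softened teacher and student outputs.
    q_t = softmax(teacher_logits, T)
    q_s = softmax(student_logits, T)
    kl = np.mean(np.sum(q_t * (np.log(q_t + eps) - np.log(q_s + eps)), axis=1))
    return alpha * ce + (1 - alpha) * T**2 * kl
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the hard-label term remains; the `T**2` factor keeps the soft-term gradients on a comparable scale as `T` changes.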

Abstract

Interpreting the predictions of a black-box deep network can make its deployment more reliable. In this work, we propose a re-label distillation approach that learns a direct map from the input to the prediction in a self-supervised manner. The image is projected into a VAE subspace, and synthetic images are generated by randomly perturbing its latent vector. These synthetic images are then annotated into one of two classes according to whether their predicted labels shift. Next, with the deep network acting as the teacher that provides these annotations, a linear student model is trained to approximate them by mapping the synthetic images to the two classes. In this manner, the re-labeled synthetic images describe the local classification mechanism of the deep network well, and the learned student provides a more intuitive explanation of the predictions. Extensive experiments verify the effectiveness of our approach qualitatively and quantitatively.
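The pipeline described in the abstract can be sketched end to end with toy stand-ins. This is a minimal NumPy sketch, not the authors' implementation: `decode` (a fixed random linear map standing in for the pre-trained VAE decoder), `teacher_predict` (a random linear classifier standing in for the deep network), the noise scale `sigma`, and the use of plain logistic regression on hard labels are all hypothetical choices. A synthetic image is labeled 1 if the teacher's top class is unchanged and 0 if it shifts; the student's per-pixel weights are then read as a saliency map.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, LATENT, N_CLASSES = 8, 8, 16, 5

# Hypothetical stand-ins for the pre-trained VAE decoder and the teacher CNN.
DEC = rng.normal(size=(LATENT, H * W))
TEACH = rng.normal(size=(H * W, N_CLASSES))


def decode(z):
    """Map latent vectors (n, LATENT) to flattened images (n, H*W)."""
    return z @ DEC


def teacher_predict(x):
    """Return the teacher's top-1 class for each flattened image."""
    return np.argmax(x @ TEACH, axis=1)


def relabel(z0, n_samples=500, sigma=1.0):
    """Perturb the latent vector z0 with Gaussian noise and re-label each
    synthetic image: 1 if the teacher's prediction is unchanged, 0 if it shifts."""
    y0 = teacher_predict(decode(z0[None, :]))[0]
    z = z0 + sigma * rng.normal(size=(n_samples, z0.size))
    x = decode(z)
    return x, (teacher_predict(x) == y0).astype(int)


def train_linear_student(x, y, lr=0.1, epochs=500):
    """Fit a two-class logistic-regression student on standardized pixels."""
    mu, sd = x.mean(axis=0), x.std(axis=0) + 1e-8
    xs = (x - mu) / sd
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(xs @ w + b)))  # P(prediction unchanged)
        err = p - y
        w -= lr * xs.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


z0 = rng.normal(size=LATENT)
x, y = relabel(z0)
w, b = train_linear_student(x, y)
# One possible saliency map: weight magnitude per pixel, reshaped to the image.
saliency = np.abs(w).reshape(H, W)
```

Swapping in a real VAE decoder and CNN would only change `decode` and `teacher_predict`. The soft-logit distillation term described in the paper is omitted here to keep the sketch short.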

Paper Structure

This paper contains 10 sections, 7 equations, 6 figures, and 1 table.

Figures (6)

  • Figure 1: Motivation of our approach. To interpret the prediction for a specific image, we generate synthetic images that deconstruct the hidden knowledge of the DNN. These synthetic images project onto both sides of the classification boundary in the label domain, so they capture the classification knowledge needed to interpret the prediction.
  • Figure 2: Overview of our approach. For a given image, we first use a pre-trained VAE to generate synthetic images by perturbing its latent vector with random noise. Then, we use a pre-trained CNN to re-label these synthetic images into one of two classes according to whether their predictions shift. Finally, we train a two-class linear model on the re-labeled synthetic images by distilling the soft logits of the CNN. The weights of the trained linear model then mark the locations of the important features contributing to the prediction, and can be used to generate a saliency map that interprets the prediction for the image.
  • Figure 3: An example of t-SNE plots JMLR:v9:vandermaaten08a. Points are colored by their reconstructed labels. The plots indicate that the synthetic images are separable by label and can be projected into the label domain to represent the boundary knowledge of deep networks.
  • Figure 4: Qualitative comparisons with state-of-the-art methods, including (a) Linear approximation, (b) RISE petsiuk2018rise, (c) Excitation backprop 2018Top, (d) Extremal perturbations fong2019understanding, (e) Grad-CAM selvaraju2020grad, (f) Score-CAM 2020Score, (g) Occlusion sensitivity 10.1007/978-3-319-10590-1_53, and (h) Re-Label distillation (ours). Our saliency maps locate the target area of the image more accurately and show its degree of importance to the prediction more clearly, indicating that our method produces better visual explanations than the others.
  • Figure 5: Quantitative results on ResNet50 and VGG16. As the salient features are deleted (left) or inserted (right), the class probability changes markedly, which confirms that these features contribute significantly to the models' predictions.
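The deletion and insertion curves in Figure 5 are commonly summarized by the area under the class-probability curve. The sketch below assumes `prob_fn` is a hypothetical callable returning the target-class probability for an image, that removed pixels are replaced by a zero baseline, and that the AUC is computed with the trapezoidal rule over the fraction of pixels changed. These are illustrative choices rather than the paper's exact protocol. A lower deletion AUC and a higher insertion AUC indicate a more faithful saliency map.

```python
import numpy as np


def deletion_insertion_auc(image, saliency, prob_fn, steps=16, baseline=None):
    """Return (deletion_auc, insertion_auc) for one image.

    Deletion: replace the most salient pixels with the baseline first;
    a fast drop in class probability gives a low AUC (better saliency).
    Insertion: start from the baseline and restore the most salient pixels
    first; a fast rise gives a high AUC (better saliency).
    """
    if baseline is None:
        baseline = np.zeros_like(image)  # assumed baseline: all-zero image
    order = np.argsort(-saliency.ravel())  # most salient pixels first
    n = order.size
    src, base = image.ravel(), baseline.ravel()
    del_img = image.copy().ravel()
    ins_img = baseline.copy().ravel()
    del_scores = [prob_fn(del_img.reshape(image.shape))]
    ins_scores = [prob_fn(ins_img.reshape(image.shape))]
    for k in range(1, steps + 1):
        idx = order[(k - 1) * n // steps : k * n // steps]
        del_img[idx] = base[idx]  # delete this batch of salient pixels
        ins_img[idx] = src[idx]   # insert this batch of salient pixels
        del_scores.append(prob_fn(del_img.reshape(image.shape)))
        ins_scores.append(prob_fn(ins_img.reshape(image.shape)))

    def auc(scores):
        s = np.asarray(scores, dtype=float)
        # Trapezoidal rule with x spanning [0, 1] in `steps` equal intervals.
        return float((s[:-1] + s[1:]).sum() / (2 * steps))

    return auc(del_scores), auc(ins_scores)
```

For a toy model whose probability depends only on a 2x2 region of a 4x4 image, a saliency map highlighting that region yields a deletion AUC of 0.125 and an insertion AUC of 0.875, while the inverted map yields the reverse.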