DiffExplainer: Unveiling Black Box Models Via Counterfactual Generation

Yingying Fang, Shuang Wu, Zihao Jin, Caiwen Xu, Shiyi Wang, Simon Walsh, Guang Yang

TL;DR

This work proposes an agent model capable of generating counterfactual images that prompt different decisions when plugged into a black box model, and demonstrates its efficacy and potential to enhance the reliability of deep learning models in medical image classification compared to existing interpretation methods.

Abstract

In the field of medical imaging, particularly in tasks related to early disease detection and prognosis, understanding the reasoning behind AI model predictions is imperative for assessing their reliability. Conventional explanation methods encounter challenges in identifying decisive features in medical image classifications, especially when discriminative features are subtle or not immediately evident. To address this limitation, we propose an agent model capable of generating counterfactual images that prompt different decisions when plugged into a black box model. By employing this agent model, we can uncover influential image patterns that impact the black box model's final predictions. Through our methodology, we efficiently identify features that influence the decisions of the deep black box. We validated our approach in the rigorous domain of medical prognosis tasks, showcasing its efficacy and potential to enhance the reliability of deep learning models in medical image classification compared to existing interpretation methods. The code will be publicly available at https://github.com/ayanglab/DiffExplainer.

Paper Structure

This paper contains 18 sections, 2 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Framework of DiffExplainer. (A) The workflow of using DiffExplainer to perform feature manipulation and counterfactual generation to understand how different regions affect the teacher model's predictions. The two key components are: (B) a diffusion autoencoder consisting of an encoder and a DDIM generative decoder (Song et al., 2020); (C) knowledge distillation for aligning the latent feature from the diffusion autoencoder to that of the given black box.
  • Figure 2: Counterfactual generation for 'hard' cases for which the pretrained classifier failed to assign confident predictions. Row 1: Counterfactuals with increasing 'Death score'; Row 2: Counterfactuals with increasing 'Survival score'.
  • Figure 3: Comparison to other XAI methods. Blue and red areas indicate, respectively, the existing features and missing features that contribute to the change in prediction.
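
The counterfactual workflow described in Figure 1 (A) can be illustrated with a toy sketch. This is not the authors' implementation: the linear `decode` function stands in for the DDIM decoder, the logistic `black_box_score` stands in for the black box classifier, and every name, dimension, and step size below is a hypothetical choice made only for illustration. The sketch moves a latent code along the direction that changes the black box score until the decision flips, then compares the decoded images.

```python
import math
import random

random.seed(0)

LATENT_DIM, N_PIXELS = 4, 9  # hypothetical toy sizes

# Hypothetical stand-ins, NOT the paper's models: a linear "decoder" and a
# logistic "black box" with random weights.
decoder_weights = [[random.gauss(0, 1) for _ in range(LATENT_DIM)]
                   for _ in range(N_PIXELS)]
classifier_weights = [random.gauss(0, 1) for _ in range(N_PIXELS)]


def decode(z):
    """Toy decoder: latent code -> flat image (list of pixel values)."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in decoder_weights]


def black_box_score(image):
    """Toy black box: probability of the positive class for an image."""
    logit = sum(w * x for w, x in zip(classifier_weights, image))
    return 1.0 / (1.0 + math.exp(-logit))


def generate_counterfactual(z, target_positive, step=0.05, max_steps=500):
    """Shift the latent along the logit gradient until the decision flips."""
    # For this linear toy model the gradient of the logit w.r.t. z is constant.
    grad = [sum(classifier_weights[i] * decoder_weights[i][j]
                for i in range(N_PIXELS)) for j in range(LATENT_DIM)]
    norm = math.sqrt(sum(g * g for g in grad))
    direction = [g / norm for g in grad]
    sign = 1.0 if target_positive else -1.0
    z_cf = list(z)
    for _ in range(max_steps):
        if (black_box_score(decode(z_cf)) > 0.5) == target_positive:
            break  # the black box now makes the target decision
        z_cf = [zi + sign * step * di for zi, di in zip(z_cf, direction)]
    return z_cf


z = [random.gauss(0, 1) for _ in range(LATENT_DIM)]
original_score = black_box_score(decode(z))
z_cf = generate_counterfactual(z, target_positive=original_score <= 0.5)
counterfactual_score = black_box_score(decode(z_cf))

# Signed difference map: positive where the counterfactual adds intensity,
# negative where it removes intensity.
difference_map = [c - o for c, o in zip(decode(z_cf), decode(z))]
```

In this toy setting, `difference_map` plays the role of the pattern that drives the change in prediction; with a real generative decoder and classifier the latent direction would have to be estimated rather than computed in closed form.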