DiffExplainer: Unveiling Black Box Models Via Counterfactual Generation

Yingying Fang; Shuang Wu; Zihao Jin; Caiwen Xu; Shiyi Wang; Simon Walsh; Guang Yang

TL;DR

This work proposes an agent model capable of generating counterfactual images that prompt different decisions when plugged into a black box model, and demonstrates its efficacy and potential to enhance the reliability of deep learning models in medical image classification compared to existing interpretation methods.

Abstract

In medical imaging, particularly in tasks related to early disease detection and prognosis, understanding the reasoning behind AI model predictions is imperative for assessing their reliability. Conventional explanation methods struggle to identify decisive features in medical image classification, especially when discriminative features are subtle or not immediately evident. To address this limitation, we propose an agent model capable of generating counterfactual images that prompt different decisions when plugged into a black box model. By employing this agent model, we can uncover influential image patterns that impact the black box model's final predictions. Through our methodology, we efficiently identify features that influence the decisions of the deep black box. We validated our approach in the rigorous domain of medical prognosis tasks, showcasing its efficacy and potential to enhance the reliability of deep learning models in medical image classification compared to existing interpretation methods. The code will be publicly available at https://github.com/ayanglab/DiffExplainer.
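The idea of generating counterfactuals that flip a classifier's decision can be illustrated with a minimal sketch. It assumes, purely for illustration, a linear agent classifier acting on a latent code and a fixed step along its weight direction; the names `agent_score` and `counterfactual_latents`, the step size, and the toy latent are hypothetical and are not taken from the paper or its repository. In the full method, each latent would be decoded back into an image by a generative decoder, which is omitted here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def agent_score(z, w, b):
    """Hypothetical linear agent classifier: probability of class 1 for latent z."""
    return sigmoid(z @ w + b)

def counterfactual_latents(z, w, b, target=1, step=0.5, max_steps=50):
    """Shift z along the agent's weight direction until the agent predicts `target`.

    Returns the list of latents visited, starting with the original z.
    """
    direction = w / np.linalg.norm(w)
    if target == 0:
        direction = -direction
    path = [z.copy()]
    for _ in range(max_steps):
        predicted_positive = agent_score(path[-1], w, b) > 0.5
        if predicted_positive == bool(target):
            break  # the decision has flipped to the target class
        path.append(path[-1] + step * direction)
    return path

# Toy setup (assumption): random agent weights and a latent scored below 0.5.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.0
z = -w / np.linalg.norm(w)

path = counterfactual_latents(z, w, b, target=1)
scores = [agent_score(p, w, b) for p in path]
```

The intermediate latents in `path` correspond to a gradual transition between the two decisions; comparing the decoded images along such a path is what would reveal which image patterns drive the prediction.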

Paper Structure (18 sections, 2 equations, 3 figures, 1 table)

Introduction
Method
Diffusion Autoencoder
Teacher-Student Learning for Agent Classifier
Counterfactual Generation
Evaluation criteria for counterfactual generation
Experiments
Dataset
Experimental setting
Validity of the Agent Model
Performance of the counterfactual generation
The diffusion autoencoder achieves the best reconstruction quality while GAN-based models fail to achieve acceptable reconstruction.
DiffExplainer consistently pinpoints observable features upon which the decision is based.
DiffExplainer allows for fine-grained control over the counterfactual generation, enabling smooth transition from one classification result to another.
Comparison to other XAI methods
...and 3 more sections

Figures (3)

Figure 1: Framework of DiffExplainer. (A) The workflow of using DiffExplainer to perform feature manipulation and counterfactual generation to understand how different regions affect the teacher model's predictions. The two key components are: (B) a diffusion autoencoder consisting of an encoder and a DDIM generative decoder (Song et al., 2020); (C) knowledge distillation for aligning the latent feature from the diffusion autoencoder to that of the given black box.
Figure 2: Counterfactual generation for 'hard' cases for which the pretrained classifier failed to assign confident predictions. Row 1: Counterfactuals with increasing 'Death score'; Row 2: Counterfactuals with increasing 'Survival score'.
Figure 3: Comparison to other XAI methods. Blue and red areas indicate the existing features and missing features that contribute to the change in prediction.
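
The knowledge distillation component in Figure 1 (C) can be sketched in a simplified form. The example below assumes one common variant of teacher-student learning, in which a linear student on latent codes is fitted to the teacher's output probabilities with a soft-label cross-entropy loss. This is an illustrative simplification: the paper's actual loss, architecture, and feature-alignment scheme are not specified in this summary, and `distill_linear_student` and the synthetic teacher are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def distill_linear_student(latents, teacher_probs, lr=0.5, epochs=500):
    """Fit a linear student on latents to match the teacher's soft predictions."""
    n, d = latents.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        student_probs = sigmoid(latents @ w + b)
        # Gradient of the soft-label cross-entropy with respect to the logits.
        err = student_probs - teacher_probs
        w -= lr * (latents.T @ err) / n
        b -= lr * err.mean()
    return w, b

# Synthetic stand-ins (assumption): random latents and a simulated black-box teacher.
rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 8))
teacher_w = rng.normal(size=8)
teacher_probs = sigmoid(latents @ teacher_w)

w, b = distill_linear_student(latents, teacher_probs)
student_labels = sigmoid(latents @ w + b) > 0.5
agreement = np.mean(student_labels == (teacher_probs > 0.5))
```

The `agreement` value measures how often the student reproduces the teacher's decision, which is the kind of check one would want before trusting counterfactuals produced through the student.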
