Counterfactual Explanations for Medical Image Classification and Regression using Diffusion Autoencoder

Matan Atad, David Schinz, Hendrik Moeller, Robert Graf, Benedikt Wiestler, Daniel Rueckert, Nassir Navab, Jan S. Kirschke, Matthias Keicher

TL;DR

This work proposes a novel method that operates directly on the latent space of a generative model, specifically a Diffusion Autoencoder (DAE), offering inherent interpretability by enabling the generation of CEs and the continuous visualization of the model’s internal representation across decision boundaries.

Abstract

Counterfactual explanations (CEs) aim to enhance the interpretability of machine learning models by illustrating how alterations in input features would affect the resulting predictions. Common CE approaches require an additional model and are typically constrained to binary counterfactuals. In contrast, we propose a novel method that operates directly on the latent space of a generative model, specifically a Diffusion Autoencoder (DAE). This approach offers inherent interpretability by enabling the generation of CEs and the continuous visualization of the model's internal representation across decision boundaries. Our method leverages the DAE's ability to encode images into a semantically rich latent space in an unsupervised manner, eliminating the need for labeled data or separate feature extraction models. We show that these latent representations are useful for classifying medical conditions and for ordinal regression of pathology severity, as in vertebral compression fractures (VCF) and diabetic retinopathy (DR). Beyond binary CEs, our method supports the visualization of ordinal CEs using a linear model, providing deeper insights into the model's decision-making process and enhancing interpretability. Experiments across various medical imaging datasets demonstrate the method's advantages in interpretability and versatility. The linear manifold of the DAE's latent space allows for meaningful interpolation and manipulation, making it a powerful tool for exploring medical image properties. Our code is available at https://doi.org/10.5281/zenodo.13859266.
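
The latent-space manipulation described in the abstract can be illustrated with a minimal sketch. It assumes a linear classifier has already produced a hyperplane (weights `w`, bias `b`) in the DAE's semantic latent space; the function names, the toy values, and the sign convention are hypothetical and are not taken from the authors' released code. The sketch only computes the shifted latent; decoding it back into an image with the DAE is omitted.

```python
import math


def signed_distance(z, w, b):
    """Signed distance of latent z from the hyperplane w.z + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * zi for wi, zi in zip(w, z)) + b) / norm


def shift_to_distance(z, w, b, target):
    """Move z along the hyperplane normal until its signed distance equals target.

    Only the component of z orthogonal to the hyperplane changes, so the
    shifted latent stays as close as possible to the original one.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    step = target - signed_distance(z, w, b)
    return [zi + step * wi / norm for zi, wi in zip(z, w)]


# Toy 2-D example (assumed values): push a latent across the decision boundary.
w, b = [3.0, 4.0], -5.0
z = [1.0, 1.0]
z_cf = shift_to_distance(z, w, b, target=-1.0)
print(signed_distance(z, w, b), signed_distance(z_cf, w, b))
```

Sweeping `target` over a range of values yields a sequence of latents that crosses the decision boundary smoothly, which is the kind of continuous progression the abstract refers to.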

Paper Structure (22 sections, 4 equations, 7 figures, 7 tables)


Figures (7)

  • Figure 1: The proposed method involves three steps: 1) unsupervised training of a generative feature extractor Diffusion Autoencoder (DAE), 2) supervised training of a binary classifier to detect a pathology and obtain a decision hyperplane, and 3) calibrating a linear regression of the pathology grade to the hyperplane distance of embedded images. The method inherently enables the generation of counterfactual explanations (CEs), visualizing the model's representation corresponding to regression grades and smooth progressions in between.
  • Figure 2: Qualitative comparison of VCF encoders: The original shape of the evaluated vertebra is highlighted in black, while the reconstructed shape is shown in red. Compared to StyleGAN2 [karras2020analyzing] with Encoder4Editing (E4E) [tov2021designing], the DAE shows the closest resemblance to the original.
  • Figure 3: Images generated by moving the semantic latent orthogonal to the hyperplane without calibration. Top row: healthy vertebra (G0) moved in both directions, revealing a severe fracture on the right. Bottom row: severely fractured vertebra (G3) decompresses on the left and further disintegrates on the right. A hallucination of a lung is added by the model to both images when semantically shifted further into the healthy direction.
  • Figure 4: Interpolation in the latent space. In each row, the original image is in the blue box. The rest show the progression of the pathology, edited by moving the image latent perpendicularly to a hyperplane of a binary classifier in the DAE semantic latent space.
  • Figure 5: DAE image generation calibrated to Genant grades (linear regression to SVM hyperplane): On the left, the results of regression, prediction, and the ground truth (GT) are shown. The first three rows are well-calibrated examples, while the bottom two rows show examples that are not well-calibrated. Note that G1 was not used for training the classifiers, and neither G1 nor G2 was used for calibrating the regressors. The artificial scores -1 and 4 are added for illustrative purposes only.
  • ...and 2 more figures
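
The calibration step mentioned in the captions of Figures 1 and 5 (a linear regression from hyperplane distance to pathology grade) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the toy distances are hypothetical, and the fit is an ordinary least-squares line over samples from the two extreme grades only, mirroring the caption's note that G1 and G2 were not used for calibration.

```python
def fit_line(distances, grades):
    """Least-squares fit of grade = slope * distance + intercept."""
    n = len(distances)
    mean_d = sum(distances) / n
    mean_g = sum(grades) / n
    cov = sum((d - mean_d) * (g - mean_g) for d, g in zip(distances, grades))
    var = sum((d - mean_d) ** 2 for d in distances)
    slope = cov / var
    return slope, mean_g - slope * mean_d


def distance_for_grade(grade, slope, intercept):
    """Invert the fitted line: the hyperplane distance that maps to a grade."""
    return (grade - intercept) / slope


# Toy calibration set (assumed values): hyperplane distances of embedded
# images labelled G0 and G3.
distances = [-1.0, -1.0, 2.0, 2.0]
grades = [0, 0, 3, 3]
slope, intercept = fit_line(distances, grades)

# Target distances for every grade, including intermediate ones.
targets = [distance_for_grade(g, slope, intercept) for g in range(4)]
print(slope, intercept, targets)
```

Each target distance could then be used to place a latent at the corresponding offset from the hyperplane before decoding, giving one counterfactual image per grade.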