The Gaussian Discriminant Variational Autoencoder (GdVAE): A Self-Explainable Model with Counterfactual Explanations

Anselm Haselhoff; Kevin Trelenberg; Fabian Küppers; Jonas Schneider

TL;DR

We introduce the GdVAE, a self-explainable model based on a conditional variational autoencoder (CVAE) with a Gaussian discriminant analysis (GDA) classifier and integrated counterfactual (CF) explanations; it produces high-quality CFs while preserving transparency.

Abstract

Visual counterfactual explanation (CF) methods modify image concepts, e.g., shape, to change a prediction to a predefined outcome while closely resembling the original query image. Unlike self-explainable models (SEMs) and heatmap techniques, they grant users the ability to examine hypothetical "what-if" scenarios. Previous CF methods either entail post-hoc training, limiting the balance between transparency and CF quality, or demand optimization during inference. To bridge the gap between transparent SEMs and CF methods, we introduce the GdVAE, a self-explainable model based on a conditional variational autoencoder (CVAE), featuring a Gaussian discriminant analysis (GDA) classifier and integrated CF explanations. Full transparency is achieved through a generative classifier that leverages class-specific prototypes for the downstream task and a closed-form solution for CFs in the latent space. The consistency of CFs is improved by regularizing the latent space with the explainer function. Extensive comparisons with existing approaches affirm the effectiveness of our method in producing high-quality CF explanations while preserving transparency. Code and models are public.
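To make the idea of a GDA classifier with a closed-form latent-space CF concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes two classes with a shared covariance, so that the log-odds are linear in the latent code, and it uses the smallest Euclidean shift that reaches a requested confidence; the paper's explainer function may differ. All function names (`gda_logit_params`, `counterfactual_latent`, `posterior`) and the toy prototypes are hypothetical.

```python
import numpy as np

def gda_logit_params(mu0, mu1, cov, prior1=0.5):
    """Linear log-odds parameters (w, b) of a two-class GDA with shared covariance."""
    cov_inv = np.linalg.inv(cov)
    w = cov_inv @ (mu1 - mu0)
    b = -0.5 * (mu1 @ cov_inv @ mu1 - mu0 @ cov_inv @ mu0)
    b += np.log(prior1 / (1.0 - prior1))
    return w, b

def posterior(z, w, b):
    """Class-1 posterior p(y=1 | z) implied by the linear log-odds."""
    return 1.0 / (1.0 + np.exp(-(w @ z + b)))

def counterfactual_latent(z, w, b, target_confidence):
    """Smallest Euclidean shift of z whose class-1 posterior equals target_confidence.

    Assumption: a plain L2-minimal move along w, used here for illustration only.
    """
    target_logit = np.log(target_confidence / (1.0 - target_confidence))
    current_logit = w @ z + b
    return z + (target_logit - current_logit) * w / (w @ w)

# Toy 2-D latent space with hypothetical class prototypes (means).
mu0 = np.array([-1.0, 0.0])
mu1 = np.array([1.0, 0.0])
cov = np.eye(2)

w, b = gda_logit_params(mu0, mu1, cov)
z_star = np.array([-0.8, 0.3])  # latent code initially classified as class 0
z_delta = counterfactual_latent(z_star, w, b, target_confidence=0.9)

print("posterior before:", posterior(z_star, w, b))
print("posterior after: ", posterior(z_delta, w, b))
```

In a full model, `z_delta` would then be passed through the decoder to obtain the counterfactual image; the decoder is omitted here because its architecture is not specified in this section.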

Paper Structure (29 sections, 33 equations, 18 figures, 16 tables, 1 algorithm)

Introduction
Related Work
Method
Autoencoding and Generative Classification
Counterfactual Explanations (CF)
Experiments
Evaluation of Predictive Performance
Quantitative Evaluation of CF Explanations
Qualitative Evaluation
Conclusion
Limitations and Societal Impacts
Limitations
Societal Impacts
Proofs
Variational Lower Bound of the Joint Log-Likelihood
...and 14 more sections

Figures (18)

Figure 1: FFHQ high-resolution (1024$\times$1024) counterfactuals ${x}^{\delta}$ for smiling.
Figure 2: The GdVAE has three branches: 1.) Feature Detection & Reconstruction: The encoder, akin to a recognition network in a CVAE, generates latent code ${z}$. During inference, with an unknown class ${y}$, the marginal $q({z}|{x})$ acts as a feature detection module. The decoder reconstructs the input image ${x}$ using samples ${z}^\star$ from the marginal and ${y}^\star$ from the classifier. 2.) Prior Encoder & Classifier: The prior encoder learns the latent feature distribution independently of the input image, providing necessary distributions for the generative classifier. 3.) Explanation: During inference, the model generates a class prediction ${y}^\star$ and a latent variable ${z}^\star$. The user requests a CF by defining a desired confidence value and uses a linear function ${z}^\delta=\mathcal{I}_f({z}^\star,\delta)$ to modify ${z}^\star$ to ${z}^\delta$. The CF ${x}^\delta$ is obtained by transforming ${z}^\delta$ to image space using the decoder. The CF illustrates crossing the decision boundary, showing features of digits 0 and 1.
Figure 3: Regularized latent space. a) Distribution $p_\theta({z}|{y})$ with class-conditional mean values for not-smiling (orange, $\bullet$) and smiling (green, ✦), where ${y}=\bar{s}=\text{not-smiling}$ and ${y}=s=\text{smiling}$. b) Reconstructed random samples for not-smiling (top, orange ✪) and smiling (bottom, green ★), arranged in ascending order of their Mahalanobis distance from left to right. In each column, the Mahalanobis distance is made consistent by adding the same random vector $\epsilon$ (red vector in a) to the mean of both classes, aligning samples along isocontours. c) The global explainer function interpolates between class-conditional means along the straight-line path (cyan arrow in a).
Figure 4: Left: Counterfactual generation. Right: Counterfactual examples.
Figure 5: Diagram illustrating the GdVAE model components and their interactions, highlighting the key elements that contribute to loss function computation.
...and 13 more figures
