
Diffusion Counterfactuals for Image Regressors

Trung Duc Ha, Sidney Bender

TL;DR

This work tackles the challenge of interpreting image regression models by introducing two diffusion-based counterfactual methods. AC-RE operates in pixel space, delivering sparse, localized edits, while Diff-AE-RE operates in latent space to produce higher-quality, semantically meaningful changes; both align counterfactuals with a user-specified reference value $\tilde{y}$ and rely on diffusion-based refinements to stay on the data manifold. Experiments on a synthetic square dataset and CelebA-HQ demonstrate that regression counterfactuals exhibit region-dependent feature changes and reveal spurious correlations, such as age biases linked to accessories like glasses. The study finds a trade-off between sparsity and realism: pixel-space edits are sparser but latent-space edits yield richer semantic changes, informing how to choose methods based on interpretability needs and potential biases in the regressor.

Abstract

Counterfactual explanations have been successfully applied to create human-interpretable explanations for various black-box models. They are particularly useful for tasks in the image domain, where the quality of the explanations benefits from recent advances in generative models. Although counterfactual explanations have been widely applied to classification models, their application to regression tasks remains underexplored. We present two methods to create counterfactual explanations for image regression tasks using diffusion-based generative models to address challenges in sparsity and quality: 1) one based on a Denoising Diffusion Probabilistic Model that operates directly in pixel space and 2) another based on a Diffusion Autoencoder that operates in latent space. Both produce realistic, semantic, and smooth counterfactuals on CelebA-HQ and a synthetic data set, providing easily interpretable insights into the decision-making process of the regression model and revealing spurious correlations. We find that for regression counterfactuals, changes in features depend on the region of the predicted value. Large semantic changes are needed for significant changes in predicted values, making it harder to find sparse counterfactuals than with classifiers. Moreover, pixel-space counterfactuals are sparser, while latent-space counterfactuals are of higher quality and allow bigger semantic changes.
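
To make the idea of steering a prediction toward a reference value $\tilde{y}$ concrete, here is a minimal sketch in Python. It is not the authors' method: it uses no diffusion model, and the linear `toy_regressor`, the weights `w`, and the step size are hypothetical stand-ins chosen only for illustration. It runs plain gradient descent on the squared error between the prediction and the reference value, starting from the input.

```python
import numpy as np


def toy_regressor(x, w):
    """Hypothetical linear regressor standing in for an image regressor."""
    return float(w @ x)


def counterfactual_search(x, w, y_ref, lr=0.1, steps=500, tol=1e-3):
    """Gradient descent on (f(x') - y_ref)^2, starting from x' = x.

    Assumption: f is the linear toy regressor above, so the gradient of
    the squared error with respect to x' is 2 * (f(x') - y_ref) * w.
    """
    x_cf = np.array(x, dtype=float)  # copy, so the input is not modified
    for _ in range(steps):
        residual = toy_regressor(x_cf, w) - y_ref
        if abs(residual) < tol:
            break  # prediction is close enough to the reference value
        x_cf -= lr * 2.0 * residual * w
    return x_cf


w = np.array([1.0, 0.5, 0.0])   # the third feature is ignored by the regressor
x = np.array([0.2, 0.4, 0.9])   # toy "image" with three features
x_cf = counterfactual_search(x, w, y_ref=0.8)
```

In this toy setting the feature with zero weight is never changed, which mirrors the intuition that a counterfactual should only alter what the regressor actually uses. The methods described in the abstract additionally rely on a diffusion model to keep the result realistic, which this sketch leaves out entirely.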

Paper Structure

This paper contains 25 sections, 10 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Diff-AE Regression Explanations (Diff-AE-RE). The Diff-AE encodes an image into two distinct vectors: high-level semantics in the latent code $\mathbf z_{\text{sem}}$ and low-level features in the stochastic code $\mathbf x_T$. The adversarial attack produces gradients that flow only to the latent code. Diff-AE-RE allows for a smooth transition between counterfactuals across regression values, offering fine-grained inspection of the changed features
  • Figure 2: Qualitative Results. The first row shows the input images $\mathbf x$ and their reference values $\tilde{y}$, while the following rows show the CEs of AC-RE and Diff-AE-RE. The captions indicate the extracted color of the square $y$, the regressor prediction $\hat{y}$, or the oracle prediction $\hat{y}_o$. AC-RE creates sparse explanations, while Diff-AE-RE creates broader and more realistic modifications. For the synthetic dataset, AC-RE creates more accurate counterfactuals. For CelebA-HQ, AC-RE primarily changes the textures in the face. In contrast, Diff-AE-RE alters facial shape, skin and teeth color, and accessories. Its oracle score indicates more realistic changes, aligning closely with the reference values
  • Figure 3: Spurious Correlation. We reveal a spurious correlation in the regressor for sample images from the CelebA-HQ validation set. We use the getimg.ai Image Editor to add glasses to each person via inpainting. On average, the model predicts the person with glasses to be 7 years older, with the first example showing an especially strong change. The effect seems to be more pronounced for male faces
  • Figure 4: Granular reference values. Counterfactual explanations generated by AC-RE and Diff-AE-RE for different reference values $\tilde{y} \in \{10,20,40,60,80\}$ for an image with an initial predicted age of 20. For each algorithm, the first row shows the reconstructed image $\hat{\mathbf x}$ with the corresponding counterfactuals, and the second row shows the input image $\mathbf x$ with heatmaps visualizing pixel-wise differences. AC-RE produces sparse, subtle changes. In contrast, Diff-AE-RE exhibits more intuitive but less sparse modifications for the lower ages. This approach highlights key facial transformations associated with aging as well as a spurious feature
  • Figure 5: Effect of the distance function in the latent space with $\lambda_d = 10^{-5}$. We show the input image $\mathbf x$ and the counterfactuals generated with no distance function, the $\ell_1$-distance, or the $\ell_2$-distance between the latent codes of the input $\mathbf z_{\text{sem}}$ and the counterfactual $\mathbf z_{\text{sem}}'$. Without a distance function, alterations are aggressive, with some non-semantic changes. While the $\ell_2$-distance somewhat limits this, the $\ell_1$-distance restricts changes to semantically meaningful ones while increasing the quality of the explanation
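
The captions for Figures 4 and 5 refer to pixel-wise difference heatmaps and to a weighted distance between latent codes. The sketch below shows one way such quantities could be computed with NumPy. The function names, the channel-averaged heatmap, and the unsquared Euclidean form of the $\ell_2$ variant are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def latent_distance(z_sem, z_sem_cf, kind="l1", lambda_d=1e-5):
    """Weighted distance between input and counterfactual latent codes.

    Assumption: "l2" means the unsquared Euclidean norm; the paper may
    define it differently.
    """
    diff = np.asarray(z_sem_cf, dtype=float) - np.asarray(z_sem, dtype=float)
    if kind == "none":
        return 0.0
    if kind == "l1":
        return lambda_d * float(np.abs(diff).sum())
    if kind == "l2":
        return lambda_d * float(np.sqrt((diff ** 2).sum()))
    raise ValueError(f"unknown distance kind: {kind!r}")


def difference_heatmap(x, x_cf):
    """Absolute per-pixel difference, averaged over channels: (H, W, C) -> (H, W)."""
    x = np.asarray(x, dtype=float)
    x_cf = np.asarray(x_cf, dtype=float)
    return np.abs(x_cf - x).mean(axis=-1)


# Toy usage with hypothetical values.
z, z_cf = np.zeros(2), np.array([3.0, 4.0])
l1_penalty = latent_distance(z, z_cf, kind="l1", lambda_d=1.0)
l2_penalty = latent_distance(z, z_cf, kind="l2", lambda_d=1.0)

image = np.zeros((4, 4, 3))
image_cf = image.copy()
image_cf[1, 2] = 0.6  # change a single pixel in all three channels
heatmap = difference_heatmap(image, image_cf)
```

A heatmap with few non-zero entries corresponds to the sparse edits attributed to AC-RE, while broader non-zero regions correspond to the less sparse modifications attributed to Diff-AE-RE.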