Diffusion Counterfactuals for Image Regressors
Trung Duc Ha, Sidney Bender
TL;DR
This work addresses the interpretation of image regression models by introducing two diffusion-based counterfactual methods. AC-RE operates in pixel space and delivers sparse, localized edits, while Diff-AE-RE operates in latent space and produces higher-quality, semantically meaningful changes. Both steer the counterfactual toward a user-specified reference value $\tilde{y}$ and rely on diffusion-based refinement to stay on the data manifold. Experiments on a synthetic square dataset and CelebA-HQ show that the feature changes in regression counterfactuals depend on the region of the predicted value, and that the counterfactuals reveal spurious correlations, such as age biases linked to accessories like glasses. The study finds a trade-off between sparsity and realism: pixel-space edits are sparser, whereas latent-space edits yield richer semantic changes. This trade-off can guide the choice of method according to interpretability needs and potential biases in the regressor.
Abstract
Counterfactual explanations have been successfully applied to create human-interpretable explanations for various black-box models. They are particularly useful for tasks in the image domain, where the quality of the explanations benefits from recent advances in generative models. Although counterfactual explanations have been widely applied to classification models, their application to regression tasks remains underexplored. We present two methods for creating counterfactual explanations for image regression tasks using diffusion-based generative models to address challenges in sparsity and quality: 1) one based on a Denoising Diffusion Probabilistic Model that operates directly in pixel space, and 2) another based on a Diffusion Autoencoder that operates in latent space. Both produce realistic, semantic, and smooth counterfactuals on CelebA-HQ and a synthetic dataset, providing easily interpretable insights into the decision-making process of the regression model and revealing spurious correlations. We find that for regression counterfactuals, changes in features depend on the region of the predicted value. Large semantic changes are needed for significant changes in predicted values, making it harder to find sparse counterfactuals than with classifiers. Moreover, pixel-space counterfactuals are sparser, while latent-space counterfactuals are of higher quality and allow bigger semantic changes.
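
To make the idea of a regression counterfactual concrete, the following is a minimal toy sketch and not the AC-RE or Diff-AE-RE method described above: it uses no diffusion model. It assumes a hypothetical linear regressor $f(x) = w^\top x + b$ on a flattened image and searches for a sparse pixel-space perturbation that moves the prediction toward a reference value $\tilde{y}$. The function name, the L1 penalty weight `lam`, and the proximal gradient (ISTA) update are illustrative assumptions.

```python
import numpy as np

def regression_counterfactual(x, w, b, y_target, lam=0.01, lr=0.1, steps=500):
    """Toy pixel-space counterfactual for a linear regressor f(x) = w @ x + b.

    Approximately minimizes
        0.5 * (f(x + delta) - y_target)**2 + lam * ||delta||_1
    with proximal gradient descent (ISTA). Illustrative assumption only:
    no generative model keeps the result on the data manifold.
    The step size lr should be below 1 / ||w||^2 for stable updates.
    """
    delta = np.zeros_like(x, dtype=float)
    for _ in range(steps):
        # Gradient of the squared-error term with respect to delta.
        residual = w @ (x + delta) + b - y_target
        delta = delta - lr * residual * w
        # Soft-thresholding: proximal step for the L1 sparsity penalty.
        delta = np.sign(delta) * np.maximum(np.abs(delta) - lr * lam, 0.0)
    return x + delta

# Hypothetical 4-pixel "image"; the last two pixels do not affect the regressor.
x = np.array([0.2, 0.2, 0.2, 0.2])
w = np.array([1.0, 0.5, 0.0, 0.0])
x_cf = regression_counterfactual(x, w, b=0.0, y_target=1.0)
print("original prediction:      ", w @ x)
print("counterfactual prediction:", w @ x_cf)
print("changed pixels:           ", np.flatnonzero(x_cf != x))
```

In this sketch the L1 penalty plays the role of the sparsity objective, and pixels that the regressor ignores stay untouched. The methods summarized above additionally rely on a diffusion model to keep the edited image realistic, which this toy example does not attempt.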
