DifCluE: Generating Counterfactual Explanations with Diffusion Autoencoders and modal clustering

Suparshva Jain; Amit Sangroya; Lovekesh Vig

TL;DR

This work addresses the problem of generating multiple counterfactual explanations for classes that contain several distinct modes. It introduces DifCluE, a pipeline that uses a diffusion autoencoder to learn a rich latent representation and clusters the semantic latents to identify intra-class modes; mode-specific counterfactuals are then generated by perturbing a latent along the direction learned for each mode. The key contributions are an approach that scales without retraining when the number of counterfactuals changes, and empirical evidence of better mode coverage and realism than prior methods such as DISSECT. The perturbations act at the concept level, which aids interpretability, and the method is evaluated on the FFHQ/CelebA benchmarks, where it shows improved realism and disentanglement. The result is a more trustworthy, human-understandable explanation mechanism for multimodal classes in complex datasets.

Abstract

Generating multiple counterfactual explanations for different modes within a class presents a significant challenge, as these modes are distinct yet converge under the same classification. Diffusion probabilistic models (DPMs) have demonstrated a strong ability to capture the underlying modes of data distributions. In this paper, we harness the power of a Diffusion Autoencoder to generate multiple distinct counterfactual explanations. By clustering in the latent space, we uncover the directions corresponding to the different modes within a class, enabling the generation of diverse and meaningful counterfactuals. We introduce a novel methodology, DifCluE, which consistently identifies these modes and produces more reliable counterfactual explanations. Our experimental results demonstrate that DifCluE outperforms the current state-of-the-art in generating multiple counterfactual explanations, offering a significant advancement in model interpretability.
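
The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses synthetic vectors in place of real Diffusion Autoencoder semantic latents, omits the decoder and the classifier entirely, and assumes that a mode direction can be approximated as the unit vector from the mean of the source-class latents to a cluster centroid of the target-class latents. The function names `kmeans`, `mode_directions`, and `perturb` are hypothetical.

```python
import numpy as np


def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's k-means with farthest-first initialisation."""
    rng = np.random.default_rng(seed)
    centroids = [x[rng.integers(len(x))]]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.stack(centroids)
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        updated = np.stack([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids, labels


def mode_directions(source_latents, target_latents, k):
    """Assumed rule: one unit direction per target-class cluster,
    pointing from the source-class mean to the cluster centroid."""
    centroids, _ = kmeans(target_latents, k)
    diffs = centroids - source_latents.mean(axis=0)
    return diffs / np.linalg.norm(diffs, axis=1, keepdims=True)


def perturb(z, directions, step):
    """Return one perturbed latent per mode direction, shape (k, dim)."""
    return z[None, :] + step * directions


# Synthetic stand-in for semantic latents: one source class near the
# origin and a target class made of two well-separated modes.
rng = np.random.default_rng(0)
dim = 8
source = rng.normal(0.0, 0.3, size=(100, dim))
mode_a = rng.normal(0.0, 0.3, size=(100, dim))
mode_a[:, 0] += 5.0
mode_b = rng.normal(0.0, 0.3, size=(100, dim))
mode_b[:, 1] += 5.0
target = np.vstack([mode_a, mode_b])

directions = mode_directions(source, target, k=2)
z_query = source[0]
z_cf = perturb(z_query, directions, step=5.0)  # one candidate latent per mode
```

In a full pipeline each row of `z_cf` would be passed to the diffusion decoder to render one counterfactual image per mode. Because the directions come from clustering already-encoded latents, changing `k` in this sketch only requires re-running the clustering, not retraining an encoder.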
