
CoLa-DCE -- Concept-guided Latent Diffusion Counterfactual Explanations

Franz Motzkus, Christian Hellert, Ute Schmid

TL;DR

CoLa-DCE tackles the lack of transparency in diffusion-based counterfactual explanations by introducing concept-guided latent diffusion with local target selection and spatial conditioning. It selects local counterfactual targets based on the model's perception relative to a reference set and constrains feature changes to a small set of semantic concepts with spatial localization, improving minimality and comprehensibility. The approach yields semantically meaningful, localized changes that facilitate debugging of misclassifications and understanding of model failures. Evaluations on ImageNet across multiple architectures demonstrate favorable trade-offs between perceptual similarity (FID), semantic distance (L1/L2), and counterfactual fidelity (flip ratio), with qualitative demonstrations of interpretable feature-level modifications.
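As a rough illustration of the counterfactual fidelity measure mentioned above, the flip ratio can be read as the share of generated counterfactuals that the classifier assigns to the intended target class. The sketch below assumes that reading; the function name `flip_ratio` and its arguments are hypothetical and not taken from the paper.

```python
from typing import Sequence


def flip_ratio(counterfactual_preds: Sequence[int], targets: Sequence[int]) -> float:
    """Fraction of counterfactuals classified as their target class.

    Assumption: a counterfactual counts as "flipped" when the classifier's
    prediction on it equals the chosen counterfactual target.
    """
    if len(counterfactual_preds) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    flipped = sum(1 for pred, tgt in zip(counterfactual_preds, targets) if pred == tgt)
    return flipped / len(targets)


# Three of four counterfactuals reach their target class.
ratio = flip_ratio([5, 5, 2, 7], [5, 3, 2, 7])
```

A higher value means more counterfactuals actually change the classifier's decision; it says nothing about how small or realistic the change is, which is why it is reported alongside FID and the L1/L2 distances.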

Abstract

Recent advancements in generative AI have introduced novel prospects and practical implementations. Diffusion models in particular excel at generating diverse yet realistic features, which positions them well for generating counterfactual explanations for computer vision models. By answering "what if" questions about what needs to change for an image classifier to change its prediction, counterfactual explanations align well with human understanding and consequently help make model behavior more comprehensible. Current methods succeed in generating authentic counterfactuals but lack transparency, as feature changes are not directly perceivable. To address this limitation, we introduce Concept-guided Latent Diffusion Counterfactual Explanations (CoLa-DCE). CoLa-DCE generates concept-guided counterfactuals for any classifier with a high degree of control regarding concept selection and spatial conditioning. The counterfactuals achieve increased granularity through minimal feature changes. The reference feature visualization ensures better comprehensibility, while the feature localization provides increased transparency of "what" changed "where". We demonstrate the advantages of our approach in minimality and comprehensibility across multiple image classification models and datasets and provide insights into how our CoLa-DCE explanations help comprehend model errors such as misclassification cases.

Paper Structure (26 sections, 7 equations, 15 figures, 3 tables, 1 algorithm)


Figures (15)

  • Figure 1: Example image of a concept-based counterfactual, consisting of a selection of concepts with reference samples, a localization map per concept indicating the concept regions, and the generated counterfactual.
  • Figure 2: A simplified overview of the model architecture for our approach, including the target selection (right) and the concept-conditioning for guiding the diffusion denoising (middle).
  • Figure 3: Quantitative evaluation of the tradeoff between the number of concepts and quantitative measures such as flip ratio and FID. The results in the subfigure labeled fig:num_concepts1 are derived for VGG16bn with target layer feat.40.
  • Figure 4: CoLa-DCE explanations ("water ouzel" to "red-backed sandpiper") with a differing number of concepts $k$ for VGG16bn with concept layer 40. Limiting the number of concepts induces more fine-grained feature perturbations than the baseline, which flips the shown bird completely.
  • Figure 5: Comparison of the counterfactual images and their explanations for the baseline and our proposed method without and with spatial constraints.
  • ...and 10 more figures
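
The captions above refer to limiting the guidance to $k$ concepts and to optional spatial constraints. One way to picture this is as a mask over the classifier gradient at the concept layer: only the $k$ most relevant channels are kept, and optionally only inside a localization map. The sketch below is a minimal illustration under those assumptions, not the authors' implementation; treating concepts as channels, ranking them by relevance magnitude, and the names `top_k_concepts` and `mask_gradient` are all hypothetical.

```python
from typing import List, Optional

Grid = List[List[float]]  # one H x W map


def top_k_concepts(relevance: List[float], k: int) -> List[int]:
    """Indices of the k channels with the largest relevance magnitude (assumed ranking)."""
    order = sorted(range(len(relevance)), key=lambda c: abs(relevance[c]), reverse=True)
    return sorted(order[:k])


def mask_gradient(grad: List[Grid], concepts: List[int],
                  spatial_mask: Optional[Grid] = None) -> List[Grid]:
    """Zero the gradient outside the selected channels and, if given, outside the mask."""
    keep = set(concepts)
    masked = []
    for c, channel in enumerate(grad):
        rows = []
        for i, row in enumerate(channel):
            if c in keep:
                # Keep the value, scaled by the localization map when one is provided.
                rows.append([v * (spatial_mask[i][j] if spatial_mask is not None else 1.0)
                             for j, v in enumerate(row)])
            else:
                rows.append([0.0 for _ in row])
        masked.append(rows)
    return masked


# Four channels with a 1 x 2 spatial map each; keep the two most relevant channels.
relevance = [0.1, -0.9, 0.4, 0.05]
grad = [[[1.0, 1.0]], [[2.0, 2.0]], [[3.0, 3.0]], [[4.0, 4.0]]]
selected = top_k_concepts(relevance, k=2)
guided = mask_gradient(grad, selected, spatial_mask=[[1.0, 0.0]])
```

With a smaller `k`, fewer channels contribute to the guidance signal, which matches the intuition in Figure 4 that limiting the number of concepts yields more fine-grained changes.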