CoLa-DCE -- Concept-guided Latent Diffusion Counterfactual Explanations
Franz Motzkus, Christian Hellert, Ute Schmid
TL;DR
CoLa-DCE addresses the lack of transparency in diffusion-based counterfactual explanations by combining concept-guided latent diffusion with local target selection and spatial conditioning. Counterfactual targets are chosen locally, based on how the model perceives the input relative to a reference set, and feature changes are restricted to a small set of spatially localized semantic concepts, which improves minimality and comprehensibility. The resulting changes are semantically meaningful and localized, which helps in debugging misclassifications and understanding model failures. Evaluations on ImageNet across multiple architectures show favorable trade-offs between perceptual similarity (FID), semantic distance (L1/L2), and counterfactual fidelity (flip ratio), complemented by qualitative examples of interpretable feature-level modifications.
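To make the evaluation quantities named above concrete, the following is a minimal sketch of how a flip ratio and mean L1/L2 distances could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `flip_ratio` and `mean_lp_distance` are hypothetical, the flip ratio is assumed to count counterfactuals that the classifier assigns to the intended target class, and the distances are assumed to be unnormalized per-sample norms over whatever representation (pixels or latent features) is passed in.

```python
import numpy as np

def flip_ratio(cf_preds, target_labels):
    """Fraction of counterfactuals predicted as their intended target class.

    Assumption: a 'flip' means the classifier's prediction on the
    counterfactual equals the chosen counterfactual target.
    """
    cf_preds = np.asarray(cf_preds)
    target_labels = np.asarray(target_labels)
    return float(np.mean(cf_preds == target_labels))

def mean_lp_distance(originals, counterfactuals, p):
    """Mean per-sample L_p distance (p = 1 or 2) between paired arrays.

    Both inputs have shape (N, ...); each sample is flattened before
    the norm is taken. No normalization by dimensionality is applied.
    """
    diff = (np.asarray(counterfactuals, dtype=np.float64)
            - np.asarray(originals, dtype=np.float64))
    diff = diff.reshape(diff.shape[0], -1)
    return float(np.mean(np.linalg.norm(diff, ord=p, axis=1)))
```

A lower L1/L2 value indicates a counterfactual closer to the original, while a higher flip ratio indicates that more counterfactuals actually reach the target class; the trade-off arises because stronger guidance tends to raise both.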
Abstract
Recent advances in generative AI have opened up new prospects and practical applications. Diffusion models in particular excel at generating features that are both diverse and realistic, which makes them well suited for generating counterfactual explanations for computer vision models. By answering the "what if" question of what must change for an image classifier to change its prediction, counterfactual explanations align well with human understanding and thus help make model behavior more comprehensible. Current methods succeed in generating authentic counterfactuals but lack transparency, because the feature changes are not directly perceivable. To address this limitation, we introduce Concept-guided Latent Diffusion Counterfactual Explanations (CoLa-DCE). CoLa-DCE generates concept-guided counterfactuals for any classifier with a high degree of control over concept selection and spatial conditioning. The counterfactuals are more fine-grained because they rely on minimal feature changes. Reference feature visualizations improve comprehensibility, while feature localization makes transparent "what" changed "where". We demonstrate the advantages of our approach in minimality and comprehensibility across multiple image classification models and datasets, and show how CoLa-DCE explanations help in understanding model errors such as misclassifications.
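The idea of restricting changes to a few concepts at specific locations can be illustrated with a toy sketch. This is not the CoLa-DCE algorithm itself: it assumes that concepts correspond to channels of an intermediate feature map, that a per-channel attribution score toward the target class is already available, and that "spatial conditioning" can be approximated by keeping only the most strongly activated positions of each selected channel. All function names and the `quantile` parameter are hypothetical.

```python
import numpy as np

def select_concepts(attributions, k):
    """Indices of the k channels with the highest attribution
    toward the counterfactual target (assumed to be given)."""
    attributions = np.asarray(attributions)
    return np.argsort(attributions)[::-1][:k]

def concept_spatial_mask(activations, concept_ids, quantile=0.9):
    """Binary mask of shape (C, H, W).

    A position is kept only if it belongs to a selected channel and
    its activation is at or above that channel's given quantile.
    """
    activations = np.asarray(activations, dtype=np.float64)
    mask = np.zeros_like(activations)
    for c in concept_ids:
        threshold = np.quantile(activations[c], quantile)
        mask[c] = activations[c] >= threshold
    return mask

def constrain_gradient(grad, mask):
    """Zero out every gradient entry outside the selected concepts
    and locations, so guidance can only act through them."""
    return np.asarray(grad, dtype=np.float64) * mask
```

In a guided diffusion loop, a masked gradient of this kind would replace the full classifier gradient at each denoising step, so that only the chosen concepts at the chosen locations can drive the change. How the selection and localization are actually computed is specified in the paper's method section, not here.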
