V-CECE: Visual Counterfactual Explanations via Conceptual Edits

Nikolaos Spanos, Maria Lymperaiou, Giorgos Filandrianos, Konstantinos Thomas, Athanasios Voulodimos, Giorgos Stamou

TL;DR

V-CECE introduces a model-agnostic, training-free pipeline for visual counterfactual explanations that emphasizes human-understandable semantic edits. It splits the task into two stages: a concept-based edit discovery stage with optimality guarantees, and a diffusion-based image generation stage that applies those edits to produce counterfactuals. By evaluating across CNNs, ViTs, and LVLMs, the work demonstrates a pronounced semantic gap between human reasoning and non-LVLM classifiers, with LVLMs aligning more closely with human semantics. The framework highlights biases in classifiers, provides actionable explanations, and offers a practical plug-and-play tool for testing semantic understanding in visual classifiers.

Abstract

Recent black-box counterfactual generation frameworks fail to take into account the semantic content of the proposed edits, while relying heavily on training to guide the generation process. We propose a novel, plug-and-play black-box counterfactual generation framework, which suggests step-by-step edits based on theoretical guarantees of optimal edits to produce human-level counterfactual explanations with zero training. Our framework utilizes a pre-trained image editing diffusion model, and operates without access to the internals of the classifier, leading to an explainable counterfactual generation process. Throughout our experimentation, we showcase the explanatory gap between human reasoning and neural model behavior by utilizing Convolutional Neural Network (CNN), Vision Transformer (ViT) and Large Vision Language Model (LVLM) classifiers, substantiated through a comprehensive human evaluation.
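The step-by-step, black-box procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `generate_counterfactual`, the `apply_edit` and `classify` callables, and the toy set-of-concepts "image" are all hypothetical stand-ins. In the paper the edit order comes from the concept-based discovery stage and the edits are applied by a pre-trained diffusion editor; here the ordered edit list is simply given as input, and the classifier is queried only through its predicted label.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")


def generate_counterfactual(
    image: Image,
    edits: Sequence[str],
    apply_edit: Callable[[Image, str], Image],  # hypothetical image editor
    classify: Callable[[Image], str],           # black box: returns a label only
) -> Tuple[Image, List[str], bool]:
    """Apply edits one at a time and stop as soon as the predicted label changes."""
    source_label = classify(image)
    applied: List[str] = []
    current = image
    for edit in edits:
        current = apply_edit(current, edit)
        applied.append(edit)
        if classify(current) != source_label:
            return current, applied, True   # label flipped: counterfactual found
    return current, applied, False          # edits exhausted without a flip


# Toy stand-ins (assumptions for illustration): an "image" is a set of concepts.
def toy_apply_edit(concepts: frozenset, edit: str) -> frozenset:
    action, concept = edit.split(":")
    return concepts - {concept} if action == "remove" else concepts | {concept}


def toy_classify(concepts: frozenset) -> str:
    return "outdoor" if "car" in concepts else "indoor"


result, applied, flipped = generate_counterfactual(
    frozenset({"car", "tree"}),
    ["remove:tree", "remove:car", "add:bed"],
    toy_apply_edit,
    toy_classify,
)
```

In this toy run the loop stops after the second edit, because removing the car is what changes the toy classifier's label; the third edit is never applied. The number of applied edits is the kind of quantity that can be compared against the step at which a human would consider the label changed.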

Paper Structure

This paper contains 33 sections, 2 equations, 13 figures, 10 tables.

Figures (13)

  • Figure 1: Outline of V-CECE to address the explanatory gap between humans and models. The semantic edit framework reviews the image and proposes edits and their iterative sequence. The edits are then implemented through a combined object recognition and diffusion model. The edited images are reviewed by the respective models to ascertain whether the edit had the desired effect. The edited images are evaluated through visual metrics, counterfactual metrics and a human survey.
  • Figure 2: An example of counterfactual generation using the global-local method for two different classifiers on the same input image, including the source image, the intermediate image generated from the edit in step 1 (removal of the car), and the final counterfactual image that prompted a label change. In the case of the LVLM, the counterfactual image is the same as the image from step 1.
  • Figure 3: Ambiguity in different LVLMs (left) and CNNs (right) across different stages of the counterfactual generation process.
  • Figure 4: Successful generations after 2 steps of edits for the DenseNet classifier. The red arrow denotes the step at which humans perceive label-flipping. In the presented case, DenseNet flips its label concurrently with humans and generation terminates.
  • Figure 5: Successful generations after 3 steps of edits for the DenseNet classifier. The red arrow denotes the step at which humans perceive label-flipping. In the presented case, DenseNet flips its label concurrently with humans and generation terminates.
  • ...and 8 more figures