Accurate Explanation Model for Image Classifiers using Class Association Embedding

Ruitao Xie; Jingbang Chen; Limai Jiang; Rui Xiao; Yi Pan; Yunpeng Cai

TL;DR

The paper tackles the challenge of explaining image classifiers by proposing Class Association Embedding (CAE), which disentangles class-associated features from individual variation and embeds them into a low-dimensional, semantically meaningful manifold. A symmetric cycle-GAN framework learns to reconstruct samples and swap class features between them; its encoder splits each sample into a class-associated code and an individual code, and its decoder recombines them. This enables both global knowledge visualization and instance-level counterfactual explanations. A Building-Block Coherency Feature Extraction (BBCFE) training scheme enforces efficient, coherent class-associated representations, yielding a robust manifold that supports guided counterfactual generation for local explanations. Experiments on five datasets show that CAE outperforms state-of-the-art XAI methods in saliency-map accuracy and computational efficiency. Further analyses demonstrate the semantic pervasiveness, coherency, and smoothness of the learned class-associated space, which supports a global understanding of classifier behavior and potential knowledge discovery.
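The split-and-swap idea can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the encoder and decoder are replaced by a fixed orthogonal linear map (so reconstruction is exact by construction), and the names `encode`, `decode`, `swap_class_codes`, `DIM`, and `CLASS_DIM` are hypothetical. In the paper these components are trained networks inside a cyclic GAN.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, CLASS_DIM = 8, 2  # hypothetical sizes: 2-D class-associated code, 6-D individual code

# Toy stand-in for a learned encoder/decoder pair: an orthogonal linear map,
# so decode(*encode(x)) recovers x exactly. Not the paper's trained networks.
Q, _ = np.linalg.qr(rng.normal(size=(DIM, DIM)))

def encode(x):
    """Split a sample into (class-associated code, individual code)."""
    z = Q.T @ x
    return z[:CLASS_DIM], z[CLASS_DIM:]

def decode(class_code, individual_code):
    """Recombine the two codes into a sample."""
    return Q @ np.concatenate([class_code, individual_code])

def swap_class_codes(x_a, x_b):
    """Exchange class-associated codes while keeping each sample's individual code."""
    class_a, indiv_a = encode(x_a)
    class_b, indiv_b = encode(x_b)
    return decode(class_b, indiv_a), decode(class_a, indiv_b)

x_a, x_b = rng.normal(size=DIM), rng.normal(size=DIM)
syn_a, syn_b = swap_class_codes(x_a, x_b)      # synthetic samples
rec_a, rec_b = swap_class_codes(syn_a, syn_b)  # swapping back closes the cycle
```

In this toy setting, `syn_a` carries the individual code of `x_a` with the class-associated code of `x_b`, and swapping a second time returns the original samples, which mirrors the cycle-consistency idea in the simplest possible form.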

Abstract

Image classification is a primary task in data analysis, and explainable models are in high demand across many applications. Although many methods have been proposed to extract explainable knowledge from black-box classifiers, they are inefficient at extracting global knowledge about the classification task, which makes them vulnerable to local traps and often leads to poor accuracy. In this study, we propose a generative explanation model that combines the advantages of global and local knowledge for explaining image classifiers. We develop a representation learning method called class association embedding (CAE), which encodes each sample into a pair of separate codes: a class-associated code and an individual code. Recombining the individual code of a given sample with an altered class-associated code yields a realistic synthetic sample that preserves individual characteristics but has modified class-associated features and possibly a flipped class assignment. We propose a building-block coherency feature extraction algorithm that efficiently separates class-associated features from individual ones. The extracted feature space forms a low-dimensional manifold that visualizes the classification decision patterns. Each individual sample can then be explained by counterfactual generation: the sample is modified continuously in one direction, by shifting its class-associated code along a guided path, until its classification outcome changes. We compare our method with state-of-the-art methods on explaining image classification tasks in the form of saliency maps, and show that our method achieves higher accuracy. The code is available at https://github.com/xrt11/XAI-CODE.
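The counterfactual procedure described in the abstract can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the released code: the guided path is simplified to a straight line between the sample's class-associated code and a chosen target code, and `encode`, `decode`, `classify`, and `guided_counterfactual` are hypothetical stand-ins for the trained encoder, decoder, and target classifier.

```python
import numpy as np

def guided_counterfactual(x, encode, decode, classify, target_code, n_steps=50):
    """Shift the class-associated code of x toward target_code until the label flips.

    Returns (counterfactual, saliency, t), or (None, None, None) if no flip occurs.
    Assumption: the path is a straight line in code space, a simplification of
    the guided path on the class-associated manifold.
    """
    class_code, individual_code = encode(x)
    original_label = classify(x)
    for t in np.linspace(0.0, 1.0, n_steps + 1)[1:]:
        shifted = (1.0 - t) * class_code + t * target_code
        candidate = decode(shifted, individual_code)  # individual code is kept fixed
        if classify(candidate) != original_label:
            # Saliency as the absolute change between sample and counterfactual.
            return candidate, np.abs(candidate - x), t
    return None, None, None

# Toy components (assumptions): the first two entries act as the class-associated code.
toy_encode = lambda x: (x[:2], x[2:])
toy_decode = lambda c, i: np.concatenate([c, i])
toy_classify = lambda x: int(x[0] + x[1] > 0)

x = np.array([-1.0, -0.5, 0.3, 0.7, -0.2])
counterfactual, saliency, t_flip = guided_counterfactual(
    x, toy_encode, toy_decode, toy_classify, target_code=np.array([1.0, 1.0])
)
```

Because the individual code is held fixed, the saliency in this toy example is nonzero only in the entries driven by the class-associated code, and the loop stops at the first step where the classifier output changes.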

Paper Structure (24 sections, 12 equations, 11 figures, 6 tables)

Introduction
Related work
Local explanations
Trap problems for local methods
Global explanations
Methodology
Problem Definition
Class Association Embedding Using a Cyclic Generative Adversarial Network
Learning the Separation of Code Spaces with Building-Block Coherency Feature Extraction
Design of Loss Functions
Local Explanations by Guided Counterfactual Generation on the Class-Associated Manifold
Experiments
Dataset and Experiments Settings
Baseline Methods
Evaluation Metrics
...and 9 more sections

Figures (11)

Figure 1: Illustration of traps for local explaining methods on a two-dimensional plane. The digits on the contour lines show the classification probabilities, and the bold black contour line shows the class-flipping border. ① Gradient-based or single-perturbation methods may find misleading explanation directions. ② Multi-perturbation methods (including counterfactual generation) tend to get trapped in local optima. ③ Fault-line counter generations, without knowing the global picture, generate far-from-closest paths that lead to false-positive attributions. ④ ⑤ Blue arrows show the correct explanation paths.
Figure 2: The overall framework of class association embedding. The recovered sample (generated from the original sample) and the recovered synthetic sample (generated from the synthetic sample) are expected to resemble the original sample, while the synthetic sample (generated from the original sample) is expected to inherit the individual style of the original sample (class $A$) but with the classification features of class $B1$ or $B2$ (which deceives the target classifier).
Figure 3: Illustration of learning the separation of code spaces using the building-block coherency feature extraction method.
Figure 4: The detailed training schema for Building-Block Coherency Feature Extraction.
Figure 5: XAI framework based on class association embedding and generation model. In the class-associated space, codes from samples with different classes are mapped into dots with different colors.
...and 6 more figures
