Accurate Explanation Model for Image Classifiers using Class Association Embedding
Ruitao Xie, Jingbang Chen, Limai Jiang, Rui Xiao, Yi Pan, Yunpeng Cai
TL;DR
The paper tackles the challenge of explaining image classifiers by proposing Class Association Embedding (CAE), which disentangles class-associated features from individual variation and embeds them into a low-dimensional, semantically meaningful manifold. A symmetric cycle-GAN framework with an encoder (splitting into class-associated and individual codes) and a decoder learns to reconstruct and swap class features, enabling both global knowledge visualization and instance-level counterfactual explanations. A Building-Block Coherency Feature Extraction (BBCFE) training scheme enforces efficient, coherent class-associated representations, yielding a robust manifold that supports guided counterfactual generation for local explanations. Extensive experiments on five datasets show CAE outperforms state-of-the-art XAI methods in saliency-map accuracy and computation efficiency, while analyses demonstrate semantic pervasiveness, coherency, and smoothness of the learned class-associated space, enabling global understanding of classifier behavior and potential knowledge discovery.
Abstract
Image classification is a primary task in data analysis, and explainable models are in high demand across many applications. Although many methods have been proposed to obtain explainable knowledge from black-box classifiers, these approaches cannot efficiently extract global knowledge about the classification task; they are therefore vulnerable to local traps and often yield poor accuracy. In this study, we propose a generative explanation model that combines the advantages of global and local knowledge for explaining image classifiers. We develop a representation learning method called class association embedding (CAE), which encodes each sample into a pair of separate codes: a class-associated code and an individual code. Recombining the individual code of a given sample with an altered class-associated code yields a realistic synthetic sample that preserves the individual characteristics but has modified class-associated features and possibly a flipped class assignment. We propose a building-block coherency feature extraction algorithm that efficiently separates class-associated features from individual ones. The extracted feature space forms a low-dimensional manifold that visualizes the classification decision patterns. Each individual sample can then be explained through counterfactual generation: the sample is modified continuously in one direction, by shifting its class-associated code along a guided path, until its classification outcome changes. We compare our method with state-of-the-art methods on explaining image classification tasks in the form of saliency maps, demonstrating that our method achieves higher accuracy. The code is available at https://github.com/xrt11/XAI-CODE.
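The counterfactual procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the encoder, decoder, and classifier below are toy stand-ins (a vector split, a concatenation, and a linear rule), the guided path is assumed to be a straight line toward a reference class-associated code, and all function names are hypothetical.

```python
import numpy as np

def encode(x, k):
    """Toy encoder (assumption): the first k entries act as the
    class-associated code, the remainder as the individual code."""
    return x[:k].copy(), x[k:].copy()

def decode(class_code, individual_code):
    """Toy decoder (assumption): recombine the two codes into a sample."""
    return np.concatenate([class_code, individual_code])

def classify(x, w):
    """Toy black-box classifier (assumption): a linear decision rule."""
    return int(x @ w > 0)

def counterfactual(x, target_class_code, w, k, steps=20):
    """Shift the class-associated code of x along a straight path toward
    target_class_code, keeping the individual code fixed, and stop at the
    first step where the predicted class changes."""
    class_code, individual_code = encode(x, k)
    original_label = classify(x, w)
    for i in range(1, steps + 1):
        t = i / steps
        shifted = (1 - t) * class_code + t * target_class_code
        candidate = decode(shifted, individual_code)
        if classify(candidate, w) != original_label:
            return candidate, t
    return None, None  # no flip found along this path

# Usage: a 4-dimensional sample whose first 2 entries carry class information.
x = np.array([-1.0, -1.0, 0.5, 0.3])
w = np.array([1.0, 1.0, 0.0, 0.0])
target = np.array([1.0, 1.0])  # class-associated code of a reference sample
x_cf, t_flip = counterfactual(x, target, w, k=2)

# A saliency-style map: where the counterfactual differs from the input.
saliency = np.abs(x_cf - x)
```

In this toy setting the saliency map is nonzero only on the class-associated entries, mirroring the idea that the explanation highlights what had to change for the class assignment to flip while individual characteristics stay fixed.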
