xGEMs: Generating Examplars to Explain Black-Box Models

Shalmali Joshi; Oluwasanmi Koyejo; Been Kim; Joydeep Ghosh

TL;DR

xGEMs (manifold-guided exemplars) explain black-box classifiers by traversing a proxy for the data manifold that is learned with an unsupervised implicit generative model. Counterfactuals are optimized in the generator's latent space, so the resulting perturbations stay on the manifold and remain semantically meaningful while revealing decision-boundary behavior, model bias, and changes over the course of training. The framework supports automated bias detection through a confounding metric and complements calibration analyses and reliability diagrams. The approach depends on the quality of the manifold proxy; extending it to more data domains and generator architectures is left to future work.

Abstract

This work proposes xGEMs, or manifold-guided exemplars, a framework for understanding black-box classifier behavior by exploring the landscape of the underlying data manifold as data points cross decision boundaries. To do so, we train an unsupervised implicit generative model, which we treat as a proxy for the data manifold. We summarize black-box model behavior quantitatively by perturbing data samples along the manifold. We demonstrate that xGEMs can detect and quantify bias in model learning and can help explain how model behavior changes as training progresses.
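The idea of perturbing a sample along the manifold until it crosses a decision boundary can be illustrated with a toy sketch. Everything here is a hypothetical stand-in rather than the authors' implementation: a fixed linear map `decode` plays the role of the trained generative proxy, a small logistic model `predict_proba` plays the role of the black-box classifier, and the gradient is written out analytically for this toy case. The sketch also omits any term that keeps the exemplar close to the original sample.

```python
import math

# Hypothetical toy stand-ins (assumptions, not taken from the paper):
# a linear "generator" from a 2-D latent code to a 3-D data point,
# and a logistic "black-box" classifier on the 3-D data space.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # generator weights (assumed)
w_clf = [1.0, -1.0, 0.5]                   # classifier weights (assumed)
b_clf = 0.0                                # classifier bias (assumed)


def decode(z):
    """Map a latent code z to data space: x = W z."""
    return [sum(W[i][j] * z[j] for j in range(2)) for i in range(3)]


def predict_proba(x):
    """Probability of class 1 under the toy logistic classifier."""
    s = sum(wi * xi for wi, xi in zip(w_clf, x)) + b_clf
    return 1.0 / (1.0 + math.exp(-s))


def manifold_exemplar(z0, target, lr=0.1, steps=500):
    """Move only in latent space until the predicted class equals `target`.

    Uses gradient descent on the cross-entropy loss toward `target`.
    For this linear decoder and logistic classifier the gradient with
    respect to z is (p - target) * W^T w_clf.
    """
    z = list(z0)
    for _ in range(steps):
        p = predict_proba(decode(z))
        if (p > 0.5) == (target == 1):
            break  # the decoded point is now classified as `target`
        grad = [
            (p - target) * sum(W[i][j] * w_clf[i] for i in range(3))
            for j in range(2)
        ]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


z_start = [-1.0, 1.0]                      # latent code of a class-0 sample
z_star = manifold_exemplar(z_start, target=1)
x_star = decode(z_star)                    # exemplar, always in the decoder's range
```

Because every candidate is produced by `decode`, the exemplar `x_star` cannot leave the range of the generator, which is the sense in which the perturbation stays on the (proxy) manifold in this sketch.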
