xGEMs: Generating Examplars to Explain Black-Box Models

Shalmali Joshi, Oluwasanmi Koyejo, Been Kim, Joydeep Ghosh

TL;DR

xGEMs introduce manifold-guided exemplars to explain black-box classifiers by traversing a data manifold via an implicit generative proxy. By optimizing counterfactuals on the latent manifold, the approach yields on-manifold, semantically meaningful perturbations that reveal decision boundary behavior, biases, and training progression. The framework enables automated bias detection using a confounding metric and provides complementary insights to calibration analyses and reliability diagrams. While powerful, it relies on the quality of the manifold proxy and invites future work on diverse data domains and generator architectures.

Abstract

This work proposes xGEMs, or manifold-guided exemplars, a framework for understanding black-box classifier behavior by exploring the landscape of the underlying data manifold as data points cross decision boundaries. To do so, we train an unsupervised implicit generative model, treated as a proxy for the data manifold. We summarize black-box model behavior quantitatively by perturbing data samples along the manifold. We demonstrate xGEMs' ability to detect and quantify bias in model learning and to reveal changes in model behavior as training progresses.
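
The idea of perturbing a sample along the manifold until it crosses a decision boundary can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the generator is a toy linear decoder `W`, the black box is a logistic classifier `(w, b)`, and the objective is a squared reconstruction error plus `lam` times a cross-entropy term toward the target label. The function name `xgem_search` and all numbers are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def xgem_search(x, z0, W, w, b, y_target, lam=1.0, lr=0.05, steps=500):
    """Gradient descent in latent space on
        ||x - G(z)||^2 + lam * CE(f(G(z)), y_target)
    with a toy linear decoder G(z) = W z and a logistic
    classifier f(x) = sigmoid(w . x + b). Both are stand-ins."""
    z = z0.astype(float).copy()
    path = []  # (generated point, classifier confidence) per step
    for _ in range(steps):
        x_gen = W @ z
        p = sigmoid(w @ x_gen + b)
        path.append((x_gen.copy(), p))
        # Stop as soon as the generated point carries the target label.
        if (p > 0.5) == bool(y_target):
            break
        grad_rec = 2.0 * W.T @ (x_gen - x)        # gradient of reconstruction term
        grad_cls = (p - y_target) * (W.T @ w)     # gradient of cross-entropy term
        z -= lr * (grad_rec + lam * grad_cls)
    return W @ z, path

# Toy setup (hypothetical numbers): 2-D latent space, 3-D data space.
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([1.0, 0.0, 0.0])
b = 0.0
z_start = np.array([-1.0, 0.5])
x_star = W @ z_start                              # starts in class 0
x_cf, path = xgem_search(x_star, z_start, W, w, b, y_target=1, lam=10.0)
print(len(path), path[0][1], path[-1][1])
```

Because every candidate is produced by the decoder, the returned point stays in the decoder's range by construction, which is the sense in which the perturbation is "on-manifold" in this toy setting. The recorded `path` plays the role of the intermediate reconstructions shown as the search progresses.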

Paper Structure

This paper contains 11 sections, 4 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1:
  • Figure 2: Example of bias detection. Target black boxes: $f^1_{\phi}$ and $f^2_{\phi}$. $g^*$ classifies points w.r.t. $a$. $\tilde{{\mathbf{x}}}_1$ and $\tilde{{\mathbf{x}}}_2$ are the xGEMs corresponding to ${\mathbf{x}}^*$ for $f^1_{\phi}$ and $f^2_{\phi}$, respectively. $\tilde{{\mathbf{x}}}_2$'s attribute prediction (w.r.t. $g^*$) is the same as that of ${\mathbf{x}}^*$, while that of $\tilde{{\mathbf{x}}}_1$ is different. Thus we say that $f^1_{\phi}$ is biased w.r.t. attribute $a$ for sample ${\mathbf{x}}^*$.
  • Figure 3: We test whether ResNet models $f^1_{\phi}$ and $f^2_{\phi}$, both trained to detect hair color but on different data distributions, are confounded with gender. Two samples for classifiers $f^1_{\phi}$ (first sub-row) and $f^2_{\phi}$ (second sub-row) are shown. The leftmost image is the original image, followed by its reconstruction from the encoder $F_{\psi}$. Reconstructions are plotted as Algorithm 1 (with $\lambda=0.01$) progresses toward crossing the decision boundary. The red bar indicates a change in the hair color label, which is shown at the top of each image along with the prediction confidence. The label at the bottom indicates gender as predicted by $\hat{g}$. For both samples, classifier $f^1_{\phi}$, trained on biased data, changes the gender ($1^{st}$ and $3^{rd}$ rows) while crossing the decision boundary, whereas the other black box does not.
  • Figure 4: Confidence manifolds for a few data samples for black-box models 1 and 2.
  • Figure 5: (a) and (b): 2D histograms of the parameters of the logistic function fits to the confidence manifolds for $\sim 4000$ samples.
  • ...and 4 more figures
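
The bias check described in the captions of Figures 2 and 3 compares an attribute classifier's prediction on the original sample with its prediction on the corresponding xGEM. A minimal sketch of aggregating that comparison over many samples follows. The function name `attribute_flip_rate` and the example labels are hypothetical, and this simple flip rate is an illustrative stand-in, not necessarily the confounding metric defined in the paper.

```python
import numpy as np

def attribute_flip_rate(attr_original, attr_xgem):
    """Fraction of samples whose predicted attribute (e.g. gender)
    differs between the original sample and its xGEM.
    Inputs are label arrays produced by some attribute classifier."""
    a = np.asarray(attr_original)
    b = np.asarray(attr_xgem)
    if a.shape != b.shape:
        raise ValueError("label arrays must have the same shape")
    return float(np.mean(a != b))

# Hypothetical attribute predictions for four samples under two black boxes.
orig_attr = [0, 1, 1, 0]
xgem_attr_model1 = [1, 0, 1, 1]   # attribute often changes along with the label
xgem_attr_model2 = [0, 1, 1, 0]   # attribute is preserved
rate1 = attribute_flip_rate(orig_attr, xgem_attr_model1)
rate2 = attribute_flip_rate(orig_attr, xgem_attr_model2)
print(rate1, rate2)
```

Under this stand-in, a black box whose xGEMs frequently flip the protected attribute while crossing the decision boundary gets a higher rate, which matches the qualitative reading of Figure 3.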

Theorems & Definitions (1)

  • Definition 1: ${\mathbf{x}}^*, y^*$-xGEM