Inverse Classification for Comparison-based Interpretability in Machine Learning

Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, Marcin Detyniecki

TL;DR

This paper tackles post-hoc interpretability when neither the classifier nor the data is accessible, by introducing a model- and data-agnostic, instance-based explainer. It formalizes the goal as finding the closest counterexample $e$ to an observation $x$ such that $f(e)\neq f(x)$, minimizing a cost $c(x,e)=||x-e||_{2}+\gamma||x-e||_{0}$ that balances proximity and sparsity. The Growing Spheres algorithm generates points in $l_{2}$-spherical layers around $x$, without using any data, until it finds an enemy (a point classified differently from $x$); a feature selection step then reduces the $l_{0}$-norm of the change, and the final explanation is the vector $x-e^{*}$. Experiments on News Popularity and MNIST illustrate that the method yields sparse, interpretable explanations and reveals local classifier behavior. The paper acknowledges limitations and suggests future work to incorporate domain constraints. The approach offers a practical lens into black-box predictions when access to models or data is restricted, with potential use for diagnostics and model auditing in industry settings.
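
To make the trade-off in the cost concrete, here is a minimal Python sketch of $c(x,e)$ using NumPy. The function name `cost`, the default `gamma=1.0`, and the toy vectors are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cost(x, e, gamma=1.0):
    """c(x, e) = ||x - e||_2 + gamma * ||x - e||_0 (gamma value is an assumption)."""
    diff = np.asarray(x, dtype=float) - np.asarray(e, dtype=float)
    l2 = np.linalg.norm(diff)        # proximity: Euclidean distance
    l0 = np.count_nonzero(diff)      # sparsity: number of features that changed
    return l2 + gamma * l0

# Hypothetical observation and two hypothetical candidate counterexamples.
x = np.array([1.0, 2.0, 3.0])
e_dense = np.array([1.3, 2.4, 3.0])   # changes two features, slightly closer in l2
e_sparse = np.array([1.0, 2.0, 3.6])  # changes one feature, slightly farther in l2

cost_dense = cost(x, e_dense)
cost_sparse = cost(x, e_sparse)
```

With `gamma=0` the denser candidate wins on distance alone; with a positive `gamma` the candidate that moves a single feature becomes cheaper, which is the behavior the sparsity term is meant to encourage.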

Abstract

In the context of post-hoc interpretability, this paper addresses the task of explaining the prediction of a classifier, considering the case where no information is available, neither on the classifier itself, nor on the processed data (neither the training nor the test data). It proposes an instance-based approach whose principle consists in determining the minimal changes needed to alter a prediction: given a data point whose classification must be explained, the proposed method consists in identifying a close neighbour classified differently, where the closeness definition integrates a sparsity constraint. This principle is implemented using observation generation in the Growing Spheres algorithm. Experimental results on two datasets illustrate the relevance of the proposed approach that can be used to gain knowledge about the classifier.
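
The two stages described above (data-free generation in growing spherical layers, then sparsification) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the names `sample_layer`, `growing_spheres`, `sparsify`, the parameters `eta`, `n`, `max_layers`, the uniform-in-radius sampling, and the rule of trying each coordinate once are all choices made for this sketch. The only access to the classifier is a hypothetical function `f` that maps a 2-D array of points to labels.

```python
import numpy as np

def sample_layer(x, r_min, r_max, n, rng):
    """Draw n points whose l2 distance to x lies in [r_min, r_max]."""
    directions = rng.normal(size=(n, x.size))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Simplification: radii are uniform in [r_min, r_max], not uniform in volume.
    radii = rng.uniform(r_min, r_max, size=(n, 1))
    return x + radii * directions

def growing_spheres(f, x, eta=0.5, n=500, max_layers=1000, seed=0):
    """Return a generated point close to x that f classifies differently from x."""
    rng = np.random.default_rng(seed)
    y = f(x[None, :])[0]
    # Shrink the initial ball until it contains no enemy (capped at 50 halvings).
    for _ in range(50):
        z = sample_layer(x, 0.0, eta, n, rng)
        if np.all(f(z) == y):
            break
        eta /= 2.0
    # Grow layer by layer until at least one enemy is generated.
    r_min, r_max = eta, 2.0 * eta
    for _ in range(max_layers):
        z = sample_layer(x, r_min, r_max, n, rng)
        enemies = z[f(z) != y]
        if len(enemies) > 0:
            dist = np.linalg.norm(enemies - x, axis=1)
            return enemies[np.argmin(dist)]
        r_min, r_max = r_max, r_max + eta
    raise RuntimeError("no enemy found within max_layers")

def sparsify(f, x, e):
    """Reset coordinates of e to their value in x, smallest change first,
    keeping a reset only if the point is still classified differently from x."""
    y = f(x[None, :])[0]
    e = e.copy()
    for i in np.argsort(np.abs(e - x)):
        trial = e.copy()
        trial[i] = x[i]
        if f(trial[None, :])[0] != y:
            e = trial
    return e

# Toy black box (hypothetical): the class depends on the first feature only.
f = lambda Z: (Z[:, 0] > 1.0).astype(int)
x = np.zeros(3)
e = growing_spheres(f, x)
e_star = sparsify(f, x, e)
explanation = x - e_star
```

In this toy setting only the first feature matters to `f`, so the sparsification step resets the other two coordinates and the explanation vector is non-zero in a single position.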

Paper Structure

This paper contains 17 sections, 3 equations, 3 figures, 2 tables, 2 algorithms.

Figures (3)

  • Figure 1: Illustration of Growing Spheres. The red circle represents the observation to interpret; the plus signs are observations generated by Growing Spheres (blue for allies, black for enemies). The white plus is the final enemy $e^{*}$ used to generate explanations.
  • Figure 2: Sparsity distribution over the news test dataset. Reading: '30% of the observations of our test dataset have explanations that use 5 features or less'.
  • Figure 3: Output example from the application of Growing Spheres to two instances: the original instance $x$ (left column), the closest enemy found $e^{*}$ (center), and the explanation vector $x-e^{*}$ (right). A white pixel indicates a value of 0, a black pixel a value of 1.