Model Agnostic Contrastive Explanations for Structured Data

Amit Dhurandhar, Tejaswini Pedapati, Avinash Balakrishnan, Pin-Yu Chen, Karthikeyan Shanmugam, Ruchir Puri

TL;DR

This paper introduces MACEM, a model-agnostic method to generate contrastive explanations for structured data by querying only class probabilities. It defines Pertinent Positives and Pertinent Negatives as sparsest and closest perturbations relative to base values, solved via a projected FISTA in a black-box setting with zeroth-order gradient estimation. The approach handles real and categorical features through two strategies (FMA and SSA) and demonstrates superior, faithful explanations compared to LIME across five datasets, including qualitative expert assessments. MACEM's emphasis on contrastive, trustworthy explanations with minimal input changes offers practical benefits for regulatory and domain-specific explainability needs. The work also outlines directions to extend to unstructured data and more complex modalities.
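The zeroth-order gradient estimation mentioned above can be illustrated with a generic sketch. It is not necessarily the estimator used in the paper: it assumes the common random-direction scheme, which averages forward differences of a black-box function along random unit vectors. The names `zeroth_order_gradient`, `num_directions`, `beta` and `toy_loss` are hypothetical and chosen only for this example.

```python
import math
import random


def zeroth_order_gradient(f, x, num_directions=2000, beta=1e-3, seed=0):
    """Estimate the gradient of a black-box scalar function f at x.

    Only function evaluations are used: forward differences along random
    unit directions are averaged. Parameter names are illustrative.
    """
    rng = random.Random(seed)
    d = len(x)
    fx = f(x)  # one query at the base point, reused for every direction
    grad = [0.0] * d
    for _ in range(num_directions):
        # Draw a direction uniformly on the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        # Forward difference of f along u.
        x_shift = [xi + beta * ui for xi, ui in zip(x, u)]
        slope = (f(x_shift) - fx) / beta
        for i in range(d):
            grad[i] += d * slope * u[i] / num_directions
    return grad


def toy_loss(x):
    # Hypothetical stand-in for a loss built from queried class probabilities.
    return 1.0 * x[0] - 2.0 * x[1] + 0.5 * x[2]


estimate = zeroth_order_gradient(toy_loss, [0.2, 0.4, 0.6], num_directions=20000)
```

For this linear toy loss the true gradient is (1, -2, 0.5), so the estimate should land close to those values. In a projected proximal method such an estimate would stand in for the exact gradient at each step.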

Abstract

Recently, a method [7] was proposed to generate contrastive explanations for differentiable models such as deep neural networks, where one has complete access to the model. In this work, we propose a method, Model Agnostic Contrastive Explanations Method (MACEM), to generate contrastive explanations for any classification model where one is able to only query the class probabilities for a desired input. This allows us to generate contrastive explanations not only for neural networks, but also for models such as random forests, boosted trees and even arbitrary ensembles that are still amongst the state-of-the-art when learning on structured data [13]. Moreover, to obtain meaningful explanations we propose a principled approach to handle real and categorical features, leading to novel formulations for computing pertinent positives and negatives that form the essence of a contrastive explanation. A detailed treatment of the different data types of this nature was not performed in the previous work, which assumed all features to be positive real valued with zero being indicative of the least interesting value. We part with this strong implicit assumption and generalize these methods so as to be applicable across a much wider range of problem settings. We quantitatively and qualitatively validate our approach over 5 public datasets covering diverse domains.

Paper Structure

This paper contains 15 sections, 8 equations, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Above we see an example explanation for a loan application from the German Credit dataset that was rejected by a black-box model, which in this case was a tree. We depict the important features for PPs and PNs. Our PPs convey that even if the person didn't need a co-applicant and had lower credit card debt, the application would still be rejected. In contrast, our PNs inform us that if the person's checking account had more money, the loan installment rate was lower, and there were no people that he/she was responsible for, then the loan would have been accepted.
  • Figure 2: Above we see a categorical feature taking three values A, B and C with frequencies 11, 6 and 1 respectively as indicated on the vertical axis. Our mapping function in equation 11 for FMA maps these frequencies and hence the categorical values to 0, 0.5 and 1 in the $[0,1]$ interval. The red horizontal lines depict the function $h(.)$ showcasing the range of values that map back to either A, B or C.
  • Figure 3: Above we compare the actual tree path (blue arrows) and the corresponding PP/PN important features for an input in (a) the German Credit dataset and (b) the Olfaction dataset. The PP columns (center) and the PN columns (right) list the top 3 features highlighted by MACEM for the corresponding PPs and PNs. The PP feature importance decreases from top to bottom, while the PN feature importance decreases from bottom to top. The red arrows indicate PP and PN features that match the features in the tree path for the respective inputs.
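
The frequency mapping described in the Figure 2 caption can be sketched as follows. The caption gives only the frequencies (11, 6, 1) and the mapped values (0, 0.5, 1); the formula below, an inverted min-max scaling of the counts, is an assumption that reproduces those numbers and may differ from the paper's exact equation. The nearest-value rule used for `map_back` is likewise an assumed stand-in for $h(.)$, and both function names are hypothetical.

```python
def frequency_map(counts):
    """Map each category to [0, 1]: most frequent -> 0, least frequent -> 1."""
    hi, lo = max(counts.values()), min(counts.values())
    if hi == lo:
        # All categories equally frequent: no ordering to encode.
        return {cat: 0.0 for cat in counts}
    return {cat: (hi - c) / (hi - lo) for cat, c in counts.items()}


def map_back(value, mapping):
    """Assumed inverse h(.): return the category whose mapped value is nearest."""
    return min(mapping, key=lambda cat: abs(mapping[cat] - value))


counts = {"A": 11, "B": 6, "C": 1}
mapping = frequency_map(counts)  # {'A': 0.0, 'B': 0.5, 'C': 1.0}

# A perturbed real value is snapped back to a category.
recovered = [map_back(v, mapping) for v in (0.2, 0.6, 0.9)]  # ['A', 'B', 'C']
```

Under this assumption, every value in an interval around a mapped point returns the same category, which corresponds to the horizontal ranges the caption attributes to $h(.)$.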