
Does It Make Sense to Explain a Black Box With Another Black Box?

Julien Delaunay, Luis Galárraga, Christine Largouët

TL;DR

This paper investigates whether counterfactual explanations for NLP should be generated in a transparent, word-level space or in an opaque latent space. In a comparative study across spam detection, sentiment analysis, and fake-news detection, the authors show that transparent methods often achieve comparable or better counterfactual quality with faster runtimes, challenging the assumption that latent-space explanations are necessary. They introduce two intermediate methods, Growing Net and Growing Language, and position existing methods along an interpretability spectrum. The work argues for prioritizing transparency when feasible and highlights the need for domain knowledge to ensure plausible, actionable explanations.

Abstract

Although counterfactual explanations are a popular approach to explain ML black-box classifiers, they are less widespread in NLP. Most methods find those explanations by iteratively perturbing the target document until it is classified differently by the black box. We identify two main families of counterfactual explanation methods in the literature, namely, (a) transparent methods that perturb the target by adding, removing, or replacing words, and (b) opaque approaches that project the target document into a latent, non-interpretable space where the perturbation is carried out subsequently. This article offers a comparative study of the performance of these two families of methods on three classical NLP tasks. Our empirical evidence shows that opaque approaches can be overkill for downstream applications such as fake news detection or sentiment analysis since they add a level of complexity with no significant performance gain. These observations motivate our discussion, which raises the question of whether it makes sense to explain a black box using another black box.

Paper Structure

This paper contains 26 sections, 1 equation, 8 figures, 3 tables, 3 algorithms.

Figures (8)

  • Figure 1: The mechanism employed to perturb the target documents by the transparent and opaque methods. Transparent techniques, on the left, convert the input text to a vector representation, where '1' indicates the presence of the input word and '0' denotes a replacement. Opaque methods, on the right, embed words from the target text into a latent space and perturb the text in this high-dimensional space.
  • Figure 2: The tree structure of the algorithm used to iteratively perturb the target document. At each round, a word from the target text is replaced by a word from its corresponding set of potential replacement words. Thus, with each successive round, the number of word replacements used to generate artificial documents increases.
  • Figure 3: Diagram representing the mechanisms of the Growing Net approach. By leveraging the tree-like structure of WordNet, Growing Net generates sets of words that can replace each term in the target document.
  • Figure 4: Diagram illustrating the operation of the Growing Language method. The words in the target text are transformed into a latent representation using a large language model. In this latent space, words close to a target word become its potential replacements for generating artificial documents.
  • Figure 5: Spectrum for counterfactual explanation techniques that goes from the most transparent methods on the left to the most opaque on the right. Transparent methods perturb documents in a binary space; opaque methods do it in a latent space.
  • ...and 3 more figures
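
The transparent perturbation described in the captions of Figures 1 and 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the classifier `toy_spam_classifier`, the hand-written `replacements` dictionary, and the function `find_counterfactual` are all hypothetical names introduced here. The sketch assumes each word has a fixed set of candidate replacements and tries one replacement per document first, then two, and so on, until the black box changes its prediction. The returned mask follows the Figure 1 convention: 1 means the original word was kept, 0 means it was replaced.

```python
from itertools import combinations, product


def toy_spam_classifier(words):
    """Hypothetical stand-in for the black box: returns 1 (spam) when the
    text contains at least two trigger words, else 0."""
    triggers = {"free", "winner", "cash"}
    return int(sum(w in triggers for w in words) >= 2)


def find_counterfactual(words, replacements, classify):
    """Search for a perturbed document that the classifier labels differently.

    Round k replaces exactly k words, so the first hit uses the fewest
    replacements. Returns (perturbed_words, mask) or None if nothing flips.
    """
    original_label = classify(words)
    # Only positions whose word has at least one candidate replacement.
    positions = [i for i, w in enumerate(words) if replacements.get(w)]

    for n_replaced in range(1, len(positions) + 1):
        for chosen in combinations(positions, n_replaced):
            candidate_sets = [replacements[words[i]] for i in chosen]
            for substitutes in product(*candidate_sets):
                perturbed = list(words)
                for i, substitute in zip(chosen, substitutes):
                    perturbed[i] = substitute
                if classify(perturbed) != original_label:
                    # 1 = original word kept, 0 = word replaced.
                    mask = [0 if i in chosen else 1 for i in range(len(words))]
                    return perturbed, mask
    return None


target = "claim your free cash now".split()
# Hand-written replacement sets (an assumption made for this example).
replacements = {
    "claim": ["request"],
    "free": ["complimentary"],
    "cash": ["money"],
}
result = find_counterfactual(target, replacements, toy_spam_classifier)
print(result)
```

In this sketch only the source of the replacement sets would change between methods: a lexical resource such as WordNet for an approach like Growing Net, or nearest neighbors in a language model's embedding space for one like Growing Language. The search loop and the binary mask stay the same, which is what keeps the perturbation itself readable at the word level.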