Comparing Feature Importance and Rule Extraction for Interpretability on Text Data

Gianluigi Lopardo, Damien Garreau

TL;DR

This work tackles interpretability in natural language processing by comparing two popular local explanation methods: LIME, a perturbation-based feature-importance approach, and Anchors, a rule-based technique that yields simple, high-precision explanations. The authors adapt both methods to text data under TF-IDF representations and introduce a quantitative metric, the $\ell_E$-index, based on Jaccard similarity to assess how well explanations recover the most influential words. Through qualitative and quantitative experiments on sentiment classification with simple models (e.g., small decision trees and sparse logistic models), they show that LIME and Anchors can produce substantially different explanations, with LIME generally better at capturing multiple informative words and Anchors often relying on single-word anchors influenced by word multiplicities. The study highlights the method-dependent nature of explanations, provides a concrete evaluation tool for cross-method comparison, and suggests practical guidance for selecting explainers in text applications.
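As a rough illustration of the building block behind such a metric, the sketch below computes the plain Jaccard similarity between two sets of words. It is not the paper's definition of the $\ell_E$-index, which this summary does not reproduce; the function name `jaccard`, the example word sets, and the convention for two empty sets are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of words."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical example: words highlighted by an explainer vs. words
# that the classifier is known to depend on.
explained_words = {"good", "very"}
influential_words = {"good", "not", "bad"}

score = jaccard(explained_words, influential_words)
print(score)  # one shared word out of four distinct words
```

A score of 1 means the two sets coincide, and a score of 0 means they share no word.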

Abstract

Complex machine learning algorithms are used more and more often in critical tasks involving text data, leading to the development of interpretability methods. Among local methods, two families have emerged: those computing importance scores for each feature and those extracting simple logical rules. In this paper we show that using different methods can lead to unexpectedly different explanations, even when applied to simple models for which we would expect qualitative coincidence. To quantify this effect, we propose a new approach to compare explanations produced by different methods.

Paper Structure (21 sections, 5 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Comparison on the classifiers $\mathbbm{1}_{{\text{good}}\in z}$ (left panel) and $\mathbbm{1}_{({\text{not}}\in z \;\text{and}\; {\text{bad}}\in z) \;\text{or}\; {\text{good}}\in z}$ (right panel) applied to the same review. Anchors does not distinguish between the two.
  • Figure 2: Making a word disappear from the explanation by adding one occurrence. The classifier $\mathbbm{1}_{({\text{very}}\in z \;\text{and}\; {\text{good}}\in z)}$ is applied when $m_{\text{very}}=4$ (left) and $m_{\text{very}}=5$ (right).
  • Figure 3: Comparison on the classifier $\mathbbm{1}_{({\text{not}}\in z \;\text{and}\; {\text{bad}}\in z) \;\text{or}\; ({\text{very}}\in z \;\text{and}\; {\text{good}}\in z)}$ when only $m_{\text{very}}$ is changing. Anchors' explanations depend on multiplicities.
  • Figure 4: Comparison on a logistic model with $\lambda_{\text{love}}=-1$, $\lambda_{\text{good}}=+5$, and $\lambda_w=0$ for the other words (left) vs. $\lambda_{\text{good}}=10$ and $\lambda_w\sim\mathcal{N}(0,1)$ for the other words (right), applied to the same document. The word "good" is the most important for the classification in both cases.
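
The indicator classifiers named in the captions of Figures 1 to 3 can be written down directly. The following is a minimal sketch, assuming a document $z$ is lowercased and split on whitespace (the paper's actual preprocessing is not given here); the function names and the example sentences are hypothetical.

```python
def words_in(document):
    """Set of distinct words in a document (assumed: lowercase, whitespace split)."""
    return set(document.lower().split())


def multiplicity(word, document):
    """Number of occurrences m_word of a word in the document."""
    return document.lower().split().count(word)


def clf_good(document):
    """Indicator of 'good in z' (Figure 1, left)."""
    return int("good" in words_in(document))


def clf_not_bad_or_good(document):
    """Indicator of '(not in z and bad in z) or good in z' (Figure 1, right)."""
    z = words_in(document)
    return int(("not" in z and "bad" in z) or "good" in z)


def clf_very_and_good(document):
    """Indicator of 'very in z and good in z' (Figure 2)."""
    z = words_in(document)
    return int("very" in z and "good" in z)


def clf_not_bad_or_very_good(document):
    """Indicator of '(not in z and bad in z) or (very in z and good in z)' (Figure 3)."""
    z = words_in(document)
    return int(("not" in z and "bad" in z) or ("very" in z and "good" in z))


# Hypothetical review in which "very" occurs four times.
review = "very very very very good food"
print(multiplicity("very", review), clf_very_and_good(review))
```

These classifiers depend only on whether a word is present, not on how often it occurs, which is why explanations that change with $m_{\text{very}}$ are notable.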