Comparing Feature Importance and Rule Extraction for Interpretability on Text Data

Gianluigi Lopardo; Damien Garreau

TL;DR

This work tackles interpretability in natural language processing by comparing two popular local explanation methods: LIME, a perturbation-based feature-importance approach, and Anchors, a rule-based technique that yields simple, high-precision explanations. The authors adapt both methods to text data under TF-IDF representations and introduce a quantitative metric, the $\ell_E$-index, based on Jaccard similarity to assess how well explanations recover the most influential words. Through qualitative and quantitative experiments on sentiment classification with simple models (e.g., small decision trees and sparse logistic models), they show that LIME and Anchors can produce substantially different explanations, with LIME generally better at capturing multiple informative words and Anchors often relying on single-word anchors influenced by word multiplicities. The study highlights the method-dependent nature of explanations, provides a concrete evaluation tool for cross-method comparison, and suggests practical guidance for selecting explainers in text applications.
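This summary does not reproduce the exact definition of the $\ell_E$-index. As a rough illustration of the Jaccard similarity it builds on, the sketch below compares the word set of an explanation with a set of words assumed to drive the model. The helper names and the example word sets are hypothetical and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of words."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(a | b)


def explanation_overlap(explanation_words, influential_words):
    """Hypothetical helper (not the paper's l_E-index): how well an
    explanation recovers the words that actually drive the model."""
    return jaccard(explanation_words, influential_words)


# Illustrative word sets for a model that fires only when both words appear.
influential = {"very", "good"}
print(explanation_overlap({"good"}, influential))          # 0.5
print(explanation_overlap({"very", "good"}, influential))  # 1.0
```

A single-word explanation recovers only half of a two-word decision rule in this toy setting, which is the kind of gap a Jaccard-based score is meant to expose.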

Abstract

Complex machine learning algorithms are increasingly used in critical tasks involving text data, which has driven the development of interpretability methods. Among local methods, two families have emerged: those that compute an importance score for each feature and those that extract simple logical rules. In this paper we show that using different methods can lead to unexpectedly different explanations, even when they are applied to simple models for which we would expect them to agree qualitatively. To quantify this effect, we propose a new approach for comparing explanations produced by different methods.

Paper Structure (21 sections, 5 equations, 4 figures, 1 table)

Introduction
Notation.
Methods
LIME for text data
Sampling.
Surrogate model.
Anchors for text data
Sampling.
Main results
Qualitative evaluation
Simple decision trees.
Presence of a given word.
Small decision tree.
Presence of several words.
Presence of disjoint subsets of words.
...and 6 more sections

Figures (4)

Figure 1: Comparison on the classifiers $\mathbbm{1}_{{\text{good}}\in z}$ (left panel) and $\mathbbm{1}_{({\text{not}}\in z \;\text{and}\; {\text{bad}}\in z) \;\text{or}\; {\text{good}}\in z}$ (right panel) applied to the same review. Anchors does not distinguish between the two.
Figure 2: Making a word disappear from the explanation by adding one occurrence. The classifier $\mathbbm{1}_{({\text{very}}\in z \;\text{and}\; {\text{good}}\in z)}$ is applied when $m_{\text{very}}=4$ (left) and $m_{\text{very}}=5$ (right).
Figure 3: Comparison on the classifier $\mathbbm{1}_{({\text{not}}\in z \;\text{and}\; {\text{bad}}\in z) \;\text{or}\; ({\text{very}}\in z \;\text{and}\; {\text{good}}\in z)}$ when only $m_{\text{very}}$ is changing. Anchors' explanations depend on multiplicities.
Figure 4: Comparison on a logistic model with $\lambda_{\text{love}}=-1$, $\lambda_{\text{good}}=+5$ and $\lambda_w=0$ for the other words (left), versus $\lambda_{\text{good}}=10$ and $\lambda_w\sim\mathcal{N}(0,1)$ for the other words (right), applied to the same document. The word good is the most important for the classification in both cases.
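To make the toy classifiers in these captions concrete, here is a minimal Python sketch. It assumes that a document $z$ is a string split by a simple lowercase regex and that "$w \in z$" means word $w$ occurs at least once. For the logistic model it assumes raw term counts in place of the paper's TF-IDF features and a zero intercept. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(doc):
    # Simplified tokenizer (assumption): lowercase, keep runs of letters only.
    return re.findall(r"[a-z]+", doc.lower())


def f_good(doc):
    # 1_{good in z} (Figure 1, left)
    z = set(tokens(doc))
    return int("good" in z)


def f_not_bad_or_good(doc):
    # 1_{(not in z and bad in z) or good in z} (Figure 1, right)
    z = set(tokens(doc))
    return int(("not" in z and "bad" in z) or "good" in z)


def f_very_and_good(doc):
    # 1_{very in z and good in z} (Figure 2); depends only on presence,
    # so the multiplicity of "very" does not change the output.
    z = set(tokens(doc))
    return int("very" in z and "good" in z)


def f_not_bad_or_very_good(doc):
    # 1_{(not in z and bad in z) or (very in z and good in z)} (Figure 3)
    z = set(tokens(doc))
    return int(("not" in z and "bad" in z) or ("very" in z and "good" in z))


def logistic_score(doc, weights, intercept=0.0):
    # Sigmoid of a weighted sum of per-word features (Figure 4 setting).
    # Assumption: raw counts stand in for TF-IDF; unlisted words get weight 0.
    counts = Counter(tokens(doc))
    s = intercept + sum(weights.get(w, 0.0) * c for w, c in counts.items())
    return 1.0 / (1.0 + math.exp(-s))


review = "The food was not bad, and the service was very good."
print([f(review) for f in (f_good, f_not_bad_or_good,
                           f_very_and_good, f_not_bad_or_very_good)])
# [1, 1, 1, 1]
print(logistic_score(review, {"love": -1.0, "good": 5.0}) > 0.5)  # True
```

Because f_good and f_not_bad_or_good both return 1 on every review that contains good, the single-word rule "good" already guarantees a positive prediction under both models. This is consistent with the observation in Figure 1 that Anchors does not distinguish between them. Similarly, f_very_and_good ignores how many times very occurs, so any dependence of an explanation on $m_{\text{very}}$ (Figures 2 and 3) comes from the explainer rather than from the model.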
