SyntaxShap: Syntax-aware Explainability Method for Text Generation

Kenza Amara; Rita Sevastjanova; Mennatallah El-Assady

TL;DR

SyntaxShap tackles the explainability gap in autoregressive text generation by introducing a syntax-aware, SHAP-based method that constrains token coalitions to follow dependency-tree structure. By incorporating dependency relations and optional level-based weighting, SyntaxShap and its variant SyntaxShap-W yield more faithful and coherent explanations than baseline perturbation-and-surrogate methods, as evaluated on GPT-2 and Mistral 7B across Generics, ROCStories, and Negation datasets. The study also shows a divergence between model-faithful explanations and human semantic expectations, underscoring the need for careful, multi-faceted evaluation of explanations. Overall, the work advances text-generation explainability by fusing linguistic structure with coalition-based attribution, with implications for safety-critical applications and future integration of linguistic knowledge.

Abstract

To harness the power of large language models in safety-critical domains, we need to ensure the explainability of their predictions. However, despite the significant attention given to model interpretability, explaining sequence-to-sequence tasks with methods tailored to textual data remains largely unexplored. This paper introduces SyntaxShap, a local, model-agnostic explainability method for text generation that takes the syntax of the text data into account. The presented work extends Shapley values to account for parsing-based syntactic dependencies. Taking a game-theoretic approach, SyntaxShap only considers coalitions constrained by the dependency tree. We adopt a model-based evaluation to compare SyntaxShap and its weighted form to state-of-the-art explainability methods adapted to text generation tasks, using diverse metrics including faithfulness, coherency, and semantic alignment of the explanations to the model. We show that our syntax-aware method produces more faithful and coherent explanations for predictions by autoregressive models. Confronted with the misalignment of human and AI model reasoning, this paper also highlights the need for cautious evaluation strategies in explainable AI.
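
For reference, the classical Shapley value that the abstract builds on can be written as below. This is the textbook definition, not a formula taken from the paper: the symbols ($N$ for the set of $n$ input tokens, $v$ for the value function, here assumed to be the model's probability of the predicted next token given only the tokens in a coalition) are illustrative choices and may differ from the paper's notation.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(n - |S| - 1\bigr)!}{n!}\,
  \Bigl( v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr)
```

The sum runs over every coalition $S$ that excludes token $i$. A syntax-aware variant, as described in the abstract, would restrict this sum to coalitions permitted by the dependency tree.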

Paper Structure (47 sections, 15 equations, 12 figures, 4 tables)

This paper contains 47 sections, 15 equations, 12 figures, 4 tables.

Introduction
Related Work
Explainability in Linguistics
SHAP-based explainability in NLP
Shapley values and complex dependencies
SyntaxShap Methodology
Objective
Shapley values approach
Syntax-aware coalition game
Weighted SyntaxShap
Evaluation
Quantitative evaluation
Fidelity
Probability divergence@K
Accuracy@K
...and 32 more sections

Figures (12)

Figure 1: Given an input sentence, an autoregressive language model (AR LM) predicts the next token. The syntax of the sentence is extracted using dependency parsing (spaCy). To measure the importance of the word "husband" for the model's prediction of the next token "wife", our method (1) extracts multiple coalitions of words following specific paths in the dependency tree, (2) analyzes how adding "husband" to each coalition changes the probability of predicting the next token "wife", and (3) averages those contributions to compute its final SyntaxShap value.
Figure 2: Faithfulness of the explanations of Mistral 7B predictions by the methods Random, LIME, FeatureAblation, SampleShapley, Partition, and our methods SyntaxShap and SyntaxShap-W.
Figure 3: Faithfulness of the explanations of GPT-2 predictions by the methods Random, LIME, FeatureAblation, SampleShapley, Partition, and our methods SyntaxShap and SyntaxShap-W.
Figure 4: An example of attribution values of the SyntaxShap-W method for two sentence pairs with different and similar next token predictions.
Figure 5: Coherency of explainability methods for the Mistral 7B model on sentence pairs that differ in the negation used. SyntaxShap and SyntaxShap-W produce more similar attribution scores for sentence pairs where the model predicts the same next token than for sentence pairs with different next-token predictions.
...and 7 more figures
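
The three steps in the Figure 1 caption (extract tree-constrained coalitions, measure marginal contributions, average them) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the coalition rule used here (all tokens at shallower tree levels, plus any subset of same-level tokens) is one plausible reading of "coalitions constrained by the dependency tree", the dependency heads are written by hand instead of coming from spaCy, and `toy_value` is a hypothetical additive stand-in for the language model's next-token probability.

```python
from itertools import combinations

def tree_levels(heads):
    """Depth of every token in the dependency tree (the root has head -1)."""
    levels = {}
    for tok in heads:
        depth, node = 0, tok
        while heads[node] != -1:
            node = heads[node]
            depth += 1
        levels[tok] = depth
    return levels

def allowed_coalitions(tok, levels):
    """ASSUMED rule: every token above `tok` in the tree is always present,
    and any subset of the other tokens at the same level may be added."""
    upper = frozenset(t for t, lvl in levels.items() if lvl < levels[tok])
    peers = [t for t, lvl in levels.items() if lvl == levels[tok] and t != tok]
    for size in range(len(peers) + 1):
        for subset in combinations(peers, size):
            yield upper | frozenset(subset)

def syntax_aware_attribution(tok, heads, value):
    """Average marginal contribution of `tok` over the allowed coalitions."""
    levels = tree_levels(heads)
    gains = [value(s | {tok}) - value(s) for s in allowed_coalitions(tok, levels)]
    return sum(gains) / len(gains)

# Toy sentence "dogs chase cats": "chase" (index 1) is the root.
toy_heads = {0: 1, 1: -1, 2: 1}

# Hypothetical per-token weights; a real value function would query the model
# with only the coalition's tokens kept and return the next-token probability.
toy_weights = {0: 0.5, 1: 0.2, 2: 0.1}

def toy_value(coalition):
    return sum(toy_weights[t] for t in coalition)

if __name__ == "__main__":
    for tok in toy_heads:
        print(tok, syntax_aware_attribution(tok, toy_heads, toy_value))
```

Because the toy value function is additive, each token's attribution simply recovers its own weight, which makes the sketch easy to sanity-check. With a real model the marginal gains would depend on the coalition, and restricting the coalitions is what distinguishes a syntax-aware estimate from sampling arbitrary subsets.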
