Learning from Sufficient Rationales: Analysing the Relationship Between Explanation Faithfulness and Token-level Regularisation Strategies

Jonathan Kamp, Lisa Beinborn, Antske Fokkens

TL;DR

This work scrutinises sufficiency as a faithfulness proxy for token-level rationales by linking it to two learning paradigms: token classification of rationale tokens and attention regularisation using rationale masks. It introduces contextual impact ($CI$) as the interpretive lens for sufficiency and systematically analyses six rationalised datasets across multiple transformer architectures. Key findings show that high sufficiency does not reliably indicate easily identifiable rationales or consistent performance gains; instead, $CI$ often reflects the interaction between rationale and non-rationale context. Attention regularisation improves cross-domain performance for BERT but yields mixed results otherwise. The results underscore the complexity of rationales and suggest that sufficiency alone is not enough to guide rationale-driven learning, though simple regularisation strategies can help bridge in- and cross-domain gaps in some settings.
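One common way to feed a rationale mask into a transformer through attention regularisation is an auxiliary loss that pulls the model's attention distribution towards the rationale tokens. The sketch below illustrates that idea under assumptions and is not the authors' method: the KL-divergence penalty, the weight `lam`, and the helper names (`rationale_target`, `kl_divergence`, `regularised_loss`) are hypothetical, and a real implementation would work on attention tensors from a chosen layer and head rather than on plain lists.

```python
import math
from typing import Sequence


def rationale_target(mask: Sequence[int]) -> list:
    """Turn a binary rationale mask (1 = rationale token) into a distribution."""
    if not mask:
        raise ValueError("mask must be non-empty")
    total = sum(mask)
    if total == 0:
        # No rationale tokens: fall back to a uniform target over all tokens.
        return [1.0 / len(mask)] * len(mask)
    return [m / total for m in mask]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same token positions."""
    # Terms with p_i = 0 contribute nothing to KL(p || q), so they are skipped.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def regularised_loss(cls_loss: float, attention: Sequence[float],
                     mask: Sequence[int], lam: float = 0.1) -> float:
    """Hypothetical objective: classification loss + lam * KL(rationale target || attention)."""
    # The penalty is zero when attention is distributed exactly like the rationale target.
    return cls_loss + lam * kl_divergence(rationale_target(mask), attention)


# Toy example: 4 tokens, the middle two are rationale tokens.
attention = [0.1, 0.6, 0.2, 0.1]
mask = [0, 1, 1, 0]
print(regularised_loss(0.7, attention, mask))
```

The weight `lam` trades off fitting the labels against matching the rationale mask; with `lam = 0` the objective reduces to plain fine-tuning on the classification loss.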

Abstract

Human explanations of natural language, known as rationales, are a tool for assessing whether models learn a label for the right reasons or rely on dataset-specific shortcuts. Sufficiency is a common metric for estimating the informativeness of rationales, but it provides limited insight into the effects of rationale information on model performance. We address this limitation by relating sufficiency to two modelling paradigms: the ability of models to identify which tokens are part of the rationale (through token classification) and the ability to improve model performance by incorporating rationales in the input (through attention regularisation). We find that highly informative rationales are not likely to help classify the instance correctly. Instead, sufficiency captures the classification impact of the non-rationalised context, which interferes with rationale information in the same input. We also find that incorporating rationale information in model inputs can boost cross-domain classification, but results are inconsistent per task and model type. Finally, sufficiency and token classification appear to be unrelated. These results exemplify the complexity of rationales, showing that metrics capable of systematically capturing this type of information merit further investigation.
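Sufficiency compares a model's probability for the gold label on the full input with its probability on the rationale tokens alone, the comparison depicted in Figure 1. The sketch below assumes the ERASER-style difference $p(y \mid x) - p(y \mid r)$, where lower values indicate a more sufficient rationale; the paper's exact normalisation, and how it derives contextual impact ($CI$) from this quantity, may differ. `toy_sentiment` is a hypothetical lexicon scorer standing in for a fine-tuned transformer.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical model interface: tokens -> {label: probability}.
ProbaFn = Callable[[Sequence[str]], Dict[str, float]]


def isolate_rationale(tokens: Sequence[str], mask: Sequence[int]) -> list:
    """Keep only the tokens marked as rationale (mask value 1), i.e. r in Figure 1."""
    return [t for t, m in zip(tokens, mask) if m]


def sufficiency(predict_proba: ProbaFn, tokens: Sequence[str],
                mask: Sequence[int], gold: str) -> float:
    """ERASER-style sufficiency: p(gold | full input x) - p(gold | rationale r)."""
    p_full = predict_proba(tokens)[gold]
    p_rationale = predict_proba(isolate_rationale(tokens, mask))[gold]
    return p_full - p_rationale


def toy_sentiment(tokens: Sequence[str]) -> Dict[str, float]:
    """Toy stand-in for a trained classifier: logistic score over a tiny lexicon."""
    lexicon = {"great": 2.0, "fun": 1.0, "boring": -2.0, "but": -0.5}
    score = sum(lexicon.get(t.lower(), 0.0) for t in tokens)
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return {"+": p_pos, "-": 1.0 - p_pos}


tokens = "the plot is boring but the cast is great fun".split()
mask = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # rationale: "great fun"
print(sufficiency(toy_sentiment, tokens, mask, gold="+"))
```

Here the score is negative (about $-0.33$): the rationale on its own gives the gold label $+$ more probability than the full sentence, because the non-rationale context ("boring but") pulls the toy model towards $-$. This is the kind of context interference that the abstract attributes to sufficiency.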

Paper Structure

This paper contains 44 sections, 5 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: We compare model $\mathcal{M}$'s probability of the gold label $+$ on the full input $x$ with its probability on the isolated rationales $r$. The example is an instance from the sentiment analysis dataset SST (Socher et al., 2013; Carton et al., 2020).
  • Figure 2: Dataset-average contextual impact ($CI$).
  • Figure 3: Performance results expressed in $TC$ (token classification) and $AR$ (attention regularisation). Scores $> 1.00$ indicate model improvement over the baselines described in the performance metrics section. For example, $1.15$ indicates a relative improvement by a factor of $1.15$, or +15%.
  • Figure 4: The correlation between $CI$ and predictions in high-$AR$ (↑) and low-$AR$ (↓) tasks tends to be positive rather than negative, suggesting that high $CI$ does not entail rationale informativeness. Significance (*) at $p < .05$.
  • Figure 5: Percentage of correct predictions for bottom and top contextual impact ($CI$). For each model, we compare the respective tasks that achieved the highest $AR$ (↑), where a positive rationalisation effect was observed, with those that achieved the lowest $AR$ (↓). Overall, high $CI$ leans towards correct predictions. A greater top-bottom distance for $\mathcal{R}$ is observed in high-$AR$ (↑) tasks.
  • ...and 4 more figures
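The relative scores in Figure 3 divide a model's score by its baseline's score, and Figure 5 compares the share of correct predictions among instances with the lowest and the highest $CI$. A minimal sketch of both computations follows; the underlying metric, the baselines, and the cut-off for "bottom" and "top" $CI$ (a hypothetical fraction `frac` of instances at each end) are assumptions rather than the paper's settings.

```python
from typing import Sequence, Tuple


def relative_improvement(model_score: float, baseline_score: float) -> float:
    """Score relative to the baseline: > 1.0 is an improvement (1.15 means +15%)."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return model_score / baseline_score


def pct_correct_by_ci(ci: Sequence[float], correct: Sequence[int],
                      frac: float = 0.25) -> Tuple[float, float]:
    """% correct predictions in the bottom and top `frac` of instances ranked by CI."""
    if not ci or len(ci) != len(correct):
        raise ValueError("ci and correct must be non-empty and of equal length")
    if not 0 < frac <= 0.5:
        raise ValueError("frac must be in (0, 0.5]")
    # Instance indices sorted from lowest to highest CI (hypothetical per-instance scores).
    order = sorted(range(len(ci)), key=lambda i: ci[i])
    k = max(1, int(len(ci) * frac))
    bottom, top = order[:k], order[-k:]

    def pct(indices: Sequence[int]) -> float:
        return 100.0 * sum(correct[i] for i in indices) / len(indices)

    return pct(bottom), pct(top)


print(round(relative_improvement(0.69, 0.60), 2))
ci = [0.1, 0.5, -0.2, 0.9, 0.3, 0.7, 0.0, 0.8]  # toy values, not paper data
correct = [0, 1, 0, 1, 1, 1, 0, 1]
print(pct_correct_by_ci(ci, correct))
```

Taking equal-sized groups at both ends of the $CI$ ranking keeps the two percentages comparable; ties in $CI$ keep the original instance order, since Python's `sorted` is stable.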