Does Faithfulness Conflict with Plausibility? An Empirical Study in Explainable AI across NLP Tasks

Xiaolei Lu, Jianghong Ma

TL;DR

This work examines whether faithfulness (accurately reflecting a model's reasoning) conflicts with plausibility (aligning with human understanding) in explanations of NLP models. It runs an empirical comparison on SST-2, SNIPS, and 20Newsgroups, evaluating attribution methods (SV, LIME, IG, RawAtt, AttRll) against GPT-4-generated explanations, which serve as the plausibility benchmark. Faithfulness is measured with $\mathrm{LOR}(k)$, $\mathrm{SF}(k)$, and $\mathrm{CM}(k)$, while plausibility is measured with Rank Correlation (RC) and Overlap Rate (OR) relative to the GPT-4 explanations. The findings suggest that perturbation-based methods (SV, LIME) can achieve both high faithfulness and high plausibility, and that GPT-4 explanations align well with expert reasoning. This supports optimizing explainability methods for a dual objective of accuracy and user interpretability.
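
As an illustration of the two plausibility measures named above, here is a minimal Python sketch. It rests on assumptions: the summary does not give the paper's exact formulas, so OR is taken here as the fraction of shared tokens among the top-$k$ of each ranking, and RC as Spearman's rank correlation. The function names and the toy scores are hypothetical and not taken from the paper.

```python
import math


def top_k(scores, k):
    """Indices of the k highest-scoring tokens (ties broken by position)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


def overlap_rate(method_scores, reference_scores, k):
    """Assumed OR: share of top-k tokens that both rankings agree on."""
    shared = set(top_k(method_scores, k)) & set(top_k(reference_scores, k))
    return len(shared) / k


def average_ranks(values):
    """1-based ranks in ascending order; tied values get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def rank_correlation(method_scores, reference_scores):
    """Assumed RC: Spearman's rho, i.e. Pearson correlation of the ranks."""
    ra = average_ranks(method_scores)
    rb = average_ranks(reference_scores)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant ranking carries no ordering information
    return cov / math.sqrt(var_a * var_b)


# Toy per-token importance scores for a four-token sentence (made up).
method = [0.1, 0.7, 0.05, 0.9]      # e.g. an attribution method
reference = [0.2, 0.8, 0.1, 0.6]    # e.g. a reference explanation

print(overlap_rate(method, reference, k=2))
print(rank_correlation(method, reference))
```

Under these assumed definitions, OR only checks whether the most important tokens coincide, while RC also penalizes disagreements in the order of the remaining tokens.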

Abstract

Explainability algorithms aimed at interpreting decision-making AI systems usually consider balancing two critical dimensions: 1) faithfulness, where explanations accurately reflect the model's inference process; and 2) plausibility, where explanations are consistent with domain experts. However, the question arises: do faithfulness and plausibility inherently conflict? In this study, through a comprehensive quantitative comparison between the explanations from the selected explainability methods and expert-level interpretations across three NLP tasks (sentiment analysis, intent detection, and topic labeling), we demonstrate that the traditional perturbation-based methods Shapley value and LIME could attain greater faithfulness and plausibility. Our findings suggest that rather than optimizing for one dimension at the expense of the other, we could seek to optimize explainability algorithms with dual objectives to achieve high levels of accuracy and user accessibility in their explanations.

Paper Structure (15 sections, 5 equations, 7 figures, 6 tables)


Figures (7)

  • Figure 1: Faithfulness evaluation results on SST-2 and SNIPS over BERT and RoBERTa architectures. Lower LOR and SF scores are better; higher CM scores are better.
  • Figure 2: Rank correlation coefficient between the explainability methods and GPT-4 on SST-2 and SNIPS over BERT and RoBERTa architectures.
  • Figure 3: Overlap rate when $k=4$ over BERT and RoBERTa architectures.
  • Figure 4: Overlap rate over BERT in SST-2.
  • Figure 5: Overlap rate over RoBERTa in SST-2.
  • ...and 2 more figures