The Effect of Model Size on LLM Post-hoc Explainability via LIME

Henning Heyen, Amy Widdicombe, Noah Y. Siegel, Maria Perez-Ortiz, Philip Treleaven

TL;DR

The paper examines how LLM post-hoc explainability via LIME changes with model size. It analyzes four DeBERTaV3 models on NLI and ZSC tasks, measuring faithfulness with comprehensiveness and plausibility with intersection over union (IOU) against human highlights. Results show that faithfulness improves with larger models but plausibility does not, indicating a misalignment between the models' internal decision processes and the LIME explanations. The findings reveal limitations of removal-based faithfulness metrics in NLP and argue for more expressive, human-aligned explainability frameworks, potentially influenced by future RLHF and alternative explainability approaches.

Abstract

Large language models (LLMs) are becoming bigger to boost performance. However, little is known about how explainability is affected by this trend. This work explores LIME explanations for DeBERTaV3 models of four different sizes on natural language inference (NLI) and zero-shot classification (ZSC) tasks. We evaluate the explanations based on their faithfulness to the models' internal decision processes and their plausibility, i.e. their agreement with human explanations. The key finding is that increased model size does not correlate with plausibility despite improved model performance, suggesting a misalignment between the LIME explanations and the models' internal processes as model size increases. Our results further suggest limitations regarding faithfulness metrics in NLI contexts.

Paper Structure (12 sections, 3 equations, 4 figures, 8 tables)

Figures (4)

  • Figure 1: LIME example on a CoS-e instance using the xsmall DeBERTaV3 model. LIME maps every token to a real-valued importance score.
  • Figure 2: Mean comprehensiveness scores by label for MNLI (left) and e-SNLI (middle), and mean IOU scores by label for e-SNLI (right), with mean standard errors on 100 test samples across all model sizes. Neutral sentence pairs generally achieve lower comprehensiveness scores than contradictory sentence pairs, and IOU scores stay almost constant as model size increases, regardless of the label. Exact numbers are given in the appendix table of faithfulness and plausibility by label.
  • Figure 3: Visualisation of the comprehensiveness metric on a CoS-e instance, from DeYoung et al. Comprehensiveness treats an explanation as faithful if the prediction deviates strongly when the most important tokens (as identified by the explanation method) are removed from the input sequence.
  • Figure 4: Mean comprehensiveness (left) and IOU (right) scores with mean standard errors on 100 test samples for each dataset across all model sizes. IOU could not be computed on MNLI as this dataset does not provide human-annotated highlights as ground truth explanations.
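
The two metrics named in the captions above can be illustrated with a minimal Python sketch. It is a sketch under stated assumptions, not the paper's implementation: the helper names, the toy classifier, the fixed top-k cutoff and the token-level form of IOU are all hypothetical choices made here for illustration. The source does not say how many tokens are removed or how IOU is matched against the human highlights.

```python
from typing import Callable, List, Sequence

# Assumed interface: a classifier that maps a token list to class probabilities.
PredictFn = Callable[[List[str]], Sequence[float]]


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k tokens with the highest importance scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def comprehensiveness(predict: PredictFn, tokens: List[str],
                      scores: Sequence[float], label: int, k: int) -> float:
    """Drop in predicted probability for `label` after removing the top-k tokens.

    A larger drop means the explanation picked tokens the model relied on.
    """
    removed = set(top_k_indices(scores, k))
    reduced = [tok for i, tok in enumerate(tokens) if i not in removed]
    return predict(tokens)[label] - predict(reduced)[label]


def token_iou(scores: Sequence[float], human_mask: Sequence[int], k: int) -> float:
    """Token-level intersection over union of top-k tokens and human highlights."""
    predicted = set(top_k_indices(scores, k))
    gold = {i for i, flag in enumerate(human_mask) if flag}
    union = predicted | gold
    return len(predicted & gold) / len(union) if union else 0.0


def toy_predict(tokens: List[str]) -> List[float]:
    """Hypothetical stand-in for a model: class 1 is likely when 'not' appears."""
    p = 0.9 if "not" in tokens else 0.2
    return [1.0 - p, p]


tokens = ["the", "cat", "is", "not", "asleep"]
importance = [0.0, 0.1, 0.05, 0.8, 0.3]   # e.g. LIME scores, one per token
highlights = [0, 0, 0, 1, 1]              # hypothetical human annotation

comp = comprehensiveness(toy_predict, tokens, importance, label=1, k=1)
iou_k1 = token_iou(importance, highlights, k=1)
iou_k2 = token_iou(importance, highlights, k=2)
print(round(comp, 3), iou_k1, iou_k2)
```

In this toy case, removing the single top-scored token changes the prediction sharply, so comprehensiveness is high. IOU, by contrast, depends only on overlap with the human highlights. The two scores can therefore move independently, which is the distinction between faithfulness and plausibility that the figures report.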