The Effect of Model Size on LLM Post-hoc Explainability via LIME
Henning Heyen, Amy Widdicombe, Noah Y. Siegel, Maria Perez-Ortiz, Philip Treleaven
TL;DR
This paper examines how post-hoc LIME explanations of LLMs change with model size. It analyzes four DeBERTaV3 models on natural language inference (NLI) and zero-shot classification (ZSC), measuring faithfulness with comprehensiveness and plausibility with intersection over union (IoU) against human highlights. Model performance improves with size but plausibility does not, which suggests a misalignment between the LIME explanations and the models' internal decision processes. The findings also point to limitations of removal-based faithfulness metrics in NLP, and the authors argue for more expressive, human-aligned explainability frameworks, potentially informed by future work on RLHF and alternative explainability approaches.
Abstract
Large language models (LLMs) are becoming bigger to boost performance. However, little is known about how explainability is affected by this trend. This work explores LIME explanations for DeBERTaV3 models of four different sizes on natural language inference (NLI) and zero-shot classification (ZSC) tasks. We evaluate the explanations based on their faithfulness to the models' internal decision processes and their plausibility, i.e. their agreement with human explanations. The key finding is that increased model size does not correlate with plausibility despite improved model performance, suggesting a misalignment between the LIME explanations and the models' internal processes as model size increases. Our results further suggest limitations regarding faithfulness metrics in NLI contexts.
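To make the two evaluation notions concrete, the following is a minimal sketch of how comprehensiveness (a removal-based faithfulness score) and token-level IoU (a plausibility score) are commonly computed. It is not the authors' implementation: the function names, the representation of an explanation as a set of token indices, and the `toy_predict_proba` classifier are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Set


def token_iou(predicted: Set[int], human: Set[int]) -> float:
    """Intersection over union of two sets of highlighted token indices."""
    union = predicted | human
    if not union:
        # Neither side highlighted anything; treating this as 0.0 is a convention.
        return 0.0
    return len(predicted & human) / len(union)


def comprehensiveness(
    predict_proba: Callable[[Sequence[str]], List[float]],
    tokens: Sequence[str],
    rationale: Set[int],
    label: int,
) -> float:
    """Drop in the probability of `label` after removing the rationale tokens.

    A larger drop means the removed tokens mattered more to the prediction.
    """
    full = predict_proba(tokens)[label]
    reduced_tokens = [t for i, t in enumerate(tokens) if i not in rationale]
    reduced = predict_proba(reduced_tokens)[label]
    return full - reduced


def toy_predict_proba(tokens: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for a classifier: label 1 is likely iff 'not' occurs."""
    p = 0.9 if "not" in tokens else 0.2
    return [1.0 - p, p]


tokens = ["the", "cat", "is", "not", "here"]
explanation = {2, 3}   # token indices an explainer such as LIME might rank highest
human_marks = {3, 4}   # token indices a human annotator highlighted

faithfulness_score = comprehensiveness(toy_predict_proba, tokens, explanation, label=1)
plausibility_score = token_iou(explanation, human_marks)
```

In this toy case the explanation contains the one token the classifier depends on, so removing it lowers the predicted probability substantially, while the overlap with the human highlights is only partial. The two scores measure different things and need not move together.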
