On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon, Roi Reichart
TL;DR
This paper reframes NLP interpretability through the lens of diverse stakeholders, arguing that existing surveys overlook user needs and real-world contexts. It defines interpretability methods and explanations in a broad, stakeholder-centered framework, and provides a six-property taxonomy to categorize methods by what is explained and how it is explained. Through a large-scale, LLM-assisted analysis of thousands of papers across NLP and other fields, the authors reveal stable NLP trends but significant cross-domain differences, including a rise in natural language explanations in the LLM era and varying preferences for input-output versus internal-mechanism explanations. The work offers practical recommendations to broaden adoption across disciplines, emphasizes faithfulness and causality, and highlights opportunities for concept-level, self-explaining, and causal interpretability in future NLP research.
Abstract
Recent advancements in NLP systems, particularly with the introduction of LLMs, have led to widespread adoption of these systems by a broad spectrum of users across various domains, impacting decision-making, the job market, society, and scientific research. This surge in usage has spurred an explosion in NLP model interpretability and analysis research, accompanied by numerous technical surveys. Yet these surveys often overlook the needs and perspectives of explanation stakeholders. In this paper, we address three fundamental questions: Why do we need interpretability, what are we interpreting, and how? By exploring these questions, we examine existing interpretability paradigms, their properties, and their relevance to different stakeholders. We further explore the practical implications of these paradigms by analyzing trends from the past decade across multiple research fields. To this end, we retrieved thousands of papers and employed an LLM to characterize them. Our analysis reveals significant disparities between NLP developers and non-developer users, as well as between research fields, underscoring the diverse needs of stakeholders. For example, explanations of internal model components are rarely used outside the NLP field. We hope this paper informs the future design, development, and application of methods that align with the objectives and requirements of various stakeholders.
