On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs

Nitay Calderon; Roi Reichart

TL;DR

This paper reframes NLP interpretability through the lens of diverse stakeholders, arguing that existing surveys overlook user needs and real-world contexts. It defines interpretability methods and explanations in a broad, stakeholder-centered framework, and provides a six-property taxonomy to categorize methods by what is explained and how it is explained. Through a large-scale, LLM-assisted analysis of thousands of papers across NLP and other fields, the authors reveal stable NLP trends but significant cross-domain differences, including a rise of natural language explanations in the LLM era and varying preferences for input-output versus internal-mechanism explanations. The work offers practical recommendations to broaden adoption across disciplines, emphasizes faithfulness and causality, and highlights opportunities for concept-level, self-explaining, and causal interpretability in future NLP research.

Abstract

Recent advancements in NLP systems, particularly with the introduction of LLMs, have led to widespread adoption of these systems by a broad spectrum of users across various domains, impacting decision-making, the job market, society, and scientific research. This surge in usage has led to an explosion in NLP model interpretability and analysis research, accompanied by numerous technical surveys. Yet, these surveys often overlook the needs and perspectives of explanation stakeholders. In this paper, we address three fundamental questions: Why do we need interpretability, what are we interpreting, and how? By exploring these questions, we examine existing interpretability paradigms, their properties, and their relevance to different stakeholders. We further explore the practical implications of these paradigms by analyzing trends from the past decade across multiple research fields. To this end, we retrieved thousands of papers and employed an LLM to characterize them. Our analysis reveals significant disparities between NLP developers and non-developer users, as well as between research fields, underscoring the diverse needs of stakeholders. For example, explanations of internal model components are rarely used outside the NLP field. We hope this paper informs the future design, development, and application of methods that align with the objectives and requirements of various stakeholders.

Paper Structure (34 sections, 6 figures, 5 tables)

Introduction
Why Do We Need Interpretability?
Definitions
What is an Interpretability Method?
What is an Explanation?
Properties and Categorization
What Properties
The Explained Mechanism
Raw Input or Abstracted Input
Scope: Local or Global
How Properties
Time: Post-hoc or Intrinsic
Access: Model Specific or Agnostic
Presenting Insights
Faithfulness and Causality
...and 19 more sections

Figures (6)

Figure 1: Number of NLP Interpretability papers published over time. Each year spans from June of the previous year to the following June. The left plot shows the distribution of papers across NLP and the other fields (Outside). The right plot shows trends in other fields besides NLP. Only papers that use, propose, or discuss interpretability methods applied to natural language are counted, following relevance filtering by an LLM.
Figure 2: Overview of four perspectives on the need for interpretability proposed in this paper.
Figure 3: An illustration of our five-stage procedure for annotating NLP interpretability papers, with the stages fully detailed in the appendix.
Figure 4: Distribution of NLP interpretability paradigms by research field, for papers published in 2015-2024.
Figure 5: Trends in NLP interpretability paradigms over time in the NLP field (left plot) and in fields outside of NLP (right plot). The plots show the percentage of papers for each paradigm, as predicted by an LLM. The data are smoothed using a one-year moving average.
...and 1 more figures
