To Build Our Future, We Must Know Our Past: Contextualizing Paradigm Shifts in Natural Language Processing

Sireesh Gururaja; Amanda Bertsch; Clara Na; David Gray Widder; Emma Strubell

TL;DR

This work draws on long-form interviews with 26 NLP researchers of varying seniority, research area, institution, and social identity to study the factors that shape NLP as a field, including culture, incentives, and infrastructure.

Abstract

NLP is in a period of disruptive change that is impacting our methodologies, funding sources, and public perception. In this work, we seek to understand how to shape our future by better understanding our past. We study factors that shape NLP as a field, including culture, incentives, and infrastructure, by conducting long-form interviews with 26 NLP researchers of varying seniority, research area, institution, and social identity. Our interviewees identify cyclical patterns in the field, as well as new shifts without historical parallel, including changes in benchmark culture and software infrastructure. We complement this discussion with quantitative analysis of citation, authorship, and language use in the ACL Anthology over time. We conclude by discussing shared visions, concerns, and hopes for the future of NLP. We hope that this study of our field's past and present can prompt informed discussion of our community's implicit norms and more deliberate action to shape the future.

Paper Structure (38 sections, 4 figures, 1 table)

Introduction
Methods
Qualitative methods
Quantitative Methods
Exploit-explore cycles of work
First wave: exploit.
Second wave: explore
Where are we now?
Prompting as a methodological shift
"Era of scale"
"Deep learning monoculture"
Issues with peer review
Benchmarking culture
The rise of benchmarks
The current state of benchmarks

Figures (4)

Figure 1: The number of unique researchers publishing in ACL venues has increased dramatically, from 715 unique authors in 1980 to 17,829 in 2022.
Figure 2: Quantitative and qualitative timeline. The lower half of this diagram captures historical information that our participants felt was relevant, along with their reported date ranges. The upper half captures quantitative information that parallels that timeline. Bar charts indicate the fraction of papers that cite a given paper, while line charts indicate the fraction of papers that use a particular term.
Figure 3: Mentions of libraries over time in the ACL Anthology. Note the cyclic pattern and increasing concentration on the dominant framework over time. While some libraries are built on others, the shift in mentions over time captures the primary level of abstraction that researchers consider important. See the appendix on quantitative methods for details on how we handle ambiguity in mentions.
Figure 4: The number of "active" researchers publishing in ACL venues has increased dramatically, with more newcomers to the field year over year.
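The author counts in Figure 1 and the term-usage curves in Figure 2 rest on simple per-year aggregates. The sketch here illustrates those two aggregates under stated assumptions: the paper records (dicts with hypothetical `year`, `authors`, and `text` fields) are not the ACL Anthology's actual schema, author names are taken at face value with no disambiguation, and a term counts as "used" if it appears as a whole word, case-insensitively. It does not reproduce the authors' own pipeline, whose handling of ambiguous mentions is described in their appendix, or their definition of "active" researchers in Figure 4.

```python
import re
from collections import defaultdict

# Hypothetical record format (an assumption, not the Anthology's schema):
# {"year": int, "authors": list[str], "text": str}


def unique_authors_per_year(papers):
    """Count distinct author names appearing in each year's papers (Figure 1 style)."""
    authors_by_year = defaultdict(set)
    for paper in papers:
        authors_by_year[paper["year"]].update(paper["authors"])
    return {year: len(names) for year, names in sorted(authors_by_year.items())}


def term_fraction_per_year(papers, term):
    """Fraction of each year's papers whose text mentions `term` (Figure 2 line-chart style).

    Matching is whole-word and case-insensitive; word boundaries assume the
    term starts and ends with a letter, digit, or underscore.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for paper in papers:
        year = paper["year"]
        totals[year] += 1
        if pattern.search(paper["text"]):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    # Toy records for illustration only; not real Anthology data.
    papers = [
        {"year": 2018, "authors": ["Ana", "Bo"], "text": "A BiLSTM tagger trained with PyTorch."},
        {"year": 2018, "authors": ["Bo", "Cy"], "text": "Attention is all we use."},
        {"year": 2019, "authors": ["Ana", "Di", "Ed"], "text": "We fine-tune BERT in pytorch."},
        {"year": 2019, "authors": ["Fay"], "text": "Distilling a PyTorch model."},
    ]
    print(unique_authors_per_year(papers))            # {2018: 3, 2019: 4}
    print(term_fraction_per_year(papers, "pytorch"))  # {2018: 0.5, 2019: 1.0}
```

Counting distinct names per year, rather than papers per author, matches the "unique researchers" framing of Figure 1; normalizing term hits by each year's paper count keeps the Figure 2 curves comparable as the Anthology grows.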
