We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields
Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad
TL;DR
The paper measures how NLP influences, and is influenced by, other fields. It builds a large metadata dataset (~77k NLP papers, ~3.1M citations from NLP papers to other papers, and ~1.8M citations from other papers to NLP papers, across 23 fields) and introduces the Citation Field Diversity Index (CFDI) to quantify cross-field citation diversity. The index is defined as $CFDI = 1 - \sum_f p_f^2$, where $p_f = x_f / X$, $x_f$ is the number of citations to field $f$, and $X$ is the total number of citations. The study finds that NLP's citation flows are overwhelmingly dominated by computer science and that both incoming and outgoing field diversity declined between 1980 and 2022, indicating increasing insularity. It contributes a publicly available dataset, multi-field analyses (including CS subfields via CSO), and an online demo tool for assessing field diversity, and it underscores the need for more interdisciplinary engagement in NLP to support responsible development and broader impact.
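To make the CFDI formula concrete, here is a minimal Python sketch of the computation as stated above. The function name `cfdi` and the citation counts in the example are hypothetical illustrations, not the authors' code or data.

```python
def cfdi(field_counts):
    """Citation Field Diversity Index: 1 - sum_f (x_f / X)^2.

    field_counts maps a field name to the number of citations to that
    field (x_f). The name and input shape are assumptions of this sketch.
    """
    total = sum(field_counts.values())  # X, the total number of citations
    if total == 0:
        raise ValueError("CFDI is undefined when there are no citations")
    # Sum of squared citation shares p_f = x_f / X
    concentration = sum((x / total) ** 2 for x in field_counts.values())
    return 1.0 - concentration


# Hypothetical counts for a single paper's reference list (made-up numbers).
example = {
    "Computer Science": 80,
    "Linguistics": 10,
    "Mathematics": 5,
    "Psychology": 5,
}
score = cfdi(example)
print(round(score, 3))
```

Under this definition, a paper citing only one field scores 0, and the score rises as citations spread more evenly across more fields, so a falling CFDI over time corresponds to citations concentrating in fewer fields.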
Abstract
Natural Language Processing (NLP) is poised to substantially influence the world. However, significant progress comes hand-in-hand with substantial risks. Addressing them requires broad engagement with various fields of study. Yet, little empirical work examines the state of such engagement (past or current). In this paper, we quantify the degree of influence between 23 fields of study and NLP (on each other). We analyzed ~77k NLP papers, ~3.1m citations from NLP papers to other papers, and ~1.8m citations from other papers to NLP papers. We show that, unlike most fields, the cross-field engagement of NLP, measured by our proposed Citation Field Diversity Index (CFDI), has declined from 0.58 in 1980 to 0.31 in 2022 (an all-time low). In addition, we find that NLP has grown more insular -- citing increasingly more NLP papers and having fewer papers that act as bridges between fields. NLP citations are dominated by computer science; less than 8% of NLP citations are to linguistics, and less than 3% are to math and psychology. These findings underscore NLP's urgent need to reflect on its engagement with various fields.
