
We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields

Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad

TL;DR

The paper measures how NLP influences, and is influenced by, other fields. It builds a large metadata dataset (~77k NLP papers, ~3.1M citations from NLP papers to other papers, and ~1.8M citations from other papers to NLP papers, across 23 fields) and introduces the Citation Field Diversity Index (CFDI) to quantify cross-field citation diversity. The index is defined as $CFDI = 1 - \sum_f p_f^2$, where $p_f = x_f / X$, $x_f$ is the number of citations to field $f$, and $X$ is the total number of citations. The study finds that NLP's citation flows are overwhelmingly dominated by CS and that both incoming and outgoing field diversity declined from 1980 to 2022, indicating increasing insularity. It contributes a publicly available dataset, multi-field analyses (including CS subfields via CSO), and an online demo tool for assessing field diversity, and it underscores the need for more interdisciplinary engagement in NLP to support responsible development and broader impact.
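The CFDI formula above can be illustrated with a minimal Python sketch. It assumes each citation has already been mapped to a single field label; the function name `cfdi` and the toy citation lists are hypothetical and are not taken from the paper's dataset or code.

```python
from collections import Counter
from typing import Iterable


def cfdi(cited_fields: Iterable[str]) -> float:
    """Citation Field Diversity Index: 1 - sum_f p_f^2, with p_f = x_f / X."""
    counts = Counter(cited_fields)   # x_f: number of citations to each field f
    total = sum(counts.values())     # X: total number of citations
    if total == 0:
        raise ValueError("CFDI is undefined when there are no citations")
    return 1.0 - sum((x / total) ** 2 for x in counts.values())


# Hypothetical toy data (assumption, for illustration only).
insular = ["Computer Science"] * 8 + ["Linguistics"] * 2
diverse = ["Computer Science"] * 4 + ["Linguistics"] * 3 + ["Psychology"] * 3

print(round(cfdi(insular), 2))  # 1 - (0.8^2 + 0.2^2) = 0.32
print(round(cfdi(diverse), 2))  # 1 - (0.4^2 + 0.3^2 + 0.3^2) = 0.66
```

A value of 0 means every citation goes to one field. Values closer to 1 mean citations are spread more evenly across many fields.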

Abstract

Natural Language Processing (NLP) is poised to substantially influence the world. However, significant progress comes hand-in-hand with substantial risks. Addressing them requires broad engagement with various fields of study. Yet, little empirical work examines the state of such engagement (past or current). In this paper, we quantify the degree of influence between 23 fields of study and NLP (on each other). We analyzed ~77k NLP papers, ~3.1m citations from NLP papers to other papers, and ~1.8m citations from other papers to NLP papers. We show that, unlike most fields, the cross-field engagement of NLP, measured by our proposed Citation Field Diversity Index (CFDI), has declined from 0.58 in 1980 to 0.31 in 2022 (an all-time low). In addition, we find that NLP has grown more insular -- citing increasingly more NLP papers and having fewer papers that act as bridges between fields. NLP citations are dominated by computer science; less than 8% of NLP citations are to linguistics, and less than 3% are to math and psychology. These findings underscore NLP's urgent need to reflect on its engagement with various fields.

Paper Structure (14 sections, 2 equations, 17 figures, 2 tables)


Figures (17)

  • Figure 1: Citations from other fields to NLP (right) and from NLP to other fields (left).
  • Figure 2: The percentage of citations (a) from NLP to CS and non-CS and (b) from CS and non-CS to NLP over all citations from and to NLP with a moving average of three years.
  • Figure 3: The percentage of citations (a) from NLP to non-CS fields and (b) non-CS fields to NLP in relation to all non-CS citations from and to NLP.
  • Figure 4: NLP's Outgoing Relative Citational Prominence (ORCP) scores for 23 fields of study.
  • Figure 5: CFDI of NLP and the three largest fields cited by NLP for (a) outgoing citations and (b) incoming citations of that field. The macro-average shows CFDI for the average over all 23 fields.
  • ...and 12 more figures