Citation Amnesia: On The Recency Bias of NLP and Other Academic Fields

Jan Philip Wahle, Terry Ruas, Mohamed Abdalla, Bela Gipp, Saif M. Mohammad

TL;DR

The paper examines recency bias in scholarly citations, focusing on NLP and how it compares with 20 other fields over 1980–2023. Using OpenAlex data, it combines large-scale bibliometric metrics (the mean age of citation, $mAoC$, and a volume-adjusted measure, $\text{VACA}$) with cross-field diversity measures (CAD/CFD) to assess intra- and inter-field citation dynamics. The findings reveal a broad 'citation age recession' beginning around 2015, with NLP and ML exhibiting the strongest shifts toward citing more recent work; these patterns persist after volume normalization and are accompanied by distinctive intra-/inter-field citation tendencies. The work raises concerns about neglecting older, relevant literature and provides an openly accessible demo and data to promote broader engagement with historical scientific contributions. Overall, the study offers a scalable framework for understanding how citation practices evolve across fields and over time, with implications for responsible research practice.
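To make the citation-age metrics concrete, here is a minimal Python sketch. It assumes the age of a citation is the publication year of the citing paper minus that of the cited paper, and that $mAoC$ is the mean of those ages. The function names and the toy data are hypothetical and not taken from the paper or its released code; the ten-year threshold mirrors the one used in Figure 2.

```python
from statistics import mean


def citation_ages(pairs):
    """Return the age of each citation.

    `pairs` is an iterable of (citing_year, cited_year) tuples.
    Assumption: age = citing_year - cited_year.
    """
    return [citing_year - cited_year for citing_year, cited_year in pairs]


def mean_age_of_citation(pairs):
    """Mean age of citation over all (citing_year, cited_year) pairs."""
    return mean(citation_ages(pairs))


def percent_older_than(pairs, threshold_years=10):
    """Percentage of citations strictly older than `threshold_years`."""
    ages = citation_ages(pairs)
    return 100 * sum(age > threshold_years for age in ages) / len(ages)


# Toy data (invented for illustration): three citations from 2023 papers
# and one from a 2020 paper.
toy_pairs = [(2023, 2020), (2023, 2010), (2023, 2022), (2020, 2000)]
print(mean_age_of_citation(toy_pairs))  # ages are 3, 13, 1, 20
print(percent_older_than(toy_pairs))    # two of four ages exceed 10
```

A falling mean age, or a falling share of citations older than ten years, is the kind of signal the summary describes as a shift toward citing more recent work.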

Abstract

This study examines the tendency to cite older work across 20 fields of study over 43 years (1980–2023). We put NLP's propensity to cite older work in the context of these 20 other fields to analyze whether NLP shows similar temporal citation patterns to these other fields over time or whether differences can be observed. Our analysis, based on a dataset of approximately 240 million papers, reveals a broader scientific trend: many fields have markedly declined in citing older works (e.g., psychology, computer science). We term this decline a 'citation age recession', analogous to how economists define periods of reduced economic activity. The trend is strongest in NLP and ML research (-12.8% and -5.5% in citation age from previous peaks). Our results suggest that citing more recent works is not directly driven by the growth in publication rates (-3.4% across fields; -5.2% in humanities; -5.5% in formal sciences), even when controlling for an increase in the volume of papers. Our findings raise questions about the scientific community's engagement with past literature, particularly for NLP, and the potential consequences of neglecting older but relevant research. The data and a demo showcasing our results are publicly available.
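The "change from previous peaks" figures quoted above can be read as a relative drop of the latest citation-age value from its earlier maximum. The sketch below illustrates that arithmetic under this assumption; the paper's exact definition of a peak is not given in this section, and the function name and the yearly values are hypothetical.

```python
def percent_change_from_previous_peak(yearly_values):
    """Percent change of the latest value relative to the earlier maximum.

    `yearly_values` is a chronologically ordered list of a citation-age
    metric (for example, a yearly mean age of citation).
    Assumption: the "previous peak" is the maximum over all earlier years.
    """
    if len(yearly_values) < 2:
        raise ValueError("need at least one earlier year and a latest year")
    *history, latest = yearly_values
    peak = max(history)
    return 100 * (latest - peak) / peak


# Invented yearly values, not data from the paper.
print(percent_change_from_previous_peak([10.0, 12.0, 11.0, 9.0]))  # -25.0
```

A negative result indicates that the metric currently sits below its earlier high, which is the pattern the abstract calls a citation age recession.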

Paper Structure (11 sections, 4 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Scientific works cite others across fields and time. A focal work may cite works from its own or other fields and in varying degrees from the past.
  • Figure 2: The percentage of citations older than ten years for (a) NLP and the avg. of all 20 fields; (b) four field groups (top to bottom in 2023: humanities, social, natural, and formal sciences); (c) NLP, ML, and the top four cited fields by NLP (top to bottom in 2023: psychology, sociology, linguistics, math, ML, NLP).
  • Figure 3: The $mAoC$ of NLP, ML, and the four most cited fields by NLP. Darker colors represent intra-field citations; lighter colors represent inter-field citations.
  • Figure 4: The percentage of $mAoC$ split for citations from NLP papers to CS papers and to non-CS papers.
  • Figure 5: The macro-average $mAoC$ from each of the other fields to a target field (black). The $mAoC$ from NLP to a target field of study (red).
  • ...and 2 more figures