Internal and External Impacts of Natural Language Processing Papers

Yu Zhang

TL;DR

This study conducts a large-scale scientometric analysis of NLP papers from ACL, EMNLP, and NAACL published between 1979 and 2024 to compare internal academic impact with external attention in patents, media, and policy documents. By mapping papers to OpenAlex and aggregating external mentions from Reliance on Science, Altmetric, and Overton, and by labeling papers into 25 topics with GPT-4o (validated by human evaluators), the authors define an $Impact(t|d)$ metric to quantify topic-specific influence across domains. The findings show language modeling as the dominant driver of influence in both scholarly and public spheres, while linguistic foundations remain comparatively underrepresented; ethics, bias, and fairness receive substantial attention in policy but less in patents or academia. The analysis also reveals positive cross-domain correlations and clear complementarity among external domains, highlighting how patents emphasize practical NLP while media and policy emphasize societal implications, with papers spanning multiple external domains achieving the highest internal impact. These insights advance understanding of how NLP research travels from conferences to broader societal and technological ecosystems, offering a nuanced view of what ACL-type conferences can signify about AI research trajectories.

Abstract

We investigate the impacts of NLP research published in top-tier conferences (i.e., ACL, EMNLP, and NAACL) from 1979 to 2024. By analyzing citations from research articles and external sources such as patents, media, and policy documents, we examine how different NLP topics are consumed both within the academic community and by the broader public. Our findings reveal that language modeling has the widest internal and external influence, while linguistic foundations have lower impacts. We also observe that internal and external impacts generally align, but topics like ethics, bias, and fairness receive significant attention in policy documents despite far fewer academic citations. Additionally, external domains exhibit distinct preferences, with patents focusing on practical NLP applications and media and policy documents engaging more with the societal implications of NLP models.

Paper Structure

This paper contains 13 sections, 3 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Impact of NLP papers with a certain topic in a certain domain.
  • Figure A1: Impact of NLP papers with a certain topic in the GitHub domain.