The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research

Mohamed Abdalla, Jan Philip Wahle, Terry Ruas, Aurélie Névéol, Fanny Ducel, Saif M. Mohammad, Karën Fort

TL;DR

The paper quantifies the presence and influence of Big Tech in NLP by combining manual CV annotations from ACL 2022 with large-scale automatic analysis across the ACL Anthology (1965–2022). It finds a dramatic, ongoing rise in industry-affiliated NLP publications, with Microsoft, Alphabet, and Meta among the leading actors, strong industry–university collaborations, and notable geographic concentration in the US and China. The authors map industry research priorities, collaboration patterns, and citation impact, and they discuss societal and scientific implications, advocating for transparency and shared infrastructure to mitigate potential biases and monopolization. This work provides a data-driven baseline to inform policy, governance, and future research practices in NLP.

Abstract

Recent advances in deep learning methods for natural language processing (NLP) have created new business opportunities and made NLP research critical for industry development. Because industry is one of the big players in the field of NLP, together with governments and universities, it is important to track its influence on research. In this study, we seek to quantify and characterize industry presence in the NLP community over time. Using a corpus with comprehensive metadata of 78,187 NLP publications and 701 resumes of NLP publication authors, we explore the industry presence in the field since the early 90s. We find that industry presence among NLP authors was steady before a steep increase over the past five years (180% growth from 2017 to 2022). A few companies account for most of the publications and provide funding to academic researchers through grants and internships. Our study shows that the presence and impact of industry on natural language processing research are significant and fast-growing. This work calls for increased transparency of industry influence in the field.

Paper Structure (32 sections, 9 figures, 8 tables)

Figures (9)

  • Figure 1: Proportion of papers in the ACL Anthology with at least one industry affiliation (1995–2022).
  • Figure 2: The number of authors with industry affiliation (orange) and the number of grants awarded by each company (blue), as evaluated in the manual analysis. Affiliations with fewer than 5 authors are withheld for privacy.
  • Figure 3: The relative number of papers with industry author affiliation (y-axis, log scale) for the top 10 companies (top) and universities (bottom) with the most papers by year. The line shows a moving average ($k=5$) of the number of papers with at least one author affiliated with that company/university divided by the total number of papers in that year.
  • Figure 4: The percentage of papers with industry affiliation by country. Grey: no industry affiliations, light to dark blue: [0–1]%, orange to red: [1–12]% of papers.
  • Figure 5: A heat map showing the number of papers in various areas by industry authors. The coloring from yellow to dark blue represents the number of papers (in log scale). This analysis was done on the 15 most common topics and 30 companies with the most papers. The topics are listed in descending order of their prevalence.
  • ...and 4 more figures
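
The quantity plotted in Figure 3 (a per-year share of papers, smoothed with a moving average of $k=5$) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `yearly_share` and `moving_average` are hypothetical, the counts are made up, and a trailing window is assumed because the caption does not say whether the window is trailing or centered.

```python
def yearly_share(affiliated, totals):
    """Share of each year's papers with at least one author from the institution."""
    return {year: affiliated.get(year, 0) / totals[year] for year in sorted(totals)}


def moving_average(values, k=5):
    """Trailing moving average (an assumption; the caption does not specify the window type).

    The first k-1 entries average over however many values are available so far.
    """
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - k + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical counts for illustration only; these are NOT data from the paper.
totals = {2016: 100, 2017: 100, 2018: 200, 2019: 200, 2020: 400, 2021: 400}
affiliated = {2016: 10, 2017: 10, 2018: 20, 2019: 40, 2020: 80, 2021: 120}

shares = yearly_share(affiliated, totals)
smoothed = moving_average(list(shares.values()), k=5)

for year, raw, avg in zip(shares, shares.values(), smoothed):
    print(f"{year}: raw share = {raw:.3f}, smoothed (k=5) = {avg:.3f}")
```

Dividing by the yearly total before smoothing keeps the curve comparable across years even as the overall volume of papers grows, which appears to be why the caption reports a relative rather than an absolute count.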