Understanding "Democratization" in NLP and ML Research

Arjun Subramonian, Vagrant Gautam, Dietrich Klakow, Zeerak Talat

TL;DR

The paper analyzes how NLP/ML research uses the terms 'democracy' and 'democratization' across major venues, revealing a dominant emphasis on access and cost reduction rather than engagement with democratic theory. Using a large-scale mixed-methods approach on a corpus of 1,537 papers (506 papers and 916 excerpts in the final set), it shows that most democratization references signal easier access, with minimal interdisciplinary or theoretical grounding. Where democratic theories are engaged, the engagement is usually shallow or confined to a few papers that cite political science and economics literature for methods or results. The authors argue for explicit definitions and theory-informed usage of democratization. They recommend either using 'access' when that is what is meant, or rigorously grounding democratization claims in established democratic theories, to avoid misrepresenting public control and power distribution in AI technologies.

Abstract

Recent improvements in natural language processing (NLP) and machine learning (ML) and increased mainstream adoption have led to researchers frequently discussing the "democratization" of artificial intelligence. In this paper, we seek to clarify how democratization is understood in NLP and ML publications, through large-scale mixed-methods analyses of papers using the keyword "democra*" published in NLP and adjacent venues. We find that democratization is most frequently used to convey (ease of) access to or use of technologies, without meaningfully engaging with theories of democratization, while research using other invocations of "democra*" tends to be grounded in theories of deliberation and debate. Based on our findings, we call for researchers to enrich their use of the term democratization with appropriate theory, towards democratic technologies beyond superficial access.

Paper Structure (36 sections, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Frequency of mentions of democracy ($>$0) per paper in all work published in the ACL Anthology, ICLR, ICML, or NeurIPS before November 24, 2023. 76.1% of papers only mention democracy once.
  • Figure 2: Frequency of values, split by democratization papers and all other papers. Associations with democratization (top) are different from associations with all other mentions of democracy (bottom).
  • Figure 3: Frequency of paper sections in which mentions of democracy occur.
  • Figure 4: Proportion of fields of study of references cited by papers that mention democracy.
  • Figure 5: PCA and spectral clustering of excerpt embeddings, along with selected papers. Points that are the same color belong to the same cluster.
  • ...and 1 more figure