The Rise of AI-Generated Content in Wikipedia

Creston Brooks, Samuel Eggert, Denis Peskoff

TL;DR

This work investigates the prevalence and characteristics of AI-generated content in Wikipedia using two detectors, GPTZero and Binoculars, to establish a lower-bound estimate on AI-written articles. By comparing English pages newly created in August 2024 with a pre-March 2022 baseline and calibrating the detectors to a $1\%$ false positive rate (FPR), it finds a nontrivial share of AI-influenced content, particularly in English, with lower shares in German, French, and Italian. The study characterizes flagged articles in terms of quality, promotional content, polarization, translation, and writing-tool usage, and extends the discussion to Reddit and UN press releases to contextualize domain-specific dynamics. It highlights that AI-generated content can degrade reliability if overrepresented, while also acknowledging legitimate uses as writing aids and translations, emphasizing the need for robust, multilingual detection and governance. The findings have implications for training data integrity, model evaluation, and the long-term viability of AI-sourced content for large-scale language models.

Abstract

The rise of AI-generated content in popular information sources raises significant concerns about accountability, accuracy, and bias amplification. Beyond directly impacting consumers, the widespread presence of this content poses questions for the long-term viability of training language models on vast internet sweeps. We use GPTZero, a proprietary AI detector, and Binoculars, an open-source alternative, to establish lower bounds on the presence of AI-generated content in recently created Wikipedia pages. Both detectors reveal a marked increase in AI-generated content in recent pages compared to those from before the release of GPT-3.5. With thresholds calibrated to achieve a 1% false positive rate on pre-GPT-3.5 articles, detectors flag over 5% of newly created English Wikipedia articles as AI-generated, with lower percentages for German, French, and Italian articles. Flagged Wikipedia articles are typically of lower quality and are often self-promotional or partial towards a specific viewpoint on controversial topics.
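
The calibration step described above can be illustrated with a minimal sketch. It assumes each detector returns one numeric score per article and that a higher score means "more likely AI-generated" (a detector with the opposite convention would need its scores negated first). The function names `calibrate_threshold` and `flagged_fraction` and the synthetic scores are hypothetical and serve only as illustration; this is not the authors' code and does not call GPTZero or Binoculars.

```python
def calibrate_threshold(baseline_scores, target_fpr=0.01):
    """Pick a threshold so that at most `target_fpr` of the baseline
    (assumed human-written) scores lie strictly above it."""
    ordered = sorted(baseline_scores)
    n = len(ordered)
    # Number of baseline articles we tolerate being flagged.
    allowed = int(target_fpr * n)
    # Scores strictly above this value are flagged; at most `allowed`
    # baseline scores can exceed it.
    return ordered[n - allowed - 1]


def flagged_fraction(scores, threshold):
    """Fraction of articles whose score is strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Synthetic stand-in for scores on pre-GPT-3.5 articles (assumption).
baseline = [i / 1000 for i in range(1000)]
# Synthetic stand-in for scores on recently created articles (assumption):
# 95 low-scoring pages and 5 high-scoring pages.
recent = [0.5] * 95 + [0.995] * 5

threshold = calibrate_threshold(baseline, target_fpr=0.01)
baseline_rate = flagged_fraction(baseline, threshold)
recent_rate = flagged_fraction(recent, threshold)
print(f"threshold={threshold:.3f} baseline={baseline_rate:.3f} recent={recent_rate:.3f}")
```

Fixing the threshold on the baseline first, and only then applying it to recent pages, keeps the false positive rate bounded by construction. Any flag rate on recent pages that clearly exceeds that bound points to a real shift rather than detector noise.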

Paper Structure

This paper contains 21 sections, 1 equation, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Using two tools, GPTZero and Binoculars, we detect that as many as 5% of 2,909 English Wikipedia articles created in August 2024 contain significant AI-generated content. The classification thresholds of both tools were calibrated to maintain a false positive rate (FPR) of no more than 1% on a pre-GPT-3.5 Wikipedia baseline, as indicated by the red line.
  • Figure 2: The activity of this user, who was flagged for instigating an 'Edit War,' reveals that within a single day, they created three articles (red border), all identified as AI-generated. Notably, at 13:00 (green border), the user edited the outcome of 'War in Dibra' from 'Mixed Results' to 'Victory' and removed key text, just an hour before creating a new page titled 'Uprising in Dibra.' That page (see Figure 3) has since been deleted by moderators.
  • Figure 3: Wikipedia page flagged as AI-generated and deleted by moderators.
  • Figure 4: GPTZero scores compared to the number of page edits for English (left) and French (right) articles created before March 2022. English pages with more edits receive higher GPTZero scores.
  • Figure 5: GPTZero scores compared to the number of page edits for Italian (left) and German (right) articles created before March 2022.