Beyond speculation: Measuring the growing presence of LLM-generated texts in multilingual disinformation

Dominik Macko, Aashish Anantha Ramakrishnan, Jason Samuel Lucas, Robert Moro, Ivan Srba, Adaku Uchendu, Dongwon Lee

TL;DR

The paper quantifies how often LLM-generated texts appear in real-world multilingual disinformation. It introduces two robust multilingual detectors (Gemma_GenAI and Gemma_MultiDomain), trained with QLoRA and evaluated on diverse benchmarks, and applies them to real-world datasets, estimating prevalence with a combined confident-detection approach. The findings show a detectable rise in LLM-generated content after accessible chat-based LLMs were introduced, with substantial variation across languages and platforms (e.g., Polish and French show higher relative prevalence; Telegram and Instagram show notable levels). The study provides concrete empirical evidence supporting concerns about AI-assisted disinformation and emphasizes the need for continued improvement of detection methods and for credibility indicators that safeguard information integrity across multilingual online ecosystems.
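As a rough illustration of what a combined confident-detection estimate could look like, the sketch below flags a text as machine-generated only when both detectors score it at or above a cutoff, then reports the per-year share of flagged texts. This is a minimal sketch under stated assumptions, not the authors' implementation: the AND-combination rule, the 0.9 cutoff, the function names, and the example scores are all hypothetical and do not come from the paper.

```python
from collections import defaultdict

# Hypothetical cutoff; the source does not state the value used.
CONFIDENCE_THRESHOLD = 0.9


def is_confident_positive(score_a, score_b, threshold=CONFIDENCE_THRESHOLD):
    """Flag a text only when both detectors score it at or above the threshold.

    Assumption: "combined" is read here as a logical AND of the two detectors.
    """
    return score_a >= threshold and score_b >= threshold


def prevalence_by_year(records, threshold=CONFIDENCE_THRESHOLD):
    """Return {year: share of texts flagged by both detectors}.

    Each record is a (year, score_a, score_b) tuple with scores in [0, 1].
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for year, score_a, score_b in records:
        totals[year] += 1
        if is_confident_positive(score_a, score_b, threshold):
            flagged[year] += 1
    return {year: flagged[year] / totals[year] for year in sorted(totals)}


# Made-up scores for illustration only, not data from the paper.
example = [
    (2021, 0.10, 0.20),
    (2021, 0.95, 0.40),  # only one detector is confident, so not flagged
    (2023, 0.97, 0.93),  # both detectors are confident, so flagged
    (2023, 0.30, 0.10),
]
print(prevalence_by_year(example))  # {2021: 0.0, 2023: 0.5}
```

Requiring agreement between two independently trained detectors trades recall for precision, so an estimate built this way would tend to understate rather than overstate prevalence.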

Abstract

The increased sophistication of large language models (LLMs) and the consequent quality of generated multilingual text raise concerns about potential misuse for disinformation. While humans struggle to distinguish LLM-generated content from human-written texts, the scholarly debate about its impact remains divided. Some argue that heightened fears are overblown due to natural ecosystem limitations, while others contend that specific "longtail" contexts face overlooked risks. Our study bridges this debate by providing the first empirical evidence of LLM presence in the latest real-world disinformation datasets, documenting the increase of machine-generated content following ChatGPT's release, and revealing crucial patterns across languages, platforms, and time periods.

Paper Structure

This paper contains 10 sections, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Per-year Mean Score of the two fine-tuned detectors for MultiClaim texts. Both detectors independently show increasing Mean Scores in recent years.
  • Figure 2: Per-year proportion of the MultiClaim texts detected to be machine-generated. The proportion has increased in recent years, with the largest increase in 2023 (after the release of ChatGPT).
  • Figure 3: Left: Per-language proportion of the texts detected to be machine-generated (for languages with enough samples). The proportion differs among languages and is highest for Polish and French. Right: Per-language number of samples, also showing the distribution of samples classified as human-written (negative) and machine-generated (positive).
  • Figure 4: Proportion of the texts detected to be machine-generated in different datasets.
  • Figure 5: Per-month proportion of the texts detected to be machine-generated in the USC-X dataset for the election year of 2024 (all languages on the left, English texts only on the right). The proportion increases towards the election date in November 2024.
  • ...and 5 more figures