Exposing Pink Slime Journalism: Linguistic Signatures and Robust Detection Against LLM-Generated Threats

Sadat Shahriar, Navid Ayoobi, Arjun Mukherjee, Mostafa Musharrat, Sai Vishnu Vamsi

TL;DR

Pink Slime journalism mass-produces template-driven local news that mimics legitimate reporting, which makes it hard to detect. The authors perform a fine-grained linguistic analysis, find that Pink Slime articles use simpler and less lexically rich language, and then build detectors from handcrafted features and fine-tuned transformers. They show that LLM-based rewriting can substantially degrade detection (up to 40% F1 loss) and propose a continual-learning framework that adapts to this drift, achieving robustness gains of up to ~27% with limited forgetting. The work provides actionable linguistic cues and a scalable defense against evolving AI-generated misinformation in local news ecosystems.

Abstract

The local news landscape, a vital source of reliable information for 28 million Americans, faces a growing threat from Pink Slime Journalism: low-quality, auto-generated articles that mimic legitimate local reporting. Detecting these deceptive articles requires a fine-grained analysis of their linguistic, stylistic, and lexical characteristics. In this work, we conduct a comprehensive study to uncover the distinguishing patterns of Pink Slime content and propose detection strategies based on these insights. Beyond traditional generation methods, we highlight a new adversarial vector: modifications through large language models (LLMs). Our findings reveal that even consumer-accessible LLMs can significantly undermine existing detection systems, reducing their performance by up to 40% in F1-score. To counter this threat, we introduce a robust learning framework specifically designed to resist LLM-based adversarial attacks and adapt to the evolving landscape of automated Pink Slime journalism, and show an improvement of up to 27%.

Paper Structure

This paper contains 14 sections, 2 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Identical article templates are observed across four different news outlets, each targeting a different U.S. state and date. Only the location-specific statistics are varied. This formulaic, template-based approach is a hallmark of PS journalism, where local-seeming content is mass produced with minimal editorial variation.
  • Figure 2: Comparison of key characteristics between regular local news (LN) and Pink Slime (PS): (a) Number of sentences per article, (b) Proportion of simple sentences, (c) Adjective frequency per 1000 words, (d) Lexical richness (Root Type-Token Ratio, RTTR), (e) Top 5 POS trigram probabilities, and (f) Number of unique noun phrases.
  • Figure 3: t-SNE visualization of Pink Slime (PS), local news (LN), legitimate national news (National:legit), and fake news (National:fake). The dense clusters of PS are also indicated.
  • Figure 4: SHAP summary plots for the top features identified by the XGBoost (a) and Random Forest (b) classifiers. The x-axis shows the SHAP value (impact on model output), and features are ranked by their importance.
  • Figure 5: Model training first uses a dataset combining LN and PS articles. To simulate adversarial scenarios, targeted LLM-paraphrased PS articles are then introduced. Evaluation is performed on two test sets: the original set with human-written PS articles, and the LLM-modified set, where only the PS samples are paraphrased and the LN samples remain unchanged.
  • ...and 2 more figures
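
Figure 2(d) compares lexical richness using the Root Type-Token Ratio (RTTR). As a rough illustration, the sketch below assumes the standard definition, the number of unique word types divided by the square root of the total number of tokens. The regex tokenizer, the function name `root_ttr`, and the two sample sentences are illustrative assumptions, not the authors' preprocessing or data.

```python
import math
import re


def root_ttr(text: str) -> float:
    """Root Type-Token Ratio: unique tokens / sqrt(total tokens).

    Tokenization here is a simple lowercase regex split (an assumption);
    a real pipeline would likely use a proper tokenizer.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    return len(set(tokens)) / math.sqrt(len(tokens))


# Hypothetical examples: a repetitive, template-like sentence versus a
# sentence with no repeated words, both ten tokens long.
templated = "the county reported ten cases the county reported five cases"
varied = "officials debated a new budget while residents questioned rising costs"

print(f"templated RTTR: {root_ttr(templated):.3f}")
print(f"varied RTTR:    {root_ttr(varied):.3f}")
```

With equal token counts, the repetitive sentence scores lower because it contains fewer unique types. That is the direction one would expect if, as the summary states, Pink Slime text is less lexically rich than regular local news.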