Fake News Detection After LLM Laundering: Measurement and Explanation

Rupak Kumar Das, Jonathan Dodge

TL;DR

This work investigates fake-news detection when content is generated or paraphrased by large language models. The study evaluates a broad suite of detectors on human-written and LLM-paraphrased news from the COVID-19 and LIAR datasets, using multiple paraphrasers (PEGASUS, GPT, Llama). Detectors struggle more with paraphrased content, especially PEGASUS paraphrases, while GPT paraphrases often preserve semantic similarity as measured by $F_{BERT}$. LIME explanations suggest that sentiment shifts introduced during paraphrasing can drive misclassifications, exposing a gap between semantic similarity and perceived sentiment. The authors contribute two paraphrase datasets, analyze detector robustness, and discuss the need for sentiment-aware evaluation metrics to improve detection in real-world misinformation scenarios.

Abstract

With their advanced capabilities, Large Language Models (LLMs) can generate highly convincing and contextually relevant fake news, which can contribute to disseminating misinformation. Though there is much research on fake news detection for human-written text, the detection of LLM-generated fake news is still under-explored. This research measures the efficacy of detectors in identifying LLM-paraphrased fake news, in particular determining whether adding a paraphrase step in the detection pipeline helps or impedes detection. This study makes five contributions: (1) We show that detectors struggle more with LLM-paraphrased fake news than with human-written text. (2) We identify which models excel at which tasks (evading detection, paraphrasing to evade detection, and paraphrasing for semantic similarity). (3) Via LIME explanations, we identify a possible reason for detection failures: sentiment shift. (4) We uncover a worrisome trend for paraphrase quality measurement: samples that exhibit sentiment shift despite a high BERTScore. (5) We provide a pair of datasets that augment existing datasets with paraphrase outputs and scores. These datasets are available on GitHub.
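
For readers unfamiliar with the $F_{BERT}$ score mentioned here, the following is a minimal sketch of the greedy-matching idea behind BERTScore: each token in one text is matched to its most similar token in the other, and the resulting precision and recall are combined into an F-measure. The function name `bertscore_f` and the toy 2-dimensional vectors are illustrative assumptions; the real metric uses contextual embeddings from a BERT-like model and optionally applies idf weighting and baseline rescaling, none of which is reproduced here. This is not the authors' implementation.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cosine(a, b):
    """Dot product of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bertscore_f(ref_vecs, cand_vecs):
    """Return (precision, recall, f) from greedy token matching.

    ref_vecs / cand_vecs are lists of token embeddings (hypothetical toy
    inputs here; a real pipeline would obtain them from a language model).
    """
    ref = [_normalize(v) for v in ref_vecs]
    cand = [_normalize(v) for v in cand_vecs]

    # Recall: every reference token looks for its best match in the candidate.
    recall = sum(max(_cosine(r, c) for c in cand) for r in ref) / len(ref)
    # Precision: every candidate token looks for its best match in the reference.
    precision = sum(max(_cosine(c, r) for r in ref) for c in cand) / len(cand)

    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Toy example: a two-token "original" and a two-token "paraphrase".
original = [[1.0, 0.0], [0.0, 1.0]]
paraphrase = [[1.0, 0.0], [1.0, 1.0]]
p, r, f = bertscore_f(original, paraphrase)
```

Because the score depends only on embedding similarity, two texts can score highly while differing in tone, which is the gap between semantic similarity and sentiment that the abstract points to.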

Paper Structure

This paper contains 18 sections, 1 equation, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Methodology to assess the efficacy of fake news detectors
  • Figure 2: (Top): Performance of fake news detectors on human-written and LLM-paraphrased text on COVID-19 dataset. (Bottom): Same, but on LIAR dataset
  • Figure 3: Distribution of $F_{BERT}$ score for all paraphrasers on COVID-19 dataset. Higher is better.
  • Figure 4: Distribution of $F_{BERT}$ score for all paraphrasers on LIAR dataset
  • Figure 5: (Top Left): LIME output of the BERT model on human-written news. (Top Right): LIME output of the BERT model on Llama-paraphrased news. (Bottom Left): LIME output of the LSTM model on human-written news. (Bottom Right): LIME output of the LSTM model on GPT-paraphrased news
  • ...and 1 more figure