Exploring Semantic Perturbations on Grover

Ziqing Ji; Pranav Kulkarni; Marko Neskovic; Kevin Nolan; Yan Xu

TL;DR

This work investigates the robustness of Grover, a neural fake news detector and generator, to adversarial perturbations across word, sentence, and discourse levels. It combines uninformed perturbations (Frankenstein articles, length, position, subjectivity, word substitutions) with informed perturbations (GPT‑2 substitutions, embedding‑space perturbations) to map Grover’s vulnerabilities and resilience. Key findings show that while some perturbations (synonym substitution, insertion, position) have limited impact, targeted strategies such as GPT‑2 substitutions and embedding perturbations can meaningfully degrade detection, underscoring the need for stronger defenses. The study provides practical insights for improving robustness, shares a TensorFlow 2 migration, and outlines future work including exploring larger datasets and other architectures (e.g., BERT, GPT‑3) for enhanced resilience against neural fake news attacks.

Abstract

With news and information now easier to access than ever, it is increasingly important to ensure that people are not misled by what they read. Recently, the rise of neural fake news (AI-generated fake news) and its demonstrated effectiveness at fooling humans have prompted the development of models to detect it. One such model is Grover, which can both detect neural fake news and generate it, the latter to demonstrate how such a model could be misused to fool human readers. In this work, we explore Grover's fake news detection capabilities by performing targeted attacks through perturbations of input news articles. In doing so, we test Grover's resilience to these adversarial attacks and expose potential vulnerabilities that should be addressed in future iterations so that it can detect all types of fake news accurately.

Paper Structure (23 sections, 13 figures, 1 table, 1 algorithm)

Introduction and Literature
The Rise of Machine-Generated Fake News
An Algorithmic Response
GPT-2 and Grover
Vulnerabilities in Fake News Detection
Getting Started
Setting up Grover
Selection of Grover Model
Analysis of Large Grover Model
Uninformed Perturbations
"Frankenstein" Articles
Blending Human-written and Machine-written Articles
Blending Articles of the Same Classification
Comparing Insertion and Substitution
Effect of Position
...and 8 more sections

Figures (13)

Figure 1: Blending Human and Machine-written Articles, real-fake
Figure 2: Blending articles with same classification, real-real
Figure 3: Blending articles with same classification, fake-fake
Figure 4: Insertion instead of Substitution
Figure 5: Effect of Position
...and 8 more figures
