MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models

Lionel Z. Wang; Yiming Ma; Renfei Gao; Beichen Guo; Han Zhu; Wenqi Fan; Zexin Lu; Ka Chung Ng

TL;DR

This study analyzes the creation of fake news from a social psychology perspective and develops a comprehensive LLM-based theoretical framework, LLM-Fake Theory. It also introduces a novel pipeline that automates the generation of fake news using LLMs, eliminating the need for manual annotation.

Abstract

The advent of large language models (LLMs) has revolutionized online content creation, making it much easier to generate high-quality fake news. This misuse threatens the integrity of our digital environment and ethical standards. Therefore, understanding the motivations and mechanisms behind LLM-generated fake news is crucial. In this study, we analyze the creation of fake news from a social psychology perspective and develop a comprehensive LLM-based theoretical framework, LLM-Fake Theory. We introduce a novel pipeline that automates the generation of fake news using LLMs, thereby eliminating the need for manual annotation. Utilizing this pipeline, we create a theoretically informed Machine-generated Fake news dataset, MegaFake, derived from the GossipCop dataset. We conduct comprehensive analyses to evaluate our MegaFake dataset. We believe that our dataset and insights will provide valuable contributions to future research focused on the detection and governance of fake news in the era of LLMs.

Paper Structure (50 sections, 37 figures, 8 tables)

Data Card
Dataset Description
Meta Information
Dataset Structure
Dataset Creation
Considerations for Using the Data
Author Statement
Hosting, Licensing, and Maintenance Plan
Hosting Platform
Choice of Hosting Platform: Google Drive
Licensing
Data Licensing
Maintenance Plan
Regular Updates and Backups
Curated Interface
...and 35 more sections

Figures (37)

Figure 1: Results for Different Topic Numbers (Information Blending)
Figure 2: Document Matching Process (Information Blending)
Figure 3: Document Matching Process (News Summarization)
Figure 4: Results for Different Topic Numbers (News Summarization)
Figure 5: Results for Different Temperatures (Writing Enhancement)
...and 32 more figures
