TAGFN: A Text-Attributed Graph Dataset for Fake News Detection in the Age of LLMs

Kay Liu, Yuwei Han, Haoyan Xu, Henry Peng Zou, Yue Zhao, Philip S. Yu

TL;DR

The paper tackles the lack of large-scale, real-world text-attributed graph datasets for graph-level outlier detection in the misinformation domain. It introduces TAGFN, a dataset with three subsets (Politifact, GossipCop, Fakeddit) that preserve raw news content and user posts, along with propagation graphs and ground-truth labels, enabling rigorous benchmarking of both traditional graph methods and LLM-based detectors. The authors provide baseline experiments spanning prompting and supervised embedding approaches, plus an ablation study demonstrating the value of the full text-attributed graph. By releasing data and code, TAGFN aims to accelerate research in robust graph-based misinformation detection and trustworthy AI.
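
The TL;DR mentions prompting-based LLM baselines and an ablation over the full text-attributed graph (Figure 2 reports an ablation of one-shot in-context learning on Politifact). One way to give an LLM a propagation graph is to flatten it into text. The sketch below is a hypothetical illustration only: the functions serialize_graph, build_one_shot_prompt, and parse_answer, the prompt wording, the node numbering (node 0 = news, node i = i-th user post), and the real/fake answer format are assumptions, not the authors' prompts or code. No model call is included; the resulting string would be sent to whichever LLM is being evaluated.

```python
from typing import Optional


def serialize_graph(news: str, posts: list[str], edges: list[tuple[int, int]],
                    include_posts: bool = True, include_edges: bool = True) -> str:
    """Flatten a propagation graph into prompt text (hypothetical format).

    Assumes node 0 is the news item and node i (i >= 1) is the i-th user post.
    include_posts=False gives a news-only variant; edges are emitted only when
    posts are included, because they refer to user nodes.
    """
    lines = [f"News: {news}"]
    if include_posts:
        lines += [f"User {i}: {post}" for i, post in enumerate(posts, start=1)]
        if include_edges and edges:
            lines.append("Edges (parent -> child): "
                         + ", ".join(f"{p}->{c}" for p, c in edges))
    return "\n".join(lines)


def build_one_shot_prompt(example: str, example_label: str, query: str) -> str:
    # One labeled demonstration graph followed by the unlabeled query graph.
    return ("Decide whether the news in each graph is real or fake.\n\n"
            f"Example:\n{example}\nAnswer: {example_label}\n\n"
            f"Query:\n{query}\nAnswer:")


def parse_answer(completion: str) -> Optional[int]:
    # Map the model's reply to an assumed label encoding: 1 = fake, 0 = real.
    text = completion.strip().lower()
    if text.startswith("fake"):
        return 1
    if text.startswith("real"):
        return 0
    return None  # reply could not be parsed


demo = serialize_graph("Example claim A", ["shared it", "this is false"], [(0, 1), (0, 2)])
query = serialize_graph("Example claim B", ["wow"], [(0, 1)])
print(build_one_shot_prompt(demo, "fake", query))
```

Passing include_posts=False (or include_edges=False) for both the demonstration and the query yields news-only or structure-free prompts, which is the kind of comparison an ablation of the full graph would make.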

Abstract

Large Language Models (LLMs) have recently revolutionized machine learning on text-attributed graphs, but the application of LLMs to graph outlier detection, particularly in the context of fake news detection, remains significantly underexplored. One of the key challenges is the scarcity of large-scale, realistic, and well-annotated datasets that can serve as reliable benchmarks for outlier detection. To bridge this gap, we introduce TAGFN, a large-scale, real-world text-attributed graph dataset for outlier detection, specifically fake news detection. TAGFN enables rigorous evaluation of both traditional and LLM-based graph outlier detection methods. Furthermore, it facilitates the development of misinformation detection capabilities in LLMs through fine-tuning. We anticipate that TAGFN will be a valuable resource for the community, fostering progress in robust graph-based outlier detection and trustworthy AI. The dataset is publicly available at https://huggingface.co/datasets/kayzliu/TAGFN and our code is available at https://github.com/kayzliu/tagfn.

Paper Structure

This paper contains 17 sections, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: A toy example of news propagation graph in TAGFN, where the root node denotes the news and child nodes represent users, each attributed with text.
  • Figure 2: Ablation study of one-shot ICL on Politifact.
  • Figure 3: The graph structure of the data instance.
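
Figure 1 describes each TAGFN instance as a propagation graph whose root node is the news item and whose child nodes are users, each attributed with text. A minimal Python sketch of such an instance follows. The class name PropagationGraph, the edge-list layout, the convention that node 0 is the news root, the 1 = fake label encoding, and the reply edge in the toy example are all illustrative assumptions, not the schema of the released dataset.

```python
from dataclasses import dataclass


@dataclass
class PropagationGraph:
    """Hypothetical container for one text-attributed news propagation graph.

    Assumed layout (not TAGFN's actual schema): texts[0] is the news item,
    texts[1:] are user posts, and edges are (parent, child) node indices.
    """
    texts: list[str]
    edges: list[tuple[int, int]]
    label: int  # assumed encoding: 1 = fake news (outlier), 0 = real news

    def num_nodes(self) -> int:
        return len(self.texts)

    def children(self, node: int) -> list[int]:
        # Child nodes of `node`, in the order their edges were listed.
        return [c for p, c in self.edges if p == node]

    def is_tree(self) -> bool:
        # Valid rooted tree on n nodes: all indices in range, exactly n - 1
        # edges, and every node reached from the news root (node 0) exactly once.
        n = self.num_nodes()
        if any(not (0 <= p < n and 0 <= c < n) for p, c in self.edges):
            return False
        if len(self.edges) != n - 1:
            return False
        seen, stack = {0}, [0]
        while stack:
            for c in self.children(stack.pop()):
                if c in seen:
                    return False
                seen.add(c)
                stack.append(c)
        return len(seen) == n


# Toy instance: one news root, two users sharing it, and a reply to user 1.
g = PropagationGraph(
    texts=["Example news article text", "user 1 post", "user 2 post", "reply to user 1"],
    edges=[(0, 1), (0, 2), (1, 3)],
    label=1,
)
print(g.num_nodes(), g.children(0), g.is_tree())  # 4 [1, 2] True
```

Keeping raw strings on every node, rather than only precomputed embeddings, lets the same instance be consumed by an LLM prompt or by a text encoder feeding a graph model.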

Theorems & Definitions (1)

  • Definition 1: Text-Attributed Graph Outlier Detection