Enhancing Fake News Video Detection via LLM-Driven Creative Process Simulation
Yuyan Bu, Qiang Sheng, Juan Cao, Shaofei Wang, Peng Qi, Yuhui Shi, Beizhe Hu
TL;DR
Short-video fake-news detectors suffer from limited training data that fails to capture the complex many-to-many relationships between source materials and fabricated events, which leads to biased pattern learning. This work proposes AgentAug, an LLM-driven data synthesis framework that combines agentic fabrication pipelines spanning four categories with an active learning sampler, generating diverse fake-news videos and selecting informative samples during training. Across FakeSV and FakeTT, AgentAug consistently improves multiple detectors, with the strongest gains for weaker models and when all fabrication types are used. The approach shows how controlled synthetic data can broaden the manipulation patterns a detector learns while maintaining ethical safeguards.
Abstract
The emergence of fake news on short video platforms has become a significant new societal concern, necessitating automatic detection methods tailored to video news. Current detectors primarily rely on pattern-based features to separate fake news videos from real ones. However, limited and insufficiently diverse training data lead to biased patterns and hinder their performance. This weakness stems from the complex many-to-many relationships between video material segments and fabricated news events in real-world scenarios: a single video clip can be used in multiple ways to create different fake narratives, while a single fabricated event often combines multiple distinct video segments. Existing datasets do not adequately reflect such relationships because large-scale real-world data are difficult to collect and annotate, resulting in sparse coverage and incomplete learning of the characteristics of potential fake news video creation. To address this issue, we propose AgentAug, a data augmentation framework that generates diverse fake news videos by simulating typical creative processes. AgentAug implements multiple LLM-driven pipelines covering four fabrication categories of news video creation, combined with an active learning strategy based on uncertainty sampling that selects potentially useful augmented samples during training. Experimental results on two benchmark datasets demonstrate that AgentAug consistently improves the performance of short video fake news detectors.
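To make the idea of uncertainty sampling concrete, here is a minimal sketch of how a detector's predictions could be used to pick augmented samples. It is an illustration under stated assumptions, not AgentAug's actual selection rule: the abstract does not specify the uncertainty measure, the number of samples kept, or how often selection is repeated. This sketch assumes predictive entropy over a binary fake/real probability and a fixed top-k budget; the names `binary_entropy`, `select_uncertain`, and `k` are hypothetical.

```python
import math
from typing import List, Sequence


def binary_entropy(p_fake: float) -> float:
    """Predictive entropy (in nats) of a binary fake/real prediction.

    Assumption: the detector outputs a single probability that a video is fake.
    """
    eps = 1e-12  # clamp to avoid log(0) at fully confident predictions
    p = min(max(p_fake, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def select_uncertain(p_fake: Sequence[float], k: int) -> List[int]:
    """Return indices of the k augmented samples the detector is least sure about.

    Hypothetical helper: ranks candidates by entropy (highest first) and keeps
    the top k, a common form of uncertainty sampling in active learning.
    """
    ranked = sorted(
        range(len(p_fake)),
        key=lambda i: binary_entropy(p_fake[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical detector outputs for five synthetic videos.
    probs = [0.97, 0.52, 0.10, 0.61, 0.45]
    print(select_uncertain(probs, k=2))  # prints [1, 4]
```

Predictions near 0.5 carry the highest entropy, so the two samples closest to the decision boundary (indices 1 and 4) are selected; in a training loop, such a selection would be recomputed as the detector's predictions change.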
