PRPO: Paragraph-level Policy Optimization for Vision-Language Deepfake Detection
Tuan Nguyen, Naseem Khan, Khang Tran, NhatHai Phan, Issa Khalil
TL;DR
This work tackles the challenge of reliable deepfake detection with trustworthy explanations by introducing the DF-R5 reasoning dataset, a DX-LLaVA multimodal architecture, and Paragraph-level Relative Policy Optimization (PRPO) for test-time reinforcement learning. PRPO aligns LLM-generated reasoning with visual evidence at the paragraph level using Visual Consistency and Prediction Consistency rewards, enabling detailed, evidence-grounded explanations without extensive labeled retraining. Empirical results show substantial gains in detection accuracy and explanation quality, including strong generalization to unseen domains and a high reasoning score, outperforming GRPO and other baselines. The approach advances safe, interpretable multimodal detection by tightly grounding reasoning in perceptual cues and robust policy optimization.
Abstract
The rapid rise of synthetic media has made deepfake detection a critical challenge for online safety and trust. Progress remains constrained by the scarcity of large, high-quality datasets. Although multimodal large language models (LLMs) exhibit strong reasoning capabilities, their performance on deepfake detection is poor, often producing explanations that are misaligned with visual evidence or hallucinatory. To address this limitation, we introduce a reasoning-annotated dataset for deepfake detection and propose Paragraph-level Relative Policy Optimization (PRPO), a reinforcement learning algorithm that aligns LLM reasoning with image content at the paragraph level. Experiments show that PRPO improves detection accuracy by a wide margin and achieves the highest reasoning score of 4.55/5.0. Ablation studies further demonstrate that PRPO significantly outperforms GRPO under test-time conditions. These results underscore the importance of grounding multimodal reasoning in visual evidence to enable more reliable and interpretable deepfake detection.
