Crowd Intelligence for Early Misinformation Prediction on Social Media
Megha Sundriyal, Harshit Choudhary, Tanmoy Chakraborty, Md Shad Akhtar
TL;DR
This work addresses early prediction of misinformation on social media by leveraging crowd intelligence, in the form of the stances and claims expressed in user replies. It introduces CrowdShield, a hybrid model that combines a deep Q-network, which captures thread propagation dynamics, with a transformer-based encoder for semantic understanding; the two are integrated through a joint feature vector for the final misinformation prediction. The authors also present MisT, a manually annotated Twitter corpus with veracity, stance, and claim labels, which supports evaluation. In extensive experiments and ablation studies, CrowdShield outperforms strong baselines and shows clear early-detection capability. The study highlights the practical value of crowd-informed signals for prioritizing fact-checking and guiding automated detection; future work targets multilingual and multimodal extensions and larger datasets.
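The fusion step described above can be illustrated with a minimal sketch. It assumes the propagation-side representation (from the deep Q-network) and the text-side representation (from the transformer encoder) are already available as plain vectors, and that the final prediction is a single logistic layer over their concatenation. The function names, the concatenation, and the logistic layer are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List


def fuse_features(propagation_vec: List[float], text_vec: List[float]) -> List[float]:
    """Build a joint feature vector by concatenation (an assumed fusion scheme)."""
    return list(propagation_vec) + list(text_vec)


def predict_misinformation(joint_vec: List[float], weights: List[float], bias: float = 0.0) -> float:
    """Hypothetical logistic output layer: returns a score in (0, 1)."""
    if len(joint_vec) != len(weights):
        raise ValueError("weights must match the joint feature vector length")
    logit = sum(w * x for w, x in zip(weights, joint_vec)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy vectors standing in for the two learned representations.
joint = fuse_features([0.2, -0.1], [0.5, 0.3, -0.4])
score = predict_misinformation(joint, weights=[0.0] * len(joint))
```

With all-zero weights the score is exactly 0.5; a trained layer would instead learn weights that separate misinformation threads from the rest.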
Abstract
Misinformation spreads rapidly on social media, causing serious damage by influencing public opinion, promoting dangerous behavior, or eroding trust in reliable sources. It spreads faster than traditional fact-checking can respond, which underscores the need for predictive methods. We introduce CROWDSHIELD, a crowd intelligence-based method for early misinformation prediction. We hypothesize that the crowd's reactions to a post reveal its veracity. Specifically, we rely on two signals within a conversation thread: exaggerated assertions (claims) and replies that take a particular position (stance) on the source post. We employ deep Q-learning to capture these two dimensions, stances and claims, because of its proficiency in navigating complex decision spaces and effectively learning network properties. Additionally, we use a transformer-based encoder to develop a comprehensive understanding of both content and context. This multifaceted approach helps ensure that the model attends to user interaction while staying anchored in the content of the communication. We propose MIST, a manually annotated Twitter corpus for misinformation detection comprising nearly 200 conversation threads with more than 14K replies. In experiments, CROWDSHIELD outperformed ten baseline systems, improving the macro-F1 score by ~4%. We conduct an ablation study and an error analysis to validate the performance of the proposed model. The source code and dataset are available at https://github.com/LCS2-IIITD/CrowdShield.git.
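For readers unfamiliar with the reported metric, the sketch below computes macro-F1 in its standard form: the unweighted mean of the per-class F1 scores, so that every class counts equally regardless of its frequency. The label names and the toy predictions are hypothetical and are not drawn from MIST or from the paper's results.

```python
from typing import List


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical labels for four threads.
gold = ["misinfo", "not_misinfo", "not_misinfo", "misinfo"]
pred = ["misinfo", "not_misinfo", "misinfo", "misinfo"]
result = macro_f1(gold, pred)
```

In this toy case the per-class F1 scores are 0.8 and 2/3, so the macro average is 11/15, roughly 0.733.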
