Crowd Intelligence for Early Misinformation Prediction on Social Media

Megha Sundriyal, Harshit Choudhary, Tanmoy Chakraborty, Md Shad Akhtar

TL;DR

This work tackles the problem of predicting misinformation on social media at early stages by leveraging crowd intelligence in the form of user reply stances and claims. It introduces CrowdShield, a hybrid model that combines a deep Q-network to capture thread propagation dynamics with a transformer-based encoder for semantic understanding, integrated via a joint feature vector for final misinformation prediction. The authors also present MisT, a manually annotated Twitter corpus with veracity, stance, and claim labels, enabling robust evaluation; through extensive experiments and ablation studies, CrowdShield outperforms strong baselines and demonstrates strong early-detection capability. The study highlights the practical value of crowd-informed signals for prioritizing fact-checking and guiding automated detection, with future work aimed at multilingual and multimodal extensions and expanded datasets.

Abstract

Misinformation spreads rapidly on social media, causing serious damage by influencing public opinion, promoting dangerous behavior, or eroding trust in reliable sources. It spreads too fast for traditional fact-checking, stressing the need for predictive methods. We introduce CrowdShield, a crowd intelligence-based method for early misinformation prediction. We hypothesize that the crowd's reactions to misinformation reveal its accuracy. Furthermore, we rely on exaggerated assertions/claims and replies with particular positions/stances on the source post within a conversation thread. We employ Q-learning to capture the two dimensions: stances and claims. We utilize deep Q-learning due to its proficiency in navigating complex decision spaces and effectively learning network properties. Additionally, we use a transformer-based encoder to develop a comprehensive understanding of both content and context. This multifaceted approach helps ensure the model pays attention to user interaction and stays anchored in the communication's content. We propose MisT, a manually annotated misinformation detection Twitter corpus comprising nearly 200 conversation threads with more than 14K replies. In experiments, CrowdShield outperformed ten baseline systems, achieving an improvement of ~4% in macro-F1 score. We conduct an ablation study and error analysis to validate our proposed model's performance. The source code and dataset are available at https://github.com/LCS2-IIITD/CrowdShield.git.
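
As a rough illustration of the Q-learning idea mentioned in the abstract, the sketch below applies the standard tabular Q-learning update to states built from the two dimensions named there: a reply's stance and whether it is a claim. It is not the authors' implementation. The paper uses a deep Q-network, whereas this sketch uses a plain lookup table, and the state encoding, the action set (`ACTIONS`), the reward, and the values of `alpha` and `gamma` are all assumptions made here for illustration.

```python
from itertools import product

# Stance labels as listed in the Figure 4 caption (support, deny, query, comment).
STANCES = ["support", "deny", "query", "comment"]
CLAIM_FLAGS = [False, True]

# Assumed state encoding: (stance of a reply, whether the reply is a claim).
STATES = list(product(STANCES, CLAIM_FLAGS))

# Hypothetical action set; the paper's actual actions may differ.
ACTIONS = ["misinformation", "not_misinformation"]


def make_q_table():
    """Create a Q-table with every (state, action) value initialised to 0."""
    return {(state, action): 0.0 for state in STATES for action in ACTIONS}


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update and return the new Q(s, a).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q = make_q_table()
    # A denying reply that makes a claim, followed by a querying non-claim reply.
    # The reward of 1.0 is an illustrative placeholder.
    new_value = q_update(q, ("deny", True), "misinformation", 1.0, ("query", False))
    print(new_value)
```

A deep Q-network replaces the lookup table with a neural network that estimates the same values, which matters when the state is a learned representation of a thread rather than one of a handful of discrete labels.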

Paper Structure (26 sections, 16 equations, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Illustrative examples of Twitter posts and their subsequent conversational thread, with the original post containing false and true information, respectively. The hierarchical arrangement of replies reflects users' interaction and stances toward the source post. Replies highlighted in yellow denote claims made by the users in response to the source post.
  • Figure 2: Analysis of stance evolution in replies within the conversation threads. The diagram depicts the change of stance from reply $r_i$ (shown on the left vertical axis) to subsequent reply $r_j$ (shown on the right vertical axis) in conversation threads.
  • Figure 3: Correlation between different stances towards source posts and their assertive nature, as indicated by claim labels, across all conversation threads. The purple bar indicates whether the response is a non-claim, whereas the pink bar indicates whether it is a claim.
  • Figure 4: Illustrative model diagram for our proposed framework for early misinformation prediction. The right side of the diagram shows the Q-table update mechanism. S, D, Q, and C in Q-table denote support, deny, query, and comment, respectively.
  • Figure 5: Macro-F1 scores for our model CrowdShield (indicated by the violet bar with vertical lines) compared with the top baseline systems that include conversation threads: ACLR (represented by the green bar with dots) and BERT (depicted by the red bar with diagonal lines). The evaluation was conducted across varying numbers of replies within the conversation thread.
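
The stance changes summarized in Figure 2 can be approximated by counting stance pairs over consecutive replies. This is a minimal sketch, assuming each thread is available as a flat, time-ordered list of stance labels. The function names and the input format are hypothetical, and the paper may pair replies differently (for example, along the reply tree).

```python
from collections import Counter

# Stance labels as listed in the Figure 4 caption.
STANCES = ("support", "deny", "query", "comment")


def stance_transitions(stances):
    """Count (stance of r_i, stance of r_j) pairs for consecutive replies."""
    for stance in stances:
        if stance not in STANCES:
            raise ValueError(f"unknown stance label: {stance!r}")
    return Counter(zip(stances, stances[1:]))


def transition_probabilities(counts):
    """Normalise the counts so each source stance's outgoing pairs sum to 1."""
    totals = Counter()
    for (src, _dst), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}


if __name__ == "__main__":
    # Illustrative thread, not taken from the MisT corpus.
    thread = ["support", "deny", "deny", "query"]
    counts = stance_transitions(thread)
    for pair, prob in sorted(transition_probabilities(counts).items()):
        print(pair, prob)
```

Summing such counts over all threads gives a 4 x 4 table of stance-to-stance flows, which is the kind of information a left-to-right transition diagram displays.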