Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets

Yida Mu; Xingyi Song; Kalina Bontcheva; Nikolaos Aletras

TL;DR

The paper addresses the challenge of generalizing rumor detection to unseen rumors by comparing content-based and context-based models. It systematically evaluates how data-split strategies, especially chronological splits, reveal temporal concept drift and show that evaluations on static datasets can overestimate performance. Through ablation, similarity, and modality analyses, it finds that context signals are underutilized and that model performance correlates with the similarity between training and test data. The work offers practical recommendations, such as forward/backward chronological splits and data-cleaning approaches, to improve the reliability of rumor detectors in real-world settings.
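The similarity analysis relates model performance to how closely test rumors resemble training rumors. The sketch below shows one way such a score could be computed; it is not the paper's measure. It assumes plain bag-of-words cosine similarity as a stand-in, and every function name here, as well as the 0.8 threshold in the near-duplicate filter (one possible data-cleaning step), is a hypothetical illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    # Lowercased word counts. \w+ splits on whitespace and punctuation, so
    # Chinese text (e.g. Weibo) would need a proper word segmenter instead.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def max_train_similarity(train: List[str], test: List[str]) -> List[float]:
    # For each test post, its highest similarity to any training post.
    train_vecs = [bag_of_words(t) for t in train]
    scores = []
    for post in test:
        vec = bag_of_words(post)
        scores.append(max((cosine(vec, v) for v in train_vecs), default=0.0))
    return scores


def train_test_similarity(train: List[str], test: List[str]) -> float:
    # Dataset-level score: mean of the per-post maxima.
    scores = max_train_similarity(train, test)
    return sum(scores) / len(scores) if scores else 0.0


def drop_near_duplicates(
    train: List[str], test: List[str], threshold: float = 0.8
) -> Tuple[List[str], List[str]]:
    # Hypothetical cleaning step: split test posts into those kept and those
    # dropped for nearly duplicating a training post.
    scores = max_train_similarity(train, test)
    kept = [p for p, s in zip(test, scores) if s < threshold]
    dropped = [p for p, s in zip(test, scores) if s >= threshold]
    return kept, dropped
```

Taking the maximum per test post, rather than averaging over all training posts, keeps a single near-duplicate from being diluted by many unrelated training posts; a higher dataset-level score suggests the test set is less representative of truly unseen rumors.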

Abstract

A crucial aspect of a rumor detection model is its ability to generalize, particularly its ability to detect emerging, previously unknown rumors. Past research has indicated that content-based rumor detection models (i.e., those using solely source posts as input) tend to perform less effectively on unseen rumors. At the same time, the potential of context-based models remains largely untapped. The main contribution of this paper is an in-depth evaluation of the performance gap between content-based and context-based models, specifically on detecting new, unseen rumors. Our empirical findings demonstrate that context-based models still depend too heavily on information derived from the rumors' source posts and tend to overlook the significant role that contextual information can play. We also study the effect of data split strategies on classifier performance. Based on our experimental results, the paper offers practical suggestions on how to minimize the effects of temporal concept drift in static datasets when training rumor detection methods.

Paper Structure (25 sections, 2 equations, 4 figures, 7 tables)

Introduction
Related Work
Computational Rumor Detection Approaches
The Effect of Temporal Concept Drift in NLP Downstream Tasks
Experimental Setup
Data
Models
Data Pre-processing
Evaluation Metrics
Hyper-parameters
Evaluation Strategies
Data Splits
Results and Discussion
Model Performance on Random Splits
Forward vs. Backward Chronological Splits
...and 10 more sections

Figures (4)

Figure 1: Two rumor spreaders (in the green box) posted an identical rumor and received comments with different stances (in the gray box): denial (left) and support (right). '[Crying_Face]' denotes the Loudly Crying Face emoji.
Figure 2: An example of using forward and backward chronological data splits on Weibo 20 dataset (including rumors from 2016 to 2020). There is no overlap among the three subsets.
Figure 3: Two rumors from the Sun-MM Dataset related to the 'plane crash' event contain similar images and were published during a comparable time period. More examples are displayed in the Appendix (see Figure 4).
Figure 4: Four pairs of rumors related to the 'plane crash' event (from the Sun-MM Dataset) contain similar images and were published during a comparable time period.
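Figure 2 describes forward and backward chronological splits into three non-overlapping subsets. As we read it, a forward split trains on the oldest rumors and tests on the newest, while a backward split reverses that order. The minimal sketch below assumes each post carries a publication date; the function name and the 70/10/20 train/dev/test ratio are hypothetical choices, not the paper's settings.

```python
from datetime import date
from typing import List, Sequence, Tuple

Post = Tuple[date, str]  # (publication date, post text or id)


def chronological_split(
    posts: Sequence[Post],
    ratios: Tuple[float, float, float] = (0.7, 0.1, 0.2),
    backward: bool = False,
) -> Tuple[List[Post], List[Post], List[Post]]:
    """Split posts into non-overlapping train/dev/test subsets by date.

    Forward: train on the oldest posts, test on the newest ones.
    Backward: train on the newest posts, test on the oldest ones.
    The default ratios are an illustrative assumption.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    # Oldest first for a forward split, newest first for a backward split.
    ordered = sorted(posts, key=lambda p: p[0], reverse=backward)
    n_train = round(len(ordered) * ratios[0])
    n_dev = round(len(ordered) * ratios[1])
    train = ordered[:n_train]
    dev = ordered[n_train:n_train + n_dev]
    # The remainder goes to test, so every post lands in exactly one subset.
    test = ordered[n_train + n_dev:]
    return train, dev, test
```

Calling `chronological_split(posts)` yields the forward split and `chronological_split(posts, backward=True)` the backward one; comparing a detector's scores on both, and against a random split, is one way to check whether its performance holds up under temporal drift.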
