Revisiting Information Cascades in Online Social Networks
Michael Sidorov, Dan Vilenchik
TL;DR
The paper reevaluates how information cascades unfold in online social networks by predicting user reactions to posts from historical activity alone, without linguistic features. It introduces four models (TWPN, MLE, TWMN, and TWCRN) to test whether social links must be supplied explicitly or can be learned, with TWCRN achieving the best average $F_1$ score of about $0.86$ on four Twitter datasets from 2020. A key finding is that a simple per-user habit model (MLE) performs competitively, and that a convolutional residual network can implicitly learn network structure even when it is not provided explicitly. The study contributes an open dataset and code, demonstrates that social-link information can be learned rather than hard-coded, and highlights differences in predictive power between follower-based and mention-based graphs, offering a nuanced view of information diffusion beyond epidemic-like models.
Abstract
It is by now folklore that to understand the activity pattern of a user in an online social network (OSN) platform, one needs to look at his friends or the ones he follows. The common perception is that these friends exert influence on the user, affecting his decision whether to re-share content or not. Hinging upon this intuition, a variety of models were developed to predict how information propagates in OSNs, similar to the way an infection spreads in a population. In this paper, we revisit this world view and arrive at new conclusions. Given a set of users $V$, we study the task of predicting whether a user $u \in V$ will re-share content by some $v \in V$ in the following time window, given the activity of all the users in $V$ in the previous time window. We design several algorithms for this task, ranging from a simple greedy algorithm that only learns $u$'s conditional probability distribution, ignoring the rest of $V$, to a convolutional neural network-based algorithm that receives the activity of all of $V$ but does not explicitly receive the social link structure. We tested our algorithms on four datasets that we collected from Twitter, each revolving around a different popular topic in 2020. The best performance, an average F1-score of 0.86 over the four datasets, was achieved by the convolutional neural network. The simple algorithm, which ignores social links entirely, achieved an average F1-score of 0.78.
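To make the per-user baseline idea concrete, the following is a minimal Python sketch of a habit model that looks only at one user's own history. It is an illustration under stated assumptions, not the authors' MLE algorithm: it assumes each user's activity is a binary flag per time window, and it estimates the probability of activity in the next window conditioned only on that same user's activity in the previous window. The names `fit_habit_model`, `predict_next`, and `f1_score` are hypothetical helpers introduced here.

```python
from collections import Counter


def fit_habit_model(history):
    """Estimate P(active next | own previous state) for a single user.

    `history` is a list of 0/1 flags, one per time window (an assumed encoding).
    Returns a dict mapping previous state (0 or 1) to the estimated probability
    of being active in the following window.
    """
    # Count transitions between consecutive windows.
    counts = Counter(zip(history[:-1], history[1:]))
    model = {}
    for prev in (0, 1):
        total = counts[(prev, 0)] + counts[(prev, 1)]
        # Fall back to 0.5 when this previous state was never observed.
        model[prev] = counts[(prev, 1)] / total if total else 0.5
    return model


def predict_next(model, prev, threshold=0.5):
    """Predict 1 (active) if the estimated probability exceeds the threshold."""
    return int(model[prev] > threshold)


def f1_score(y_true, y_pred):
    """Plain F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy activity history for one hypothetical user.
    history = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
    model = fit_habit_model(history)
    print(model, predict_next(model, prev=1), predict_next(model, prev=0))
```

The sketch deliberately never reads any other user's activity, which is what makes it blind to social links; a network-aware model would instead take the activity of all of $V$ as input.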
