Uncovering Social Network Activity Using Joint User and Topic Interaction

Gaspard Abel, Argyris Kalogeratos, Jean-Pierre Nadal, Julien Randon-Furling

TL;DR

This paper introduces the Mixture of Interacting Cascades (MIC), a marked multidimensional Hawkes process model that jointly captures non-trivial interactions between cascades and users. It uses a mixture of temporal point processes to build a coupled user/cascade point process model.

Abstract

The emergence of online social platforms, such as social networks and social media, has drastically affected the way people apprehend the information flows to which they are exposed. On such platforms, the various information cascades spreading among users are the main force creating complex dynamics of opinion formation, each user being characterized by their own behavior adoption mechanism. Moreover, the spread of multiple pieces of information or beliefs in a networked population is rarely uncorrelated. In this paper, we introduce the Mixture of Interacting Cascades (MIC), a model of marked multidimensional Hawkes processes that can jointly model non-trivial interactions between cascades and users. We emphasize the interplay between information cascades and user activity, and use a mixture of temporal point processes to build a coupled user/cascade point process model. Experiments on synthetic and real data highlight the benefits of this approach and demonstrate that MIC outperforms existing methods in modeling the spread of information cascades. Finally, we demonstrate how MIC can provide, through its learned parameters, insightful bi-layered visualizations of real social network activity data.
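
For readers unfamiliar with the building block named in the abstract, the following is a minimal sketch of the conditional intensity of a generic multidimensional Hawkes process with an exponential kernel. It is the textbook form, not the MIC model itself: the function name `hawkes_intensity`, the baseline vector `mu`, the influence matrix `W`, and the decay rate `beta` are illustrative assumptions, and the exact parameterization used by MIC is not given in this summary.

```python
import numpy as np

def hawkes_intensity(t, event_times, event_dims, mu, W, beta):
    """Intensity of every dimension at time t for a generic multidimensional
    Hawkes process with exponential kernel (illustrative, not MIC itself)."""
    event_times = np.asarray(event_times, dtype=float)
    event_dims = np.asarray(event_dims, dtype=int)
    # Only events strictly before t can excite the process at time t.
    past = event_times < t
    # Exponentially decaying contribution of each past event.
    decay = beta * np.exp(-beta * (t - event_times[past]))
    # Assumed convention: W[j, i] is the influence of an event in
    # dimension j on the intensity of dimension i.
    excitation = decay @ W[event_dims[past]]
    return mu + excitation

# Hypothetical two-dimensional example (all values are made up).
mu = np.array([0.1, 0.2])
W = np.array([[0.0, 0.5],
              [0.3, 0.0]])
lam = hawkes_intensity(2.0, [0.5, 1.0, 1.5], [0, 1, 0], mu, W, beta=1.0)
print(lam)
```

Each past event raises the intensity of the dimensions it influences, and that boost fades at rate `beta`; this self- and mutual-excitation is what lets Hawkes-type models describe cascading activity.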

Paper Structure

This paper contains 25 sections, 24 equations, 10 figures, 5 tables, 1 algorithm.

Figures (10)

  • Figure 1: Bi-layer scheme for the MIC model. Users interact on top of a layer of interacting cascades. Node size is proportional to the associated volume of events, and node color depicts the mixture of cascades. MIC encompasses complex patterns of social network activity related to both layers and their interplay, namely by jointly modeling cascade-to-cascade ($\mathbf{\Sigma}$), cascade-to-user ($\mathbf{M}$), and user-to-user ($\mathbf{W}$) interactions. Cascade(-to-cascade) interactions are implicit through the event generation process driven by users. Model parameters $\mathbf{\Sigma}$, $\mathbf{W}$, $\mathbf{M}$, and the design components $f_u(\cdot)$, $\kappa(\cdot)$ (see Sec. \ref{sec:model-def}) appear on the right.
  • Figure 2: Heatmaps of the test log-likelihood ratios between the competitors and MIC, with varying $\beta$ (y-axis) and the cascade interaction $\sigma_{21}$ (x-axis), on synthetic event logs generated by MIC. A ratio value larger than $1$ means that MIC performs better.
  • Figure 3: Number of events and intensity of cascades $c_1$ and $c_3$ over time, for the true $(n^{*(\cdot)}(t),\lambda^{*(\cdot)}(t))$, and the generated events given the learned MIC model $(\tilde{n}^{(\cdot)}(t),\tilde{\lambda}^{(\cdot)}(t))$. Error bars correspond to the simulation on $10$ event generations. Dotted lines are the theoretical expected quantities for the MIC model, computed using both the true and the inferred parameter values. The event dataset has been generated with the following parameterization: $\beta=33.37$ and $\mathbf{\Sigma} = \begin{bmatrix} 1 & 0 & 0 \\ 0.71 & 0.29 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.
  • Figure 4: Evaluation of the compared methods when applied on the url dataset (top row, $6$ cascades) and the élysée2017 dataset (bottom row, $5$ cascades), using three measures: (a,d,g) Test log-likelihood for a varying fraction of the initial train dataset (the bottom x-axis shows number of training events per user, and the top x-axis shows the correspondence to percentages). For élysée2017, the left y-axis corresponds to IC, linMIC, while the right y-axis corresponds to CC and MIC (note the difference in the scale between the left and right y-axes). (b,e,h) Inverse $l_1$-distance for the adoption of each cascade, and for the overall intensity. (c,f,i) Pearson correlation between the real and the simulated intensity of each cascade.
  • Figure 5: Evaluation of the compared methods when applied on the lastfm dataset ($50$ cascades) using two measures: (a) Test log-likelihood for a varying size of the initial train dataset (the bottom x-axis shows number of training events per user, and the top x-axis shows the correspondence to percentages). (b) Ranked number of events for each cascade. Real data is compared to the generated events by each of the models.
  • ...and 5 more figures
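
Two of the measures named in the Figure 4 caption can be sketched as follows, for a real and a simulated intensity series sampled on the same time grid. This is a sketch under stated assumptions: the function names are hypothetical, the Pearson correlation is the standard definition, and the "inverse $l_1$-distance" is assumed here to be the reciprocal of the summed absolute difference, which may differ from the definition used in the paper.

```python
import numpy as np

def pearson_correlation(real, simulated):
    """Standard Pearson correlation between two equally sampled series."""
    real = np.asarray(real, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    real_c = real - real.mean()
    sim_c = simulated - simulated.mean()
    return float((real_c @ sim_c) / np.sqrt((real_c @ real_c) * (sim_c @ sim_c)))

def inverse_l1_distance(real, simulated):
    """Assumed definition: 1 / sum(|real - simulated|); larger is better."""
    real = np.asarray(real, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    distance = np.abs(real - simulated).sum()
    # Identical series have zero distance; report infinity in that case.
    return float("inf") if distance == 0 else float(1.0 / distance)

# Hypothetical intensity samples (made-up values, not from the paper).
real_intensity = [1.0, 2.0, 3.0, 2.5]
simulated_intensity = [1.2, 1.8, 3.1, 2.4]
print(pearson_correlation(real_intensity, simulated_intensity))
print(inverse_l1_distance(real_intensity, simulated_intensity))
```

Both scores increase as the simulated intensity tracks the real one more closely. The correlation ignores differences in overall scale, while the $l_1$-based score penalizes them.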

Theorems & Definitions (1)

  • proof