Limits of Multilayer Diffusion Network Inference in Social Media Research

Yan Xia, Ted Hsuan Yun Chen, Mikko Kivelä

TL;DR

This work examines the limits of inferring multilayer diffusion networks from social-media cascades by systematically varying network structure and diffusion settings in synthetic data. It presents a two-phase inference framework that combines a single-layer aggregation step with a multilayer phase, under a continuous-time diffusion model with exponential transmission times and likelihood-based optimization. Inference accuracy depends strongly on network density and on the cascade size distribution: accuracy is high on dense networks and drops markedly when many cascades reach only a small audience, while filtering out small cascades can substantially improve layer assignments. The study offers practical guidance for evaluating applicability, provides an open-source implementation that outperforms baselines in accuracy, and points to directions for extending the models and improving scalability for real-data use.
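
To make the phrase "continuous-time diffusion model with exponential transmission times and likelihood-based optimization" concrete, the sketch below computes the log-likelihood of one observed cascade under a standard exponential-transmission model on a single layer. It is a minimal illustration, not the paper's implementation: the function name `cascade_log_likelihood`, the dict-based data layout, and the observation window `horizon` are all assumptions made for this example.

```python
import math

def cascade_log_likelihood(times, rates, nodes, horizon):
    """Log-likelihood of one cascade under exponential transmission times.

    times   : dict node -> infection time (infected nodes only)   [assumed layout]
    rates   : dict (source, target) -> transmission rate >= 0     [assumed layout]
    nodes   : iterable of all nodes in the network
    horizon : end of the observation window                       [assumed parameter]
    """
    infected = sorted(times, key=times.get)
    seed_time = times[infected[0]]
    ll = 0.0
    for i in nodes:
        if i in times:
            if times[i] == seed_time:
                continue  # seeds need no incoming transmission to explain them
            parents = [j for j in infected if times[j] < times[i]]
            # With exponential delays, the hazard of edge (j, i) is its rate.
            hazard = sum(rates.get((j, i), 0.0) for j in parents)
            if hazard <= 0.0:
                return -math.inf  # infection cannot be explained by any edge
            ll += math.log(hazard)
            # Log-survival: node i resisted each earlier-infected j until times[i].
            ll -= sum(rates.get((j, i), 0.0) * (times[i] - times[j]) for j in parents)
        else:
            # Uninfected node survived every infected node until the horizon.
            ll -= sum(rates.get((j, i), 0.0) * (horizon - times[j]) for j in infected)
    return ll

example_times = {"a": 0.0, "b": 1.0}
example_rates = {("a", "b"): 2.0, ("a", "c"): 0.5, ("b", "c"): 1.0}
example_ll = cascade_log_likelihood(example_times, example_rates, ["a", "b", "c"], 3.0)
```

Likelihood-based inference would then search for the rates that maximize the sum of such terms over all cascades. In a multilayer setting each cascade would presumably be scored against the rates of the layer it is assigned to, which is an assumption here rather than a detail stated in this summary.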

Abstract

Information on social media spreads through an underlying diffusion network that connects people of common interests and opinions. This diffusion network often comprises multiple layers, each capturing the spreading dynamics of a certain type of information characterized by, for example, topic, language, or attitude. Researchers have previously proposed methods to infer these underlying multilayer diffusion networks from observed spreading patterns, but little is known about how well these methods perform across the range of realistic spreading data. In this paper, we conduct an extensive series of synthetic data experiments to systematically analyze the performance of the multilayer diffusion network inference framework, under varied network structure (e.g. density, number of layers) and information diffusion settings (e.g. cascade size, layer mixing) that are designed to mimic real-world spreading on social media. Our results show extreme performance variation of the inference framework: notably, it achieves much higher accuracy when inferring a denser diffusion network, while it fails to decompose the diffusion network correctly when most cascades in the data reach a limited audience. In demonstrating the conditions under which the inference accuracy is extremely low, our paper highlights the need to carefully evaluate the applicability of the inference before running it on real data. Practically, our results serve as a reference for this evaluation, and our publicly available implementation, which outperforms previous implementations in accuracy, supports further testing under personalized settings.
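
Because the reported accuracy collapses when most cascades reach a limited audience, a quick look at the cascade size distribution is a natural first step when evaluating applicability. The sketch below is a hedged illustration only: it assumes each cascade is stored as a dict mapping node to infection time, and `min_size` is a hypothetical threshold, not a value taken from the paper.

```python
from collections import Counter

def size_distribution(cascades):
    """Count how many cascades have each size (number of infected nodes)."""
    return Counter(len(cascade) for cascade in cascades)

def filter_small_cascades(cascades, min_size):
    """Keep only cascades that reached at least min_size nodes."""
    return [cascade for cascade in cascades if len(cascade) >= min_size]

# Toy data: three cascades of sizes 1, 2, and 3.
cascades = [
    {"a": 0.0},
    {"a": 0.0, "b": 1.2},
    {"b": 0.0, "c": 0.4, "d": 2.1},
]
sizes = size_distribution(cascades)
kept = filter_small_cascades(cascades, min_size=2)
```

Filtering trades data volume for informativeness: small cascades carry few observed transmissions, so dropping them removes weak evidence at the cost of fewer training examples.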

Paper Structure

This paper contains 27 sections, 15 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Cascade size distributions of the two real-world datasets in log-log scale. Plot a) corresponds to the ClimateSkepticCascades dataset, and plot b) corresponds to the PoliSciCascades dataset.
  • Figure 2: Cascade size distributions of synthetic cascades generated under different $\gamma$ settings.
  • Figure 3: Inference accuracy of MultiC compared with MixCascades, MMRate, and FASTEN on synthetic data generated under the settings of (a) MultiC, $K=2$, (b) MultiC, $K=3$, (c) MultiC, $K=4$, (d) FASTEN, random, (e) FASTEN, hierarchical, and (f) FASTEN, core-periphery.
  • Figure 4: Inference accuracy as a function of (a) cascade size distribution, (b) cascade filtering, (c) network density, (d) network size, (e) number of layers, (f) layer overlap, and (g) level of mixed spreading.