Identifying Super Spreaders in Multilayer Networks

Michał Czuba, Mateusz Stolarski, Adam Piróg, Piotr Bielak, Piotr Bródka

TL;DR

This work addresses the identification of super-spreaders in multilayer networks by reframing the task as an inductive ranking problem. It introduces TopSpreadersNetwork (ts-net), a GNN with a shared encoder and a trainable aggregation layer that predicts a four-dimensional spreading-potential vector per actor, enabling end-to-end ranking across networks of varying size and layer complexity. A new TopSpreadersDataset is built from diffusion simulations under the Multilayer Independent Cascade Model (MICM), capturing spreading dynamics across real and synthetic multilayer graphs. Empirical results show that ts-net consistently outperforms classical centrality heuristics and competitive learning baselines while providing interpretable outputs and generalising well to unseen networks. The work also proposes data transformations and an inductive framework suitable for scalable, real-world diffusion analysis, and outlines future work on broader heterogeneous graphs and seed-set evaluations.

Abstract

Identifying super-spreaders can be framed as a subtask of the influence maximisation problem. It seeks to pinpoint agents within a network that, if selected as single diffusion seeds, disseminate information most effectively. Multilayer networks, a specific class of heterogeneous graphs, can capture diverse types of interactions (e.g., physical-virtual or professional-social), and thus offer a more accurate representation of complex relational structures. In this work, we introduce a novel approach to identifying super-spreaders in such networks by leveraging graph neural networks. To this end, we construct a dataset by simulating information diffusion across hundreds of networks; to the best of our knowledge, it is the first dataset of its kind tailored specifically to multilayer networks. We further formulate the task as a variation of the ranking prediction problem based on a four-dimensional vector that quantifies each agent's spreading potential: (i) the number of activations; (ii) the duration of the diffusion process; (iii) the peak number of activations; and (iv) the simulation step at which this peak occurs. Our model, TopSpreadersNetwork, comprises a relationship-agnostic encoder and a custom aggregation layer. This design enables generalisation to previously unseen data and adapts to varying graph sizes. In an extensive evaluation, we compare our model against classic centrality-based heuristics and competitive deep learning methods. The results, obtained across a broad spectrum of real-world and synthetic multilayer networks, demonstrate that TopSpreadersNetwork achieves superior performance in identifying high-impact nodes, while also offering improved interpretability through its structured output.
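The four components of the spreading-potential vector can be illustrated with a minimal sketch. It is not the authors' simulator: it runs a plain single-layer independent cascade rather than the MICM, and the names `independent_cascade` and `spreading_potential` are hypothetical. It also assumes that the seed counts as an activation at step 0 and that the "peak" is the largest number of new activations in a single step; the paper's exact conventions may differ.

```python
import random


def independent_cascade(adj, seed, p, rng):
    """Simplified single-layer independent cascade (an assumption, not the MICM).

    Returns the number of newly activated nodes per step; step 0 is the seed.
    """
    active = {seed}
    frontier = [seed]
    new_per_step = [1]
    while frontier:
        newly_active = []
        for u in frontier:
            for v in adj.get(u, ()):
                # Each newly active node gets one chance to activate a neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        if newly_active:
            new_per_step.append(len(newly_active))
        frontier = newly_active
    return new_per_step


def spreading_potential(new_per_step):
    """Build the four-dimensional vector from a per-step activation series."""
    total_activations = sum(new_per_step)   # (i) number of activations
    duration = len(new_per_step) - 1        # (ii) steps until diffusion stops
    peak = max(new_per_step)                # (iii) peak number of activations
    peak_step = new_per_step.index(peak)    # (iv) step at which the peak occurs
    return (total_activations, duration, peak, peak_step)


# Toy graph: node 0 reaches nodes 1 and 2; node 1 reaches nodes 3, 4 and 5.
adj = {0: [1, 2], 1: [3, 4, 5]}
series = independent_cascade(adj, seed=0, p=1.0, rng=random.Random(0))
print(spreading_potential(series))  # (6, 2, 3, 2) with p = 1.0
```

With `p=1.0` the cascade is deterministic, which makes the example easy to check by hand. With `p < 1` one would average the vector over many runs per seed before ranking actors.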

Paper Structure

This paper contains 33 sections, 8 equations, 5 figures, 13 tables.

Figures (5)

  • Figure 1: Distribution of $sps$ in network-72 from artificial-er split, obtained via simulations under $\delta=AND$ and $\delta=OR$. Dashed lines indicate the cutoff for the most significant spreaders.
  • Figure 2: A schematic illustration of the ts-net architecture processing a multilayer network. Each layer is encoded independently using a shared encoder composed of interleaved GAT and GIN blocks. The resulting representations are then aggregated by a trainable aggregation layer to produce actor embeddings. Finally, a vector of spreading potential is generated using a component consisting of multilayer perceptrons.
  • Figure 3: The pipeline used to train the ts-net; orange blocks represent data entities, grey blocks indicate functional components, and the model is highlighted in yellow. Note that while the model is trained to optimise $\mathbf{\hat{R}}$, it is still capable of predicting $\mathbf{p}$.
  • Figure 4: Averaged cumulated relative score curves from predictions by the evaluated methods on test artificial and real-world networks.
  • Figure 5: Inference time of ts-net compared to that of deg-c.

Theorems & Definitions (9)

  • Definition 1: Multilayer network
  • Definition 2: Ranking predictor
  • Definition 3: Top-spreaders ranking
  • Definition 4: Cumulated spreading score
  • Definition 5: Relative cumulated spreading score
  • Definition 6: Top-spreaders identification problem
  • Definition 7: Spreading-potential vector
  • Definition 8: Spreading-potential score
  • Definition 9: WiseAverage fusion