Identifying Super Spreaders in Multilayer Networks
Michał Czuba, Mateusz Stolarski, Adam Piróg, Piotr Bielak, Piotr Bródka
TL;DR
This work addresses the identification of super-spreaders in multilayer networks by reframing the task as an inductive ranking problem. It introduces TopSpreadersNetwork (ts-net), a GNN with a shared encoder and a trainable aggregation layer that predicts a four-dimensional spreading-potential vector per actor, enabling end-to-end ranking across networks of varying size and layer complexity. A new TopSpreadersDataset is built by simulating diffusion under the Multilayer Independent Cascade Model (MICM), capturing spreading dynamics across real and synthetic multilayer graphs. Empirical results show that ts-net consistently outperforms classical centrality heuristics and competitive deep learning baselines while providing interpretable outputs, with strong generalisation to unseen networks. The work also proposes data transformations and an inductive framework suitable for scalable, real-world diffusion analysis, and outlines future work on broader heterogeneous graphs and seed-set evaluations.
Abstract
Identifying super-spreaders can be framed as a subtask of the influence maximisation problem. It seeks to pinpoint agents within a network that, if selected as single diffusion seeds, disseminate information most effectively. Multilayer networks, a specific class of heterogeneous graphs, can capture diverse types of interactions (e.g., physical-virtual or professional-social), and thus offer a more accurate representation of complex relational structures. In this work, we introduce a novel approach to identifying super-spreaders in such networks by leveraging graph neural networks. To this end, we construct a dataset by simulating information diffusion across hundreds of networks; to the best of our knowledge, it is the first of its kind tailored specifically to multilayer networks. We further formulate the task as a variation of the ranking prediction problem based on a four-dimensional vector that quantifies each agent's spreading potential: (i) the number of activations; (ii) the duration of the diffusion process; (iii) the peak number of activations; and (iv) the simulation step at which this peak occurs. Our model, TopSpreadersNetwork, comprises a relationship-agnostic encoder and a custom aggregation layer. This design enables generalisation to previously unseen data and adapts to varying graph sizes. In an extensive evaluation, we compare our model against classic centrality-based heuristics and competitive deep learning methods. The results, obtained across a broad spectrum of real-world and synthetic multilayer networks, demonstrate that TopSpreadersNetwork achieves superior performance in identifying high-impact nodes, while also offering improved interpretability through its structured output.
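To make the four-dimensional spreading-potential vector concrete, the following is a minimal sketch of how such a vector could be computed for one seed actor from a single simulated cascade. It is an illustration under stated assumptions, not the authors' simulation code: the function name `spreading_vector`, the dictionary-based network layout, the uniform activation probability `p`, and the rule that an actor becomes active as soon as it is reached in any layer are all hypothetical simplifications of the MICM. Whether the seed counts towards the number of activations and whether steps are numbered from 1 are likewise assumptions.

```python
import random
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical layout: layer name -> adjacency list (actor -> neighbours).
Layers = Dict[str, Dict[Hashable, List[Hashable]]]


def spreading_vector(
    layers: Layers,
    seed: Hashable,
    p: float = 0.1,
    rng: Optional[random.Random] = None,
) -> Tuple[int, int, int, int]:
    """Run one simplified cascade from a single seed and summarise it.

    Returns (activations, duration, peak, peak_step).
    Assumptions: the seed is excluded from the activation count, an actor
    activates if reached in any layer, and steps are numbered from 1.
    """
    rng = rng or random.Random(0)
    active = {seed}
    frontier = {seed}
    new_per_step: List[int] = []

    while frontier:
        newly = set()
        for actor in frontier:
            for adjacency in layers.values():
                for neighbour in adjacency.get(actor, []):
                    already_reached = neighbour in active or neighbour in newly
                    # Each newly active actor gets one attempt per edge.
                    if not already_reached and rng.random() < p:
                        newly.add(neighbour)
        if newly:
            new_per_step.append(len(newly))
        active |= newly
        frontier = newly

    activations = len(active) - 1           # (i) actors activated by the seed
    duration = len(new_per_step)            # (ii) steps with new activations
    peak = max(new_per_step, default=0)     # (iii) largest single-step gain
    # (iv) first step at which the peak is reached (0 if nothing spread)
    peak_step = new_per_step.index(peak) + 1 if new_per_step else 0
    return activations, duration, peak, peak_step


# Toy two-layer network; p=1.0 makes the cascade deterministic.
toy_layers: Layers = {
    "work": {"a": ["b", "c"], "b": ["d"]},
    "social": {"a": ["c"], "c": ["d", "e"], "e": ["f"]},
}
print(spreading_vector(toy_layers, "a", p=1.0))  # (5, 3, 2, 1)
```

In practice a single run is noisy for `p < 1`, so one would presumably average the vector over repeated simulations per seed before ranking actors by it.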
