Influence Maximization in Hypergraphs by Stratified Sampling for Efficient Generation of Reverse Reachable Sets
Lingling Zhang, Hong Jiang, Ye Yuan, Guoren Wang
TL;DR
This work addresses influence maximization (IM) on hypergraphs and identifies inefficiencies in existing vertex-based and hyperedge-based methods. It introduces HyperIM, a stratified-sampling framework that differentiates vertex roles by hyperedge structure and employs Binomial- and Poisson-based sampling to generate reverse reachable (RR) sets more efficiently while preserving approximation guarantees. The authors also propose HyperIM_BRR, which tightens the bound on the required number of RR sets and thereby reduces how many must be generated. Theoretical analysis shows improved influence spread and sampling efficiency and reduced running time, and experiments on real-world hypergraphs demonstrate substantial gains over state-of-the-art baselines (up to 2.73× in influence spread and orders-of-magnitude reductions in RR-set counts and running time). Together, these contributions offer a practical, scalable approach to IM in higher-order networks with strong theoretical guarantees.
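The Binomial-based sampling mentioned above rests on a standard identity. Suppose each of $m$ candidate neighbors is selected independently with the same probability $p$. The number selected then follows Binomial($m$, $p$), and, conditioned on that count, the selected subset is uniform. One Binomial draw followed by a uniform draw of that many neighbors therefore replaces $m$ separate coin flips, for an expected $O(1 + mp)$ random draws. The sketch below illustrates only this generic trick. It makes three assumptions: the probability $p$ is uniform, the helper names (`sample_binomial`, `sample_active_neighbors`) are hypothetical, and the simple inverse-transform sampler is suited to moderate $m$. It does not reproduce the paper's exact procedure or its Poisson-based variant.

```python
import random


def sample_binomial(n, p, rng):
    """Draw K ~ Binomial(n, p) by inverse-transform sampling.

    Assumption: n is moderate, so (1 - p) ** n does not underflow to 0.
    """
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    u = rng.random()
    prob = (1.0 - p) ** n  # P(K = 0)
    cdf = prob
    k = 0
    while u > cdf and k < n:
        # Recurrence: P(K = k + 1) = P(K = k) * (n - k) / (k + 1) * p / (1 - p)
        prob *= (n - k) / (k + 1) * p / (1.0 - p)
        k += 1
        cdf += prob
    return k


def sample_active_neighbors(neighbors, p, rng):
    """Select each neighbor independently with probability p (hypothetical helper).

    Equivalent in distribution to one Bernoulli(p) trial per neighbor:
    draw how many succeed, then pick that many uniformly without replacement.
    """
    k = sample_binomial(len(neighbors), p, rng)
    return rng.sample(neighbors, k)
```

For example, `sample_active_neighbors(list(range(8)), 0.5, random.Random(1))` returns a uniformly random subset of the eight neighbors whose size is Binomial(8, 0.5). The saving over per-neighbor trials is largest when $p$ is small and neighbor lists are long.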
Abstract
Given a hypergraph, the goal of influence maximization (IM) is to find a seed set of $k$ vertices with maximal influence. Although existing vertex-based IM algorithms outperform hyperedge-based algorithms by generating random reverse reachable (RR) sets, they have three shortcomings: (i) they ignore important structural information associated with hyperedges and thus obtain inferior results; (ii) the commonly used sampling methods for generating RR sets are inefficient, because they require a large number of samplings and have high sampling variance; and (iii) they incur large overheads in running time and memory. To overcome these shortcomings, this paper proposes a novel approach called \emph{HyperIM}. The key idea behind \emph{HyperIM} is to differentiate the structural information of vertices in order to develop stratified sampling, combined with highly efficient strategies for generating RR sets. With theoretical guarantees, \emph{HyperIM} improves the influence spread, increases sampling efficiency, and reduces the expected running time. To further reduce running time and memory costs, we optimize \emph{HyperIM} by inferring a bound on the required number of RR sets in conjunction with stratified sampling. Experimental results on real-world hypergraphs show that, compared with state-of-the-art IM algorithms, \emph{HyperIM} reduces the number of required RR sets and the running time by orders of magnitude while increasing the influence spread by up to $2.73\times$ on average.
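To make the RR-set pipeline concrete, here is a minimal Python sketch of stratified RR-set generation followed by greedy seed selection. It is not HyperIM itself and rests on several explicit assumptions:

- **Diffusion model.** We assume a hypothetical independent-cascade-style model in which an active vertex $w$ activates each vertex $u$ sharing a hyperedge with it independently with probability $p$.
- **Strata.** Vertices are grouped by the bit length of their hyperedge degree. This is only a stand-in for HyperIM's structural stratification.
- **Sample budget.** A fixed budget `theta` of RR sets replaces the bound inference described above.

Each RR set rooted in stratum $h$ is weighted by $|S_h|/\theta_h$. Under this weighting, the weighted coverage of a fixed seed set is an unbiased estimate of its influence spread for any allocation with $\theta_h \ge 1$. All function names are hypothetical.

```python
import random
from collections import defaultdict


def build_incidence(hyperedges):
    """Map each vertex to the indices of the hyperedges that contain it."""
    inc = defaultdict(list)
    for idx, edge in enumerate(hyperedges):
        for v in edge:
            inc[v].append(idx)
    return inc


def reverse_reachable_set(root, hyperedges, inc, p, rng):
    """Reverse BFS/DFS from root: every co-member w of a hyperedge of an
    activated u joins independently with probability p (assumed model)."""
    rr = {root}
    stack = [root]
    while stack:
        u = stack.pop()  # each vertex is expanded at most once
        for idx in inc[u]:
            for w in hyperedges[idx]:
                if w not in rr and rng.random() < p:
                    rr.add(w)
                    stack.append(w)
    return rr


def stratified_rr_sets(n, hyperedges, p, theta, rng):
    """Split theta roots across degree-bucket strata (hypothetical strata)
    proportionally; return (rr_set, weight) pairs with weight |S_h| / theta_h."""
    inc = build_incidence(hyperedges)
    strata = defaultdict(list)
    for v in range(n):
        strata[len(inc[v]).bit_length()].append(v)
    samples = []
    for members in strata.values():
        theta_h = max(1, round(theta * len(members) / n))
        weight = len(members) / theta_h
        for _ in range(theta_h):
            root = rng.choice(members)
            samples.append((reverse_reachable_set(root, hyperedges, inc, p, rng), weight))
    return samples


def greedy_seeds(samples, k):
    """Greedy weighted max coverage; returns seeds and estimated spread."""
    covered = [False] * len(samples)
    seeds, spread = [], 0.0
    for _ in range(k):
        gain = defaultdict(float)
        for i, (rr, w) in enumerate(samples):
            if not covered[i]:
                for v in rr:
                    gain[v] += w
        if not gain:  # every RR set is already covered
            break
        best = max(gain, key=gain.get)
        seeds.append(best)
        spread += gain[best]
        for i, (rr, _) in enumerate(samples):
            if best in rr:
                covered[i] = True
    return seeds, spread
```

Usage: `greedy_seeds(stratified_rr_sets(7, [(0, 1, 2), (2, 3), (4, 5)], 1.0, 2000, random.Random(0)), 1)` picks a vertex from the component $\{0, 1, 2, 3\}$ with an estimated spread close to 4. The estimate returned for the *greedily chosen* set tends to be optimistic, which is one reason IM algorithms bound the required number of RR sets rather than fixing `theta`.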
