Influence Maximization in Hypergraphs by Stratified Sampling for Efficient Generation of Reverse Reachable Sets

Lingling Zhang; Hong Jiang; Ye Yuan; Guoren Wang

TL;DR

This work addresses influence maximization on hypergraphs, identifying inefficiencies in existing vertex-based and hyperedge-based methods. It introduces HyperIM, a stratified-sampling framework that differentiates vertex roles by hyperedge structure and employs Binomial- and Poisson-based sampling to generate RR sets more efficiently, while preserving approximation guarantees. The authors also propose HyperIM_BRR to tighten bounds and further reduce the number of RR sets. Theoretical analysis shows improved influence spread, sampling efficiency, and reduced runtime, and experiments on real-world hypergraphs demonstrate substantial gains (up to 2.73× in influence and orders-of-magnitude reductions in RR-set counts and running time) over state-of-the-art baselines. These contributions offer a practical, scalable approach for IM in high-order networks with strong theoretical guarantees and broad applicability.

Abstract

Given a hypergraph, influence maximization (IM) seeks a seed set of $k$ vertices with maximal influence. Although the existing vertex-based IM algorithms, which generate random reverse reachable (RR) sets, perform better than the hyperedge-based algorithms, they are inefficient because (i) they ignore important structural information associated with hyperedges and thus obtain inferior results, (ii) the frequently used sampling methods for generating RR sets have low efficiency because they require a large number of samples and have high sampling variance, and (iii) the vertex-based IM algorithms incur large overheads in running time and memory. To overcome these shortcomings, this paper proposes a novel approach, called \emph{HyperIM}. The key idea behind \emph{HyperIM} is to differentiate the structural information of vertices in order to develop stratified sampling, combined with highly efficient strategies, for generating the RR sets. With theoretical guarantees, \emph{HyperIM} is able to accelerate the influence spread, improve the sampling efficiency, and cut down the expected running time. To further reduce the running time and memory costs, we optimize \emph{HyperIM} by inferring the bound on the required number of RR sets in conjunction with stratified sampling. Experimental results on real-world hypergraphs show that \emph{HyperIM} is able to reduce the number of required RR sets and the running time by orders of magnitude while increasing the influence spread by up to $2.73\times$ on average, compared to the state-of-the-art IM algorithms.
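For readers unfamiliar with the RR-set pipeline that the abstract builds on, the following is a minimal sketch of the generic baseline: sample RR sets uniformly, then pick seeds by greedy maximum coverage. It is not HyperIM and does not implement its stratified sampling. The propagation rule (each vertex independently activates each co-member of a shared hyperedge with a single probability `p`), the function names, and the toy hypergraph are all assumptions made for illustration.

```python
import random
from collections import deque


def build_incidence(hyperedges):
    """Map each vertex to the list of hyperedge ids that contain it."""
    incidence = {}
    for eid, members in hyperedges.items():
        for v in members:
            incidence.setdefault(v, []).append(eid)
    return incidence


def random_rr_set(hyperedges, incidence, p, rng):
    """Sample one RR set for a uniformly random target vertex.

    Assumed model: a vertex u can activate w when they share a hyperedge,
    with independent probability p per (hyperedge, u, w) attempt.
    """
    target = rng.choice(sorted(incidence))
    rr, queue = {target}, deque([target])
    while queue:
        w = queue.popleft()
        for eid in incidence[w]:
            for u in hyperedges[eid]:
                # Reverse step: keep u if the attempt u -> w succeeds.
                if u not in rr and rng.random() < p:
                    rr.add(u)
                    queue.append(u)
    return rr


def greedy_seeds(rr_sets, k):
    """Greedy maximum coverage: pick up to k vertices covering the most RR sets."""
    remaining = [set(r) for r in rr_sets]
    seeds = []
    for _ in range(k):
        counts = {}
        for r in remaining:
            for v in r:
                counts[v] = counts.get(v, 0) + 1
        if not counts:
            break
        # Sort first so that ties are broken deterministically.
        best = max(sorted(counts), key=counts.get)
        seeds.append(best)
        remaining = [r for r in remaining if best not in r]
    return seeds


# Toy hypergraph (hypothetical data): three hyperedges over six vertices.
hyperedges = {"e1": ["a", "b", "c"], "e2": ["c", "d"], "e3": ["d", "e", "f"]}
incidence = build_incidence(hyperedges)
rng = random.Random(0)
rr_sets = [random_rr_set(hyperedges, incidence, 0.3, rng) for _ in range(2000)]
seeds = greedy_seeds(rr_sets, 2)
```

The fraction of RR sets covered by `seeds`, multiplied by the number of vertices, gives the usual estimate of the expected influence spread. The number of RR sets (2000 here) is picked arbitrarily; bounding that number is exactly what the paper's analysis addresses.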

Paper Structure (28 sections, 9 theorems, 18 equations, 10 figures, 3 tables, 4 algorithms)

Introduction
Preliminaries
Problem definition
The existing IM algorithms
Sampling methods for RR set generations
The number of RR sets
HyperIM
Stratified sampling for generating RR sets
Sample set division.
Sampling methods
Sampling strategy
Theoretical analysis for HyperIM
Influence spread
Sampling efficiency and accuracy
...and 13 more sections

Key Result

Lemma 1

Given a vertex $\mu$ and two of its layers, denoted $L_{i}$ and $L_{j}$, in $A(\mu)$ where $0<i<j\leq l_{\mu}$, we have $P_\mu(L_{i})>P_\mu(L_{j})$ and $\sum_{i=1}^{l_\mu}P_\mu(L_{i})\leq 1$.

Figures (10)

Figure 1: Example of a hypergraph (a) and its transformation into a regular graph (b). When analyzing influence, although the vertices $u_1$ and $u_3$ have the same structure in the hypergraph in (a), they have different structures in the corresponding regular graph in (b).
Figure 2: Example of an influence propagation process.
Figure 3: An example of sample set divisions according to the vertices' ability to activate $u_1$ in hypergraph $G$.
Figure 4: Activation probabilities of the vertices in Figure 3(b) under the stratified setting and the uniform setting; the stratified setting helps obtain influential RR sets.
Figure 5: The number of influential vertices with different sizes of seed sets using different IM algorithms under the IC model.
...and 5 more figures

Theorems & Definitions (10)

Definition 1
Lemma 1
Lemma 2
Lemma 3
Theorem 1
Theorem 2
Theorem 3
Lemma 4
Lemma 5
Lemma 6
