Influence Maximization in Hypergraphs by Stratified Sampling for Efficient Generation of Reverse Reachable Sets

Lingling Zhang, Hong Jiang, Ye Yuan, Guoren Wang

TL;DR

This work addresses influence maximization on hypergraphs, identifying inefficiencies in existing vertex-based and hyperedge-based methods. It introduces HyperIM, a stratified-sampling framework that differentiates vertex roles by hyperedge structure and employs Binomial- and Poisson-based sampling to generate RR sets more efficiently, while preserving approximation guarantees. The authors also propose HyperIM_BRR to tighten bounds and further reduce the number of RR sets. Theoretical analysis shows improved influence spread, sampling efficiency, and reduced runtime, and experiments on real-world hypergraphs demonstrate substantial gains (up to 2.73× in influence and orders-of-magnitude reductions in RR-set counts and running time) over state-of-the-art baselines. These contributions offer a practical, scalable approach for IM in high-order networks with strong theoretical guarantees and broad applicability.

Abstract

Given a hypergraph, influence maximization (IM) aims to find a seed set of $k$ vertices with maximal influence. Although existing vertex-based IM algorithms, which generate random reverse reachable (RR) sets, perform better than hyperedge-based algorithms, they are inefficient because (i) they ignore important structural information associated with hyperedges and thus obtain inferior results, (ii) the commonly used sampling methods for generating RR sets are inefficient, requiring a large number of samples with high sampling variance, and (iii) they incur large overheads in running time and memory. To overcome these shortcomings, this paper proposes a novel approach called \emph{HyperIM}. The key idea behind \emph{HyperIM} is to differentiate the structural information of vertices and use it to develop stratified sampling, combined with highly efficient strategies for generating RR sets. With theoretical guarantees, \emph{HyperIM} is able to accelerate the influence spread, improve the sampling efficiency, and cut down the expected running time. To further reduce the running time and memory costs, we optimize \emph{HyperIM} by inferring a bound on the required number of RR sets in conjunction with stratified sampling. Experimental results on real-world hypergraphs show that \emph{HyperIM} reduces the number of required RR sets and the running time by orders of magnitude while increasing the influence spread by up to $2.73\times$ on average, compared to state-of-the-art IM algorithms.
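To make the RR-set terminology concrete, the following is a minimal sketch of the plain, uniformly sampled RR-set pipeline that vertex-based IM methods build on. It is not HyperIM and does not implement its stratified sampling. The propagation rule is an assumption for illustration only: every pair of vertices sharing a hyperedge is treated as an independent edge that is live with a single hypothetical probability `p`. The function names `build_incidence`, `sample_rr_set`, and `greedy_seeds` are likewise hypothetical.

```python
import random
from collections import defaultdict


def build_incidence(hyperedges):
    """Map each vertex to the indices of the hyperedges that contain it."""
    incidence = defaultdict(list)
    for idx, edge in enumerate(hyperedges):
        for v in edge:
            incidence[v].append(idx)
    return incidence


def sample_rr_set(hyperedges, incidence, vertices, p, rng):
    """Sample one RR set by a randomized reverse traversal from a uniform root.

    Assumption (not from the paper): a vertex u can activate v through each
    hyperedge they share, independently with probability p.
    """
    root = rng.choice(vertices)
    rr = {root}
    frontier = [root]
    while frontier:
        v = frontier.pop()
        for idx in incidence[v]:
            for u in hyperedges[idx]:
                if u not in rr and rng.random() < p:
                    rr.add(u)
                    frontier.append(u)
    return rr


def greedy_seeds(rr_sets, k):
    """Pick up to k seeds greedily so that they cover as many RR sets as possible."""
    member = defaultdict(set)  # vertex -> indices of uncovered RR sets containing it
    for i, rr in enumerate(rr_sets):
        for v in rr:
            member[v].add(i)
    seeds = []
    for _ in range(k):
        best = max(member, key=lambda v: len(member[v]), default=None)
        if best is None or not member[best]:
            break  # every RR set is already covered
        seeds.append(best)
        hit = set(member[best])
        for v in member:
            member[v] -= hit  # drop the RR sets that are now covered
    return seeds


if __name__ == "__main__":
    hyperedges = [[0, 1, 2], [2, 3], [3, 4, 5]]  # toy hypergraph
    vertices = sorted({v for e in hyperedges for v in e})
    incidence = build_incidence(hyperedges)
    rng = random.Random(0)
    rr_sets = [sample_rr_set(hyperedges, incidence, vertices, 0.3, rng)
               for _ in range(1000)]
    print(greedy_seeds(rr_sets, k=2))
```

The fraction of RR sets covered by the chosen seeds, multiplied by the number of vertices, gives the usual estimate of influence spread. The number of RR sets needed for that estimate to be reliable is the cost that the abstract says HyperIM reduces.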

Paper Structure (28 sections, 9 theorems, 18 equations, 10 figures, 3 tables, 4 algorithms)

Key Result

Lemma 1

Given a vertex $\mu$ and two of its layers $L_{i}$ and $L_{j}$ in $A(\mu)$, where $0<i<j\leq l_{\mu}$, we have $P_\mu(L_{i})>P_\mu(L_{j})$ and $\sum_{i=1}^{l_\mu}P_\mu(L_{i})\leq 1$.

Figures (10)

  • Figure 1: Example of a hypergraph (a) and its transformation into a regular graph (b). When analyzing influence, vertices $u_1$ and $u_3$ have the same structure in the hypergraph (a) but different structures in the corresponding regular graph (b).
  • Figure 2: Example of an influence propagation process.
  • Figure 3: An example of sample set divisions according to the vertices' ability to activate $u_1$ in hypergraph $G$.
  • Figure 4: Activation probabilities of the vertices in Figure 3(b) under the stratified and uniform settings; the stratified setting helps obtain influential RR sets.
  • Figure 5: The number of influential vertices with different sizes of seed sets using different IM algorithms under the IC model.
  • ...and 5 more figures

Theorems & Definitions (10)

  • Definition 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 4
  • Lemma 5
  • Lemma 6