Unveiling Environmental Sensitivity of Individual Gains in Influence Maximization

Xinyan Su, Zhiheng Zhang, Jiyan Qiu

TL;DR

This work reframes Influence Maximization as a causal decision problem by introducing CauIM, which assigns node weights via Individual Treatment Effects that capture environment-sensitive gains during diffusion on hypergraphs. It develops two algorithms, G-CauIM (greedy, offline ITE estimation) and A-CauIM (gradient-descent acceleration with differentiable proxies), and provides theoretical guarantees including a $(1-\frac{1}{e})$-approximation under nonnegative ITE and robustness to estimation bias. Empirically, CauIM demonstrates strong performance and robustness across real and synthetic datasets, with A-CauIM delivering substantial efficiency gains while maintaining competitive accuracy. The framework enables practical, causality-aware seed selection in networks where node influence is dynamic and context-dependent, offering scalable, robust solutions for disseminating information and promotions.

Abstract

Influence Maximization (IM) aims to identify a seed set that maximizes information dissemination in a network. Elegant IM algorithms extend naturally to cases where each node carries a specific weight, reflecting an individual gain that measures the node's importance. The prevailing literature typically assumes that these individual gains remain constant throughout the cascade process and can be computed through explicit formulas based on the node's characteristics and the network topology. However, this assumption does not always hold, for two reasons: 1) Unobservability: the individual gain of a node is evaluated primarily as the difference between its outputs in the activated and non-activated states. In practice, only one of these states can be observed, and the other remains unobservable after propagation. 2) Environmental sensitivity: beyond the node's inherent properties, individual gains also depend on the activation status of surrounding nodes, which changes across iterations even when the network topology remains static. To address these challenges, we extend IM to a broader scenario with dynamic individual gains, leveraging causality techniques. We introduce a Causal Influence Maximization (CauIM) framework and develop two algorithms, G-CauIM and A-CauIM, where the latter incorporates a novel acceleration technique. Theoretically, we establish a generalized lower bound on the influence spread and provide a robustness analysis. Empirically, we demonstrate the effectiveness and robustness of our algorithms in synthetic and real-world experiments.
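
The individual gain described above can be written as a difference of two potential outcomes. The formula below is a sketch of that idea rather than the paper's exact definition: $T_i$ and $T_{-i}$ follow the notation of the Figure 4 caption (activation status of node $v_i$ and of its surroundings), while the outcome function $Y_i$ is notation assumed here for illustration.

$$\tau_i \;=\; Y_i(T_i = 1,\, T_{-i}) \;-\; Y_i(T_i = 0,\, T_{-i})$$

Unobservability means that only one of the two terms is ever observed for a given node, so $\tau_i$ has to be estimated. Environmental sensitivity means that $\tau_i$ changes whenever $T_{-i}$ changes, even if the topology is fixed.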

Paper Structure

This paper contains 35 sections, 5 theorems, 19 equations, 6 figures, 4 tables, 4 algorithms.

Key Result

Proposition 4.1

Our CauIM problem model is NP-hard.

Figures (6)

  • Figure 1: Illustration of individual gains during one propagation iteration in a product promotion scenario, where we focus on the starred (*) node. The current iteration is depicted in the leftmost panel, showing nodes in varying states within the network: blue nodes are activated (purchased), grey nodes remain inactive, and $?$ denotes unknown status. The actual individual gain is the difference between the profit of the node in the activated state and that in the non-activated state. The first challenge is unobservability, i.e., we can only observe one status for each node, with the counterfactual scenario unknown. More seriously, the second challenge is environmental sensitivity: the individual gain of a node is affected by the activation status of other nodes.
  • Figure 2: A-CauIM. Compared with G-CauIM (Algorithm \ref{alg1: traditional_cauim}), we add a storage table for activation probabilities $ap(;)$ and then simplify the complex greedy selection (Equation \ref{greedy_alg}) into more efficient derivative operations (Equation \ref{dev}). In addition, we transform $ap(;)$ into continuous values close to $0$ or $1$ to represent the activation state $T_i$ of each node on average. Through this procedure, we obtain $\widehat{\mathbb{E}{{\tau}_{i}}}$, an approximation of the expectation of the unobserved ${\tau}_i$.
  • Figure 3: a & b) Performance of CauIM on the GoodReads and Contact datasets. "Iter" refers to the time step in each seed selection round. c) Trend of the variance curve under different noise levels in the individual ITE. $\epsilon_{y_i}$ represents the standard variance of the added noise. d) Final sum of ITE under various $p$.
  • Figure 4: The procedure of G-CauIM. $T_i$ indicates whether the node is activated or not (we can only observe one situation for each node ${v_i}$), and $T_{-i}$ represents the activation status of its surroundings, as illustrated in Equation \ref{causal_effect}. In each round, we construct the ITE estimate $\hat{\tau}_{i}:= \widehat{ITE}(;\theta^{opt})$ described in Section \ref{methodology} and treat it as the node weight. We then run a weighted greedy algorithm with the SICP propagation mechanism (Figure \ref{pic1}). As a result, we expand the seed set ($v_4$ is added), targeting the largest (estimated) sum of ITE. The main challenge is that the $\hat{\tau}_{i}$ are not constants, since the omitted parameter (Equation \ref{causal_effect}) changes with the activation status of the nodes in each iteration.
  • Figure 5: a) Distribution of individual ITE in GoodReads Dataset. b) Distribution of individual ITE in Contact Dataset.
  • ...and 1 more figures
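
The weighted greedy loop described in the Figure 4 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the spread model is a simplified contact-process rule (each active node picks one of its hyperedges at random and activates each inactive member with probability `beta`), and `toy_ite` is a hypothetical stand-in for the estimated ITE $\hat{\tau}_i$. All function and parameter names are invented for this sketch.

```python
import random


def simulate_spread(hyperedges, seeds, beta, steps, rng):
    """One run of a simplified contact-process spread (an assumed model)."""
    # Map each node to the indices of the hyperedges that contain it.
    membership = {}
    for idx, edge in enumerate(hyperedges):
        for node in edge:
            membership.setdefault(node, []).append(idx)
    active = set(seeds)
    for _ in range(steps):
        newly = set()
        for node in sorted(active):
            edges = membership.get(node)
            if not edges:
                continue
            # The active node contacts one of its hyperedges at random.
            edge = hyperedges[rng.choice(edges)]
            for other in edge:
                if other not in active and rng.random() < beta:
                    newly.add(other)
        active |= newly
    return active


def expected_gain(hyperedges, seeds, weight, beta, steps, runs, seed=0):
    """Monte Carlo estimate of the summed node weights over activated nodes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        active = simulate_spread(hyperedges, seeds, beta, steps, rng)
        # The weight may depend on the active set (environmental sensitivity).
        total += sum(weight(v, active) for v in active)
    return total / runs


def greedy_seeds(hyperedges, nodes, weight, k, beta=0.5, steps=3, runs=200):
    """Add, k times, the node whose inclusion gives the largest estimated gain."""
    seeds = []
    for _ in range(k):
        best, best_val = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            val = expected_gain(hyperedges, seeds + [v], weight, beta, steps, runs)
            if val > best_val:
                best, best_val = v, val
        seeds.append(best)
    return seeds


def toy_ite(node, active):
    """Hypothetical ITE stand-in: base gain plus a bonus per other active node."""
    return 1.0 + 0.1 * (len(active) - 1)


if __name__ == "__main__":
    H = [(0, 1, 2, 3), (4, 5)]
    print(greedy_seeds(H, range(6), toy_ite, k=2, beta=1.0, steps=1, runs=5))
```

Because the weight function receives the current active set, the same node can contribute different gains in different runs, which is the point the caption makes about $\hat{\tau}_i$ not being constant. Every candidate is re-simulated in every round, so this naive loop is slow on large hypergraphs.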

Theorems & Definitions (11)

  • Proposition 4.1
  • Lemma 4.2: Approximate optimal guarantee of greedy IM on hypergraph
  • Proof (sketch)
  • Theorem 4.4: Approximate optimal guarantee of CauIM
  • Proof (sketch)
  • Theorem 4.5: Robustness
  • Corollary 4.6: Robustness of noise
  • Proof (sketch)
  • proof
  • proof
  • ...and 1 more
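
For orientation, the $(1-\frac{1}{e})$-approximation mentioned in the TL;DR has the form usual for greedy guarantees. The statement below is a sketch of that form, not a reproduction of Theorem 4.4: the objective $\sigma$, the activated set $A(S)$, the optimal seed set $S^{*}$, and the budget $k$ are notation assumed here.

$$\sigma(S) \;=\; \mathbb{E}\Big[\sum_{v_i \in A(S)} \tau_i\Big], \qquad \sigma(S_{\text{greedy}}) \;\ge\; \Big(1 - \frac{1}{e}\Big)\, \sigma(S^{*}), \quad |S_{\text{greedy}}| = |S^{*}| = k$$

Since $1 - \frac{1}{e} \approx 0.632$, the greedy seed set is guaranteed to achieve at least about 63% of the optimal expected sum of ITE. According to the TL;DR, this holds under nonnegative ITE.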