DIPS: Optimal Dynamic Index for Poisson $\boldsymbol{π}$ps Sampling
Jinchao Huang, Sibo Wang
TL;DR
This work tackles Poisson $π$ps sampling under dynamic updates by introducing DIPS, a dynamic index that achieves expected $O(1)$ query and update times with $O(n)$ space. The method combines three ideas: building blocks for special-weight regimes, a size-reduction scheme based on buckets and chunks to bound weight growth, and a final table-lookup-based exact PPS sampler applied after reduction. It addresses the weight explosion and inter-element correlations that hinder previous approaches based on subset sampling (SS), and provides theoretical guarantees alongside practical efficiency. Empirically, DIPS outperforms dynamic SS baselines in index update time while maintaining competitive query performance, and it yields substantial speedups when integrated into dynamic Influence Maximization on evolving graphs.
Abstract
This paper addresses the Poisson $π$ps sampling problem, a topic of significant academic interest in various domains with practical data mining applications, such as influence maximization. The problem involves a set $\mathcal{S}$ of $n$ elements, where each element $v$ is assigned a weight $w(v)$ reflecting its importance. The goal is to generate a random subset $X$ of $\mathcal{S}$, where each element $v \in \mathcal{S}$ is included in $X$ independently with probability $\frac{c\cdot w(v)}{\sum_{u \in \mathcal{S}} w(u)}$, where $0<c\leq 1$ is a constant. The subsets must be independent across different queries. While the Poisson $π$ps sampling problem can be reduced to the well-studied subset sampling problem, an update in Poisson $π$ps sampling, such as adding or removing an element, changes the total weight and therefore the probabilities of all $n$ elements in the corresponding subset sampling problem, making this approach impractical for dynamic scenarios. To address this, we propose a dynamic index specifically tailored for the Poisson $π$ps sampling problem, supporting optimal expected $\mathcal{O}(1)$ query time and $\mathcal{O}(1)$ index update time, with an optimal $\mathcal{O}(n)$ space cost. Our solution involves recursively partitioning the set by weights and ultimately using table lookup. The core of our solution lies in addressing the challenges posed by weight explosion and correlations between elements. Empirical evaluations demonstrate that our approach achieves significant speedups in update time while maintaining consistently competitive query time compared to subset-sampling-based methods.
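To make the problem statement concrete, the following is a minimal Python sketch of the naive baseline implied by the definition. It is not the DIPS index. It scans all $n$ elements per query, so each query costs $O(n)$. The function names `inclusion_probabilities` and `poisson_pips_sample` and the dictionary representation of weights are assumptions made for illustration only.

```python
import random

def inclusion_probabilities(weights, c=1.0):
    """Return c * w(v) / sum_u w(u) for every element v (hypothetical helper)."""
    if not 0 < c <= 1:
        raise ValueError("c must satisfy 0 < c <= 1")
    total = sum(weights.values())
    return {v: c * w / total for v, w in weights.items()}

def poisson_pips_sample(weights, c=1.0, rng=random):
    """Naive O(n) query: include each element independently with its probability."""
    probs = inclusion_probabilities(weights, c)
    return {v for v, p in probs.items() if rng.random() < p}

# Inserting one element changes the total weight, and hence every probability.
weights = {"a": 1.0, "b": 3.0}
before = inclusion_probabilities(weights)
weights["d"] = 4.0  # dynamic update: add a new element
after = inclusion_probabilities(weights)
```

In this sketch the inclusion probabilities of `a` and `b` both halve after the insertion, which illustrates why an index built on fixed per-element probabilities would need to touch all $n$ entries on every update.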
