DIPS: Optimal Dynamic Index for Poisson $\boldsymbol{\pi}$ps Sampling

Jinchao Huang, Sibo Wang

TL;DR

This work tackles Poisson $\pi$ps sampling under dynamic updates by introducing DIPS, a dynamic index that achieves expected $O(1)$ query and update times with $O(n)$ space. The method combines three ideas: building blocks for special-weight regimes, a size-reduction scheme based on buckets and chunks to bound weight growth, and a final table-lookup-based exact PPS sampler applied after reduction. It addresses the weight explosion and inter-element correlations that hinder previous subset-sampling (SS) based approaches, and provides theoretical guarantees alongside practical efficiency. Empirically, DIPS outperforms dynamic SS baselines in index update time while maintaining competitive query performance, and it yields substantial speedups when integrated into dynamic influence maximization on evolving graphs, a data mining and network analysis application.

Abstract

This paper addresses the Poisson $\pi$ps sampling problem, a topic of significant academic interest in various domains and with practical data mining applications, such as influence maximization. The problem considers a set $\mathcal{S}$ of $n$ elements, where each element $v$ is assigned a weight $w(v)$ reflecting its importance. The goal is to generate a random subset $X$ of $\mathcal{S}$, where each element $v \in \mathcal{S}$ is included in $X$ independently with probability $\frac{c\cdot w(v)}{\sum_{u \in \mathcal{S}} w(u)}$, where $0<c\leq 1$ is a constant. The subsets must be independent across different queries. While the Poisson $\pi$ps sampling problem can be reduced to the well-studied subset sampling problem, an update in Poisson $\pi$ps sampling, such as adding or removing an element, changes the probabilities of all $n$ elements in the corresponding subset sampling problem, making this approach impractical for dynamic scenarios. To address this, we propose a dynamic index specifically tailored for the Poisson $\pi$ps sampling problem, supporting optimal expected $\mathcal{O}(1)$ query time and $\mathcal{O}(1)$ index update time, with an optimal $\mathcal{O}(n)$ space cost. Our solution involves recursively partitioning the set by weights and ultimately using table lookup. The core of our solution lies in addressing the challenges posed by weight explosion and correlations between elements. Empirical evaluations demonstrate that our approach achieves significant speedups in update time while maintaining consistently competitive query time compared to the subset-sampling-based methods.
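To make the problem statement concrete, the following is a minimal Python sketch of a naive $O(n)$ sampler that flips one independent coin per element, together with a small demonstration of why the reduction to subset sampling is costly under updates. It is a baseline illustration only, not the DIPS index; the function names `poisson_pips_sample` and `inclusion_probabilities` and the example weights are hypothetical.

```python
import random


def inclusion_probabilities(weights, c=1.0):
    """Return p(v) = c * w(v) / sum of all weights, for every element v."""
    total = sum(weights.values())
    return {v: c * w / total for v, w in weights.items()}


def poisson_pips_sample(weights, c=1.0, rng=random):
    """Naive O(n) baseline: include each element independently with p(v)."""
    if not 0 < c <= 1:
        raise ValueError("c must satisfy 0 < c <= 1")
    probs = inclusion_probabilities(weights, c)
    return {v for v, p in probs.items() if rng.random() < p}


# Hypothetical example weights (not from the paper).
weights = {"a": 1.0, "b": 2.0, "c": 5.0}
before = inclusion_probabilities(weights)

weights["d"] = 2.0  # a single insertion changes the normalizing sum
after = inclusion_probabilities(weights)

# Every pre-existing element now has a different inclusion probability,
# so a subset-sampling index keyed on probabilities would need n updates.
changed = [v for v in before if before[v] != after[v]]
print(changed)
```

In this toy instance all three original elements appear in `changed`, which is the behavior the abstract describes: one insertion or deletion shifts every probability in the reduced subset sampling instance.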

Paper Structure (26 sections, 5 theorems, 7 equations, 9 figures, 1 table, 4 algorithms)


Key Result

lemma 1

Given a PPS problem instance $\Phi=\langle\mathcal{S}, w, c\rangle$ and a subset $T\subseteq\mathcal{S}$, if all weights in $T$ fall within a bounded ratio range $w(T)\subseteq(\bar{w}/b, \bar{w}]$ where $b$ is a constant and $\bar{w}\in\mathbb{R}_{>0}$, then we can: (i) Initialize a data structure …
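The statement above is cut off, so the sketch below does not reproduce the lemma's data structure or its guarantees. It only illustrates a standard technique that fits the bounded-ratio setting, written here as an assumption: propose candidates with geometric skips at the upper-bound probability $\bar{p}=\min(1, c\bar{w}/W)$, where $W$ is the total weight of $\mathcal{S}$, then accept each candidate $v$ with probability $p(v)/\bar{p}$. The function name `sample_bucket` and its parameters are hypothetical.

```python
import math
import random


def sample_bucket(items, weights, w_bar, total, c=1.0, rng=random):
    """Sample a bucket whose weights all lie in (w_bar / b, w_bar].

    items   : list of the elements in the bucket T
    weights : dict mapping element -> weight
    w_bar   : upper bound on every weight in the bucket
    total   : sum of weights over the whole set S (not only the bucket)
    """
    p_bar = min(1.0, c * w_bar / total)  # upper bound on every p(v) in T
    out = []
    i = -1
    while True:
        if p_bar >= 1.0:
            i += 1  # every position is a candidate
        else:
            u = 1.0 - rng.random()  # uniform in (0, 1]
            # Geometric skip: number of non-candidates before the next one.
            i += 1 + int(math.log(u) / math.log1p(-p_bar))
        if i >= len(items):
            return out
        v = items[i]
        # Thin the candidate down to its exact inclusion probability.
        if rng.random() < (c * weights[v] / total) / p_bar:
            out.append(v)
```

When $\bar{p}=c\bar{w}/W$, the acceptance probability equals $w(v)/\bar{w} > 1/b$, so for constant $b$ the expected number of rejected candidates is at most a constant factor times the expected output size of the bucket. This is why a bounded weight ratio is convenient; it says nothing about how DIPS handles updates to $W$.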

Figures (9)

  • Figure 1: Max absolute error vs. repeat times on different distributions (in seconds). ($n=10^5$)
  • Figure 2: Update time vs. query time on different distributions (in seconds). ($n=10^5$)
  • Figure 3: Varying $n$: Query time (in seconds) on different distributions. ($c=1$)
  • Figure 4: Varying $n$: Update time (in seconds) on different distributions.
  • Figure 5: Running time of dynamic IM algorithm based on different Poisson $\pi$ps sampling indexes.
  • ...and 4 more figures

Theorems & Definitions (5)

  • lemma 1: bounded weight ratio
  • lemma 2: subcritical-weight
  • lemma 3: size reduction
  • lemma 4: table lookup
  • theorem 1