DIPS: Optimal Dynamic Index for Poisson $\boldsymbol{\pi}$ps Sampling

Jinchao Huang; Sibo Wang

TL;DR

This work tackles Poisson $\pi$ps (PPS) sampling under dynamic updates by introducing DIPS, a dynamic index that achieves expected $O(1)$ query and update times with $O(n)$ space. The method combines three ideas: building blocks for special weight regimes, a size-reduction scheme based on buckets and chunks that bounds weight growth, and an exact table-lookup-based PPS sampler applied after reduction. It addresses the weight explosion and inter-element correlations that hinder previous approaches based on subset sampling (SS), and it comes with theoretical guarantees as well as practical efficiency. Empirically, DIPS outperforms dynamic SS baselines on index updates while maintaining competitive query performance, and it yields substantial speedups when integrated into dynamic influence maximization on evolving graphs, which demonstrates its practical value for data mining and network analysis.

Abstract

This paper addresses the Poisson $\pi$ps sampling problem, a topic of significant academic interest in various domains and with practical data mining applications, such as influence maximization. The problem considers a set $\mathcal{S}$ of $n$ elements, where each element $v$ is assigned a weight $w(v)$ reflecting its importance. The goal is to generate a random subset $X$ of $\mathcal{S}$, where each element $v \in \mathcal{S}$ is included in $X$ independently with probability $\frac{c\cdot w(v)}{\sum_{u \in \mathcal{S}} w(u)}$, where $0<c\leq 1$ is a constant. The subsets must be independent across different queries. While the Poisson $\pi$ps sampling problem can be reduced to the well-studied subset sampling problem, an update in Poisson $\pi$ps sampling, such as adding or removing an element, changes the probabilities of all $n$ elements in the corresponding subset sampling problem, making this approach impractical for dynamic scenarios. To address this, we propose a dynamic index specifically tailored for the Poisson $\pi$ps sampling problem, supporting optimal expected $\mathcal{O}(1)$ query time and $\mathcal{O}(1)$ index update time, with an optimal $\mathcal{O}(n)$ space cost. Our solution recursively partitions the set by weights and ultimately uses table lookup. The core of our solution lies in addressing the challenges posed by weight explosion and correlations between elements. Empirical evaluations demonstrate that our approach achieves significant speedups in update time while maintaining consistently competitive query time compared to the subset-sampling-based methods.
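To make the problem definition concrete, the following minimal sketch implements a naive linear-scan Poisson $\pi$ps query and shows why the reduction to subset sampling is fragile under updates. It is a baseline for illustration only, not the DIPS index; the function names `inclusion_probabilities` and `naive_pps_query` and the toy weights are hypothetical and do not come from the paper.

```python
import random

def inclusion_probabilities(weights, c=1.0):
    """Return p(v) = c * w(v) / (sum of all weights) for every element v."""
    total = sum(weights.values())
    return {v: c * w / total for v, w in weights.items()}

def naive_pps_query(weights, c=1.0, rng=random):
    """Answer one Poisson pi-ps query by scanning all n elements (O(n) time).

    Each element is included independently with its probability p(v).
    """
    probs = inclusion_probabilities(weights, c)
    return {v for v, p in probs.items() if rng.random() < p}

# Toy instance (hypothetical weights).
weights = {"a": 1.0, "b": 2.0, "c": 5.0}
before = inclusion_probabilities(weights, c=1.0)

# Insert a single new element: the normalizing sum changes,
# so the probability of every pre-existing element changes too.
weights["d"] = 8.0
after = inclusion_probabilities(weights, c=1.0)
changed = [v for v in before if before[v] != after[v]]

sample = naive_pps_query(weights, c=1.0)
```

Here `changed` lists every element that existed before the insertion, which is the effect the abstract describes: a subset sampling index keyed on the probabilities themselves would have to touch all $n$ entries after one update.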

Paper Structure (26 sections, 5 theorems, 7 equations, 9 figures, 1 table, 4 algorithms)

Introduction
Preliminaries
Problem Formalization
Related Work
Relation Between PPS and SS
SOTA Solution for SS
Difficulty in Migrating ODSS to PPS
A Dynamic Index for PPS
Building Blocks
Size Reduction
Table Lookup and Final Result
Experiments
Experimental Settings
Correctness of Queries
Query and Update Efficiency Trade-off
...and 11 more sections

Key Result

Lemma 1

Given a PPS problem instance $\Phi=\langle\mathcal{S}, w, c\rangle$ and a subset $T\subseteq\mathcal{S}$, if all weights in $T$ fall within a bounded ratio range $w(T)\subseteq(\bar{w}/b, \bar{w}]$, where $b$ is a constant and $\bar{w}\in\mathbb{R}_{>0}$, then we can: (i) initialize a data structure …
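The lemma statement is cut off here, so the paper's actual data structure is not shown. As an illustration of why a bounded weight ratio helps, the sketch below uses a standard technique from the subset sampling literature: geometric skipping with the upper-bound probability, followed by rejection. This is an assumption about the general approach and not the authors' construction; the function name `sample_bounded_bucket` and its parameters are hypothetical, and the sketch covers only querying a static bucket, not updates.

```python
import math
import random

def sample_bounded_bucket(items, weights, w_bar, c, total_weight, rng=random):
    """Include each v in items independently with probability c*w(v)/total_weight.

    Assumes every weight satisfies w_bar / b < w(v) <= w_bar for a constant b,
    and that total_weight is the sum of weights over the whole set S.
    """
    # p_bar upper-bounds every inclusion probability in this bucket.
    p_bar = min(1.0, c * w_bar / total_weight)
    chosen = []
    if p_bar <= 0.0:
        return chosen
    # log(1 - p_bar), used to draw geometric gaps between candidates.
    log_q = math.log1p(-p_bar) if p_bar < 1.0 else None
    i = -1
    while True:
        if log_q is None:
            skip = 0  # p_bar == 1: every position is a candidate
        else:
            # Number of non-candidates before the next candidate.
            skip = math.floor(math.log(1.0 - rng.random()) / log_q)
        i += skip + 1
        if i >= len(items):
            break
        v = items[i]
        p_v = c * weights[v] / total_weight
        # Rejection step: keep the candidate with probability p_v / p_bar,
        # so the overall inclusion probability is exactly p_v.
        if rng.random() < p_v / p_bar:
            chosen.append(v)
    return chosen
```

Each position becomes a candidate with probability `p_bar` and is then kept with probability `p_v / p_bar`, so its inclusion probability is exactly `p_v`. When `p_bar` is not capped at 1, the acceptance probability equals $w(v)/\bar{w} > 1/b$, so the expected number of candidates examined is within a factor $b$ of the expected output size for the bucket.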

Figures (9)

Figure 1: Max absolute error vs. repeat times on different distributions (in seconds). ($n=10^5$)
Figure 2: Update time vs. query time on different distributions (in seconds). ($n=10^5$)
Figure 3: Varying $n$: Query time (in seconds) on different distributions. ($c=1$)
Figure 4: Varying $n$: Update time (in seconds) on different distributions.
Figure 5: Running time of dynamic IM algorithm based on different Poisson $\pi$ps sampling indexes.
...and 4 more figures

Theorems & Definitions (5)

Lemma 1: bounded weight ratio
Lemma 2: subcritical-weight
Lemma 3: size reduction
Lemma 4: table lookup
Theorem 1
