Optimal Dynamic Parameterized Subset Sampling

Junhao Gan, Seeun William Umboh, Hanzhi Wang, Anthony Wirth, Zhuo Zhang

TL;DR

An optimal algorithm for solving the DPSS problem, which achieves O(n) pre-processing time, O(1+μ_S(α,β)) expected time for each query parameterized by (α, β), given on-the-fly, and O(1) time for each update; here, μ_S(α,β) is the expected size of the query result.

Abstract

In this paper, we study the Dynamic Parameterized Subset Sampling (DPSS) problem in the Word RAM model. In DPSS, the input is a set $S$ of $n$ items, where each item $x$ has a non-negative integer weight $w(x)$. Given a pair of query parameters $(\alpha, \beta)$, each of which is a non-negative rational number, a parameterized subset sampling query on $S$ seeks to return a subset $T \subseteq S$ such that each item $x \in S$ is selected in $T$, independently, with probability $p_x(\alpha, \beta) = \min \left\{\frac{w(x)}{\alpha\sum_{x\in S} w(x)+\beta}, 1 \right\}$. The DPSS problem is defined in a dynamic setting, where the item set $S$ can be updated with insertions of new items or deletions of existing items. Our first main result is an optimal algorithm for solving the DPSS problem, which achieves $O(n)$ pre-processing time, $O(1+\mu_S(\alpha,\beta))$ expected time for each query parameterized by $(\alpha, \beta)$, given on-the-fly, and $O(1)$ time for each update; here, $\mu_S(\alpha,\beta)$ is the expected size of the query result. At all times, the worst-case space consumption of our algorithm is linear in the current number of items in $S$. Our second main contribution is a hardness result for the DPSS problem when the item weights are $O(1)$-word float numbers, rather than integers. Specifically, we reduce Integer Sorting to the deletion-only DPSS problem with float item weights. Our reduction shows that an optimal algorithm for deletion-only DPSS with float item weights (achieving all of the bounds stated above) would yield an optimal algorithm for Integer Sorting; whether such a sorting algorithm exists remains an important open problem. Finally, a key technical ingredient for our first main result is an efficient algorithm for generating Truncated Geometric random variates in $O(1)$ expected time in the Word RAM model.
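To make the query definition concrete, the following is a minimal sketch of a naive baseline, not the paper's algorithm: it flips one biased coin per item and therefore spends $O(n)$ time per query, whereas the paper's data structure achieves $O(1+\mu_S(\alpha,\beta))$ expected time. The function names, the dictionary representation of $S$, and the choice to reject a zero denominator are all assumptions made for illustration. Probabilities are kept as exact rationals, but the coin flip uses a floating-point uniform, so the sampling itself is only approximate.

```python
import random
from fractions import Fraction


def selection_probability(w, total_weight, alpha, beta):
    """Return p_x(alpha, beta) = min{ w / (alpha * total_weight + beta), 1 } exactly."""
    denom = Fraction(alpha) * total_weight + Fraction(beta)
    if denom == 0:
        # Assumption of this sketch: the zero-denominator case is not handled here.
        raise ValueError("alpha * total_weight + beta must be positive")
    return min(Fraction(w) / denom, Fraction(1))


def naive_pss_query(weights, alpha, beta, rng=random):
    """Naive O(n) query: include each item independently with probability p_x."""
    total = sum(weights.values())
    result = set()
    for item, w in weights.items():
        p = selection_probability(w, total, alpha, beta)
        # rng.random() is a float in [0, 1), so p == 1 always selects the item
        # and p == 0 never does.
        if rng.random() < p:
            result.add(item)
    return result


def expected_size(weights, alpha, beta):
    """mu_S(alpha, beta): the sum of all p_x, by linearity of expectation."""
    total = sum(weights.values())
    return sum(selection_probability(w, total, alpha, beta) for w in weights.values())


if __name__ == "__main__":
    # Hypothetical toy instance: total weight 10, alpha = 1/2, beta = 0.
    S = {"a": 1, "b": 2, "c": 7}
    print(expected_size(S, Fraction(1, 2), 0))
    print(naive_pss_query(S, Fraction(1, 2), 0, random.Random(0)))
```

In the toy instance the denominator is $\frac{1}{2}\cdot 10 + 0 = 5$, so the three probabilities are $\frac{1}{5}$, $\frac{2}{5}$ and $\min\{\frac{7}{5}, 1\} = 1$; item `c` is therefore returned by every query.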

Paper Structure

This paper contains 18 sections, 25 theorems, 5 equations, 2 figures, 5 algorithms.

Key Result

theorem 1

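Consider a set $S$ of $n$ items; there exists a data structure for solving the Dynamic Parameterized Subset Sampling (DPSS) problem, which achieves: $O(n)$ pre-processing time; $O(1+\mu_S(\alpha,\beta))$ expected time for each query parameterized by $(\alpha,\beta)$, given on-the-fly; and $O(1)$ time for each update. At all times, the worst-case space consumption of such a data structure is bounded by $O(n)$, where $n$ denotes the current cardinality of $S$.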

Figures (2)

  • Figure 1: Visualization of the three-level sampling hierarchy. Here, circles represent items and ovals represent buckets. The diagram illustrates the structure of a level-1 group $G_X(j)$, where each bucket in $B_X(i)$ corresponds to an item $y_i\in Y_j$. The BG-Str($Y_j$) contains multiple buckets in multiple groups. Each group, e.g. $G_{Y_j}(k)$, corresponds to a next-level item set, e.g. $Z_k$. In level 3, each item set $Z_k$ generates one final-level instance $V_k$, in which each item corresponds to a level-3 bucket.
  • Figure 2: Visualization of a lookup table. It contains $(m+1)^K$ rows, one per configuration, and each row contains $(m^2)^K$ cells. Each cell holds a $K$-bit string, e.g. $r_0,r_1,\ldots$. As shown in the figure, for a given configuration $\vec{c}=\{c_1,c_2,\ldots,c_K\}$, if $\text{Pr}(r_0)=\frac{6}{(m^2)^K}$, then 6 cells in the corresponding row store $r_0$.
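
The idea in the Figure 2 caption, that an outcome with probability $\frac{c}{(m^2)^K}$ occupies exactly $c$ cells of its row, can be sketched as follows. This illustrates only the general "replicate outcomes, then pick a uniform cell" technique for a single row; it is not the paper's table construction. The function names and the toy cell counts ($m=2$, $K=2$, hence $16$ cells) are hypothetical.

```python
import random


def build_row(cell_counts, num_cells):
    """Build one table row: each outcome appears in as many cells as its count."""
    if sum(cell_counts.values()) != num_cells:
        raise ValueError("cell counts must fill the row exactly")
    row = []
    for outcome, count in cell_counts.items():
        row.extend([outcome] * count)
    return row


def sample_from_row(row, rng=random):
    """Pick a uniformly random cell; an outcome filling c cells has probability c / len(row)."""
    return row[rng.randrange(len(row))]


if __name__ == "__main__":
    m, K = 2, 2
    num_cells = (m * m) ** K  # 16 cells per row in this toy setting
    # Hypothetical distribution over K-bit strings for one configuration.
    counts = {"00": 6, "01": 4, "10": 3, "11": 3}
    row = build_row(counts, num_cells)
    print(sample_from_row(row, random.Random(0)))
```

Once the row is built, a sample costs one uniform random index and one array access, at the price of storing every cell explicitly.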

Theorems & Definitions (26)

  • theorem 1
  • theorem 2
  • theorem 3
  • theorem 4
  • definition 1: $i$-Bit Approximation
  • lemma 1
  • lemma 2
  • lemma 3
  • lemma 4
  • lemma 5
  • ...and 16 more