Sampling Permutations with Cell Probes is Hard

Yaroslav Alekseev, Mika Göös, Konstantin Myasnikov, Artur Riazanov, Dmitry Sokolov

TL;DR

This work proves lower bounds for sampling a uniform permutation with cell-probe decision forests: no adaptive scheme can output a distribution close to a uniformly random permutation unless its depth is at least $(\log n)^{\Omega(1)}$. The authors develop an entropy-based dichotomy, introduce average Lipschitzness to handle adaptivity, and prove containment and collision lemmas that amplify a small statistical distance to near-total separation. For nonadaptive probes the required depth rises to $n^{\Omega(1)}$, giving an exponential separation between adaptive and nonadaptive sampling. The results also imply lower bounds against succinct data structures for storing permutations, and they connect to broader data-structure lower bounds and symmetric-sampling questions.

Abstract

Suppose we are given an infinite sequence of input cells, each initialized with a uniform random symbol from $[n]$. How hard is it to output a sequence in $[n]^n$ that is close to a uniform random permutation? Viola (SICOMP 2020) conjectured that if each output cell is computed by making $d$ probes to input cells, then $d \geq \omega(1)$. Our main result shows that, in fact, $d \geq (\log n)^{\Omega(1)}$, which is tight up to the constant in the exponent. Our techniques also show that if the probes are nonadaptive, then $d \geq n^{\Omega(1)}$, which is an exponential improvement over the previous nonadaptive lower bound due to Yu and Zhan (ITCS 2024). Our results also imply lower bounds against succinct data structures for storing permutations.
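To make the sampling model concrete, the following is a minimal sketch of the simplest possible sampler, a depth-1 forest in which output cell $i$ just copies input cell $i$. The function names are hypothetical and are not taken from the paper. The output is uniform over $[n]^n$, so it is a permutation with probability exactly $n!/n^n$, which vanishes quickly as $n$ grows; this illustrates why one probe per output cell is far from enough.

```python
import math
import random
from fractions import Fraction


def depth_one_forest(n, rng):
    """Hypothetical depth-1 forest: output cell i copies input cell i.

    Only the first n of the (conceptually infinite) input cells are probed,
    so only those are materialized here.
    """
    return [rng.randrange(1, n + 1) for _ in range(n)]


def is_permutation(seq):
    """Return True if seq is a rearrangement of 1..len(seq)."""
    return sorted(seq) == list(range(1, len(seq) + 1))


def permutation_probability(n):
    """Exact probability that a uniform sequence in [n]^n is a permutation."""
    return Fraction(math.factorial(n), n ** n)


if __name__ == "__main__":
    rng = random.Random(0)
    n, trials = 8, 20000
    hits = sum(is_permutation(depth_one_forest(n, rng)) for _ in range(trials))
    print("empirical:", hits / trials)
    print("exact:    ", float(permutation_probability(n)))
```

Since the uniform permutation distribution is supported entirely on permutations, the total variation distance between this sampler's output and a uniform permutation is exactly $1 - n!/n^n$.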

Paper Structure

This paper contains 52 sections, 36 theorems, 78 equations, 4 figures, 1 algorithm.

Key Result

Theorem 1

Suppose that $f\colon [n]^{\mathbb{N}} \to [n]^n$ is a decision forest of depth $(\log n)^{1/2-\varepsilon}$ for some constant $\varepsilon>0$. Then for $\bm u\sim[n]^{\mathbb{N}}$ and $\bm \pi \sim S_n$ we have

Figures (4)

  • Figure 1: (Left) Three iterations of the Thorp shuffle [Thorp73] as computed by a network of switches. In a single iteration, we take cards $i$ and $n/2 + i$ and place them in positions $2i - 1$ and $2i$ in the order determined by a coin toss $r_j\in\{0,1\}$. (Right) The shuffle can be simulated by a cell-probe algorithm (aka decision forest) that probes the coin tosses $r_j$ [Viola12]. Drawn here is the decision tree that finds the $5$th output element.
  • Figure 2: The picture illustrates the process of boosting the distance from $1/2$ to almost $1$ by expanding the witnessing event to its neighborhood.
  • Figure 3: The picture illustrates the choice of $\bm J$. We first sample the set $\bm I \subseteq [\ell]$ of input cells and include in $\bm J$ only the trees whose first query reads a symbol in $\bm I$. The undesirable events are non-first queries from trees in $\bm J$ to cells in $\bm I$; one such query is shown as the red line in the picture. The main point is that for a fixed $i \in [\ell]$ and $j \in [m]$ this happens with probability $\alpha^2$, while the expected size of $\bm J$ is an $\alpha$-fraction of $[m]$. Hence, the subsampling procedure sparsifies the undesirable events.
  • Figure 4: This picture illustrates the approach to proving that the probability of a collision at value $k$ is significant, i.e., that $p \coloneqq \Pr[\exists i\neq j \in [m]\colon \bm z_i = \bm z_j = k]$ is bounded away from zero. The key technical step in the proof is to remove the "heavy" edges from the graph, namely those with $p_k^i > n^{-\delta/2}$. $G_k$ denotes the neighborhood of $k$ in $H$ with all heavy edges removed.
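
The Thorp shuffle of Figure 1 and its cell-probe simulation can be sketched as follows. This is an illustrative reading of the caption, not code from the paper: positions are 0-indexed (cards $i$ and $n/2+i$ go to positions $2i$ and $2i+1$), $n$ is assumed even, and all function names are hypothetical. `output_at` traces a single output position backwards through the rounds and reads one coin per round, so its probe depth equals the number of rounds, and which coin it reads next depends on the coins already read, which is what makes the scheme adaptive.

```python
import random


def thorp_round(deck, coins):
    """One Thorp iteration: cards i and n/2+i land in positions 2i, 2i+1,
    in the order chosen by coins[i] (0 keeps the order, 1 swaps it)."""
    n = len(deck)
    half = n // 2
    new = [None] * n
    for i in range(half):
        lo, hi = deck[i], deck[half + i]
        if coins[i] == 1:
            lo, hi = hi, lo
        new[2 * i], new[2 * i + 1] = lo, hi
    return new


def thorp_shuffle(n, coins_per_round):
    """Apply all rounds to the identity deck 0..n-1 (n assumed even)."""
    deck = list(range(n))
    for coins in coins_per_round:
        deck = thorp_round(deck, coins)
    return deck


def output_at(p, n, coins_per_round):
    """Decision-tree view: find the card at output position p by walking
    backwards through the rounds, probing exactly one coin per round."""
    half = n // 2
    for coins in reversed(coins_per_round):
        i = p // 2
        c = coins[i]  # the single probe made in this round
        p = i if (p % 2) == c else half + i
    return p  # the initial deck is the identity, so the card equals its position


def sample_coins(n, rounds, rng):
    """Independent fair coins: n/2 per round."""
    return [[rng.randrange(2) for _ in range(n // 2)] for _ in range(rounds)]


if __name__ == "__main__":
    n, rounds = 8, 3
    coins = sample_coins(n, rounds, random.Random(1))
    print(thorp_shuffle(n, coins))
    print([output_at(p, n, coins) for p in range(n)])
```

The two printed lists agree because each output tree reconstructs the same permutation entry that the forward shuffle computes; the sketch makes no claim about how many rounds are needed for the output to be close to uniform.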

Theorems & Definitions (67)

  • Theorem 1: Main result
  • Theorem 2
  • Corollary 3
  • Lemma 4
  • proof
  • Lemma 4
  • Definition 1: [BIL12]
  • Lemma 4
  • Theorem 5
  • proof
  • ...and 57 more