Nearly Optimal Bounds for Stochastic Online Sorting

Yang Hu

TL;DR

This work nearly resolves the stochastic online sorting problem by showing that the expected cost can be driven down to near-logarithmic scale. The paper introduces two core techniques, adaptive allocation and segment synchronization with dampening buffers, and integrates them into a recursive final algorithm. It proves an upper bound of $E[\text{cost}] = \log n\cdot 2^{O(\log^* n)}$ and a lower bound of $\Omega(\log n)$, which differ only by the $2^{O(\log^* n)}$ factor, and gives an additional high-probability polylogarithmic bound for a non-recursive variant. This establishes near-optimal performance for stochastic online sorting and offers techniques potentially applicable to hashing and other online assignment problems.

Abstract

In the online sorting problem, we have an array $A$ of $n$ cells, and receive a stream of $n$ items $x_1,\dots,x_n\in [0,1]$. When an item arrives, we need to immediately and irrevocably place it into an empty cell. The goal is to minimize the sum of absolute differences between adjacent items, which is called the cost of the algorithm. It has been shown by Aamand, Abrahamsen, Beretta, and Kleist (SODA 2023) that when the stream $x_1,\dots,x_n$ is generated adversarially, the optimal cost bound for any deterministic algorithm is $\Theta(\sqrt{n})$. In this paper, we study the stochastic version of online sorting, where the input items $x_1,\dots,x_n$ are sampled uniformly at random. Despite the intuition that the stochastic version should yield much better cost bounds, the previous best algorithm for stochastic online sorting by Abrahamsen, Bercea, Beretta, Klausen and Kozma (ESA 2024) only achieves $\tilde{O}(n^{1/4})$ cost, which seems far from optimal. We show that stochastic online sorting indeed allows for much more efficient algorithms, by presenting an algorithm that achieves expected cost $\log n\cdot 2^{O(\log^* n)}$. We also prove a cost lower bound of $\Omega(\log n)$, thus showing that our algorithm is nearly optimal.
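
To make the problem statement concrete, the following is a minimal Python sketch of the cost function together with a naive placement rule. The rule `place_nearest_empty` is a hypothetical baseline chosen for illustration only (aim for cell $\lfloor x n\rfloor$ and fall back to the nearest empty cell); it is not the algorithm of the paper and carries none of its guarantees.

```python
import random

def cost(cells):
    """Sum of absolute differences between adjacent items in the array."""
    return sum(abs(a - b) for a, b in zip(cells, cells[1:]))

def place_nearest_empty(stream):
    """Hypothetical baseline (NOT the paper's algorithm).

    Each item x is placed immediately and irrevocably: aim for the cell
    floor(x * n) and take the nearest empty cell around that target.
    """
    n = len(stream)
    cells = [None] * n
    for x in stream:
        target = min(int(x * n), n - 1)
        for d in range(n):  # search outward from the target cell
            for i in (target - d, target + d):
                if 0 <= i < n and cells[i] is None:
                    cells[i] = x
                    break
            else:
                continue  # no empty cell at distance d, widen the search
            break  # item placed, move on to the next item
    return cells

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
    stream = [rng.random() for _ in range(1000)]
    print("baseline cost:", cost(place_nearest_empty(stream)))
    print("offline sorted cost:", cost(sorted(stream)))
```

The offline sorted array has cost equal to the largest item minus the smallest, which is at most $1$; the difficulty of the online problem is that every placement is made without knowledge of future items.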

Paper Structure

This paper contains 61 sections, 22 theorems, 21 equations, 3 figures.

Key Result

Theorem 1.1

There exists a deterministic algorithm for stochastic online sorting that achieves expected cost $\log n\cdot 2^{O(\log^* n)}$.
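
The factor $2^{O(\log^* n)}$ separating this bound from $\log n$ involves the iterated logarithm $\log^* n$, the number of times the logarithm must be applied to $n$ before the result is at most $1$. The sketch below computes it, assuming base-2 logarithms (the base convention and the function name `log_star` are assumptions made for illustration, not taken from the paper).

```python
import math

def log_star(n):
    """Iterated logarithm, assuming base 2: how many times log2 must be
    applied to n before the value drops to at most 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count

if __name__ == "__main__":
    for n in (2, 16, 65536, 10**9):
        print(n, log_star(n))
```

Under this convention $\log^* n \le 5$ for every $n \le 2^{65536}$, which is why the upper bound is described as nearly logarithmic.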

Figures (3)

  • Figure 1: An illustration of adaptive allocation. The intervals indicate the buffers, and the shaded regions represent the cells that are filled with items. (i) At the start of phase $1$, we allocate one buffer for each segment. (ii) During phase $1$, items are inserted into the buffers, until one buffer is full. The buffers are not necessarily filled from left to right. (iii) At this point, we allocate new buffers within the pool $P$, where the sizes of the new buffers are given by the Chernoff bound. Note that the new buffers have different sizes, because the segments didn't receive the same number of items in phase $1$. (iv) After this adaptive allocation, as far as the new pool is concerned, the two buffers of each segment behave as one single buffer of size $m'$. That is, replacing these two buffers with one big buffer does not change the set of items inserted into the new pool (however, the items inserted into these two buffers are handled differently from those inserted into one big buffer).
  • Figure 2: An illustration of how dampening buffers work. In this example, $K=4$, and we only illustrate the behavior of the first mega-segment. As in Figure 1, the shaded regions represent the filled cells. An arrow from one buffer to another indicates that, since the previous buffer is full, new items that should enter it are instead forwarded to the other buffer. (i) Before merging, we have a set of buffers, where the buffers of each segment have $m'$ empty cells in total. For each mega-segment, we allocate one dampening buffer of size $m''$. $P$ is the post-dampening pool. (ii,iii) New items first enter the older buffers. Only when the buffers of some sub-segment are all full will new items from that sub-segment start entering the dampening buffer. (iv) Finally, new items from the mega-segment only start entering the post-dampening pool when the dampening buffer is full. Ideally, when the dampening buffer is full, the older buffers are also full.
  • Figure 3: An illustration of the merge subroutine with $B_{j-1}=4$ and $K=2$. (i) Right after the re-allocation subroutine, each phase-$(j-1)$ segment has remaining capacity $m_j^{\text{old}}$. (ii) In the merge subroutine, a dampening buffer is allocated for each phase-$j$ segment. (iii) Logically, the previous buffers of a phase-$j$ segment and the dampening buffer of that segment are functionally equivalent to one single buffer.
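
Steps (i) and (ii) of Figure 1 can be illustrated with a small simulation: uniform items are routed to equal-width segments until the first buffer fills, and the per-segment counts at that moment are generally unequal, which is why the buffers allocated next have different sizes. This is a rough sketch under stated assumptions; the function name `run_phase_one`, the equal-width segments, and the parameter values are hypothetical, and the Chernoff-based sizing of the new buffers from the paper is not reproduced here.

```python
import random

def run_phase_one(num_segments, buffer_size, rng):
    """Route uniform items in [0, 1) to equal-width segments (an assumption)
    until one segment's buffer is full; return the per-segment item counts."""
    counts = [0] * num_segments
    while max(counts) < buffer_size:
        x = rng.random()
        seg = min(int(x * num_segments), num_segments - 1)
        counts[seg] += 1
    return counts

if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed, an arbitrary choice for repeatability
    counts = run_phase_one(num_segments=8, buffer_size=100, rng=rng)
    # Remaining room in each phase-1 buffer when the phase ends.
    print("items per segment:", counts)
    print("empty cells left: ", [100 - c for c in counts])
```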

Theorems & Definitions (44)

  • Theorem 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Claim 4.1
  • proof
  • Lemma 4.2
  • proof
  • Lemma 4.3
  • proof
  • Theorem 4.4
  • ...and 34 more