Encoding Schemes for Parallel In-Place Algorithms

Chase Hutton, Adam Melrod

TL;DR

This work advances in-place parallel computation by introducing encoding-based techniques, notably inversion encoding and restorable buffers, that unlock additional information within the input to support conflict-free, low-memory parallelism. It delivers work-efficient strong PIP (parallel in-place) algorithms for two canonical problems, merging two sorted sequences and random permutation, each with polylogarithmic span ($O(\log^2 n)$ whp for merging). It also outlines an application to integer sorting. The methods rely on a general encoding framework and careful two-phase algorithms that leverage input structure. The theoretical guarantees are complemented by an empirical evaluation showing negligible memory overhead and runtimes competitive with optimized non-in-place solutions. Overall, the paper broadens the toolkit for space-efficient parallel algorithm design and points toward in-place techniques applicable beyond the specific problems analyzed.

Abstract

Many parallel algorithms that solve basic problems in computer science use auxiliary space linear in the input to facilitate conflict-free computation. There has been significant work on improving these parallel algorithms to be in-place, that is, to use as little auxiliary memory as possible. In this paper, we provide novel in-place algorithms for two fundamental problems: merging two sorted sequences and randomly shuffling a sequence. Both algorithms are work-efficient and have polylogarithmic span. Our algorithms employ encoding techniques that exploit the underlying structure of the input to gain access to more bits, which enables the use of auxiliary data as well as non-in-place methods. The encoding techniques we develop are general, and we expect them to be useful in developing in-place algorithms for problems beyond those considered here. To demonstrate this, we outline an additional application to integer sorting. In addition to our theoretical contributions, we implement our merging algorithm and measure its memory usage and runtime.
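
For reference, the textbook sequential in-place shuffle is Fisher-Yates (Knuth): it uses only O(1) extra words, but each step reads the array left by the previous step, so it does not parallelize directly. The sketch below is that standard sequential algorithm, not the paper's parallel method; the function name `fisher_yates_shuffle` is hypothetical.

```python
import random


def fisher_yates_shuffle(a, rng=None):
    """Shuffle list `a` uniformly at random, in place (Fisher-Yates / Knuth).

    Extra space is O(1) words plus the RNG state. Iteration i reads the
    array produced by iteration i + 1, so the loop is inherently sequential.
    """
    if rng is None:
        rng = random.Random()
    for i in range(len(a) - 1, 0, -1):
        j = rng.randint(0, i)  # uniform over positions 0..i inclusive
        a[i], a[j] = a[j], a[i]
    return a


# Usage: the result is a permutation of the input.
xs = list(range(10))
fisher_yates_shuffle(xs, random.Random(42))
print(sorted(xs) == list(range(10)))  # True
```

Passing a seeded `random.Random` makes runs reproducible, which is convenient when checking a parallel shuffle against a sequential one.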

Paper Structure

This paper contains 45 sections, 10 theorems, 25 equations, 4 figures, 1 table.

Key Result

Theorem 4.1

Merging two sorted arrays of sizes $n$ and $m$ (where $n \geq m$) can be done with $O(n)$ work, $O(\log^2 n)$ span with high probability (whp), and $O(\log n)$ sequential stack-allocated memory.
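
A common ingredient of parallel merging, shown here as background rather than as the paper's in-place scheme, is co-ranking: for any output position k, a binary search finds how many of the first k outputs come from each input, so fixed-size output blocks (the implementation in Figure 1 uses blocks of size 4000) can be merged independently. The sketch below writes into a separate output list, so unlike Theorem 4.1 it is not in-place; `corank` and `merge_by_blocks` are hypothetical names.

```python
import heapq


def corank(k, a, b):
    """Return i such that the first k outputs of merging sorted a and b
    are exactly a[:i] and b[:k - i]. Ties go to a (stable merge).
    Binary search: O(log(len(a) + len(b))) comparisons.
    """
    lo, hi = max(0, k - len(b)), min(k, len(a))
    while lo < hi:
        i = (lo + hi) // 2
        j = k - i  # j >= 1 here because i < hi <= k
        if a[i] <= b[j - 1]:
            lo = i + 1  # a[i] precedes b[j-1], so more of a is needed
        else:
            hi = i
    return lo


def merge_by_blocks(a, b, block):
    """Merge sorted a and b into a new list, one output block at a time.

    Each block's input ranges come from two corank calls, so blocks do
    not depend on one another and could run in parallel. Not in-place.
    """
    n = len(a) + len(b)
    out = [None] * n
    for start in range(0, n, block):
        end = min(start + block, n)
        i0, i1 = corank(start, a, b), corank(end, a, b)
        out[start:end] = list(heapq.merge(a[i0:i1], b[start - i0:end - i1]))
    return out


a, b = [1, 3, 3, 5, 7, 9], [2, 3, 4, 8]
print(corank(3, a, b))           # 2: the first 3 outputs are a[:2] + b[:1]
print(merge_by_blocks(a, b, 3))  # [1, 2, 3, 3, 3, 4, 5, 7, 8, 9]
```

Each block costs two binary searches plus a linear merge of its slice, which is why splitting the output into blocks keeps the total work linear while exposing independent tasks.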

Figures (4)

  • Figure 1: Running times for our parallel in-place merging (IP Merging) implementation compared to the non-in-place implementation from PBBS 10.1145/2312005.2312018. Here the IP Merging implementation uses blocks of size 4000, and the input consists of 32-bit integers. The running times are obtained on a 96-core machine with two-way hyper-threading; more details, including memory profiling, are presented in the experimental evaluation section.
  • Figure 2: An example of encoding $6$ in $B = [0,1,2,3,4,5,6,7]$.
  • Figure 3: Relative performance (IP time / PBBS time) vs. input size. PBBS is the baseline at y = 1.
  • Figure 4: Running times for PBBS Merging and IP Merging (500M elements) across varying thread counts.
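
Figure 2 shows the value 6 encoded in $B = [0,1,2,3,4,5,6,7]$, but the caption does not state the exact scheme. As an assumption, the sketch below uses the classic pair-swap idea behind inversion-style encodings: in a sorted array of distinct keys, each adjacent pair stores one bit in its relative order, so the array itself acts as scratch memory and can be restored afterward. It may differ from the figure and from the paper's inversion encoding; all function names are hypothetical.

```python
def to_bits(value, width):
    """Most-significant bit first, e.g. to_bits(6, 4) == [0, 1, 1, 0]."""
    return [(value >> (width - 1 - t)) & 1 for t in range(width)]


def from_bits(bits):
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def encode_bits(arr, bits):
    """Store one bit per adjacent pair of distinct keys, in place.

    Pair p is (arr[2p], arr[2p+1]): ascending means 0, swapped (an
    inversion) means 1. Keys must be distinct or a pair cannot hold a bit.
    """
    if 2 * len(bits) > len(arr):
        raise ValueError("need two keys per encoded bit")
    for p, bit in enumerate(bits):
        x, y = sorted((arr[2 * p], arr[2 * p + 1]))
        arr[2 * p], arr[2 * p + 1] = (y, x) if bit else (x, y)
    return arr


def decode_bits(arr, count):
    return [int(arr[2 * p] > arr[2 * p + 1]) for p in range(count)]


def restore(arr, count):
    """Make every used pair ascending again (recovers a sorted input)."""
    return encode_bits(arr, [0] * count)


# Example using Figure 2's inputs: store 6 (binary 0110) in B.
B = list(range(8))
encode_bits(B, to_bits(6, 4))
print(B)                             # [0, 1, 3, 2, 5, 4, 6, 7]
print(from_bits(decode_bits(B, 4)))  # 6
restore(B, 4)
print(B)                             # [0, 1, 2, 3, 4, 5, 6, 7]
```

Eight distinct keys yield four bits under this scheme; the requirement of distinct keys illustrates why such encodings depend on the structure of the input.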

Theorems & Definitions (17)

  • Definition 3.1
  • Definition 3.2
  • Theorem 4.1
  • Lemma 4.2
  • proof
  • Lemma 4.3
  • Lemma 4.10
  • Theorem 5.1
  • Lemma 5.2
  • proof
  • ...and 7 more