Parallel Integer Sort: Theory and Practice

Xiaojun Dong, Laxman Dhulipala, Yan Gu, Yihan Sun

TL;DR

The paper studies parallel integer sorting in both theory and practice. It introduces DovetailSort (DTSort), a stable most-significant-digit (MSD) algorithm that detects duplicate keys by sampling, separates heavy (frequently occurring) keys from light ones, and then interleaves the heavy and light buckets with a dovetail merge. For $n$ records with integer keys in a range of size $r$, the authors prove that a broad class of practical MSD integer sort algorithms achieves $O(n\sqrt{\log r})$ work with low span; Theorem 4.1, for example, gives an unstable variant with $O(\log r + \sqrt{\log r}\log n)$ span with high probability. DTSort itself attains $O(n\sqrt{\log r})$ work and $\tilde{O}(2^{\sqrt{\log r}})$ span, and $O(n)$ work for certain input distributions. Empirically, DTSort matches or outperforms state-of-the-art parallel integer and comparison sorts on synthetic and real-world data, especially when there are many duplicate keys, and it scales to hundreds of cores. The work thus offers both a theoretical foundation for existing practice and a robust sorter for large-scale integer data.

Abstract

Integer sorting is a fundamental problem in computer science. This paper studies parallel integer sort both in theory and in practice. In theory, we show tighter bounds for a class of existing practical integer sort algorithms, which provides a solid theoretical foundation for their widespread usage in practice and strong performance. In practice, we design a new integer sorting algorithm, DovetailSort, that is theoretically efficient and has good practical performance. In particular, DovetailSort overcomes a common challenge in existing parallel integer sorting algorithms, which is the difficulty of detecting and taking advantage of duplicate keys. The key insight in DovetailSort is to combine algorithmic ideas from both integer- and comparison-sorting algorithms. In our experiments, DovetailSort achieves competitive or better performance than existing state-of-the-art parallel integer and comparison sorting algorithms on various synthetic and real-world datasets.
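
The idea of detecting duplicate keys by sampling and then separating heavy keys from light ones can be illustrated with a minimal sequential Python sketch. It is not the authors' implementation: the function names, the sample size, the threshold, and the use of plain Python lists are all assumptions made for illustration, and the real algorithm runs in parallel. A key counts as heavy if it appears at least `threshold` times in the sample; heavy keys get their own bucket, while light keys are distributed by their most significant `digit_bits` bits.

```python
import random
from collections import Counter


def detect_heavy_keys(keys, num_samples, threshold, seed=0):
    """Return the keys that occur at least `threshold` times in a random sample.

    `num_samples`, `threshold`, and `seed` are illustrative parameters,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    sample = [rng.choice(keys) for _ in range(num_samples)]
    counts = Counter(sample)
    return {key for key, count in counts.items() if count >= threshold}


def heavy_light_partition(records, heavy, key_bits, digit_bits):
    """Stably split (key, value) records into heavy and light buckets.

    Each heavy key gets a bucket of its own. Every other record goes to
    the light bucket selected by the top `digit_bits` bits of its key.
    Appending in input order keeps equal keys in their original order.
    """
    shift = key_bits - digit_bits
    heavy_buckets = {key: [] for key in heavy}
    light_buckets = [[] for _ in range(1 << digit_bits)]
    for record in records:
        key = record[0]
        if key in heavy:
            heavy_buckets[key].append(record)
        else:
            light_buckets[key >> shift].append(record)
    return heavy_buckets, light_buckets


if __name__ == "__main__":
    # 4-bit keys and 2-bit digits, matching r = 16 and gamma = 2 in Figure 2.
    records = [(5, "a"), (5, "b"), (12, "c"), (5, "d"),
               (3, "e"), (5, "f"), (9, "g"), (5, "h")]
    heavy = detect_heavy_keys([k for k, _ in records], num_samples=8, threshold=2)
    print(heavy_light_partition(records, heavy, key_bits=4, digit_bits=2))
```

Because the sample is random, which keys are classified as heavy depends on the seed. A light bucket may still contain duplicates, which the recursive sort on that bucket would handle.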

Paper Structure

This paper contains 25 sections, 7 theorems, 36 figures, 4 tables, 3 algorithms.

Key Result

Theorem 4.1

There exists an unstable parallel MSD sorting algorithm with $O(n\sqrt{\log r})$ work and $O(\log r + \sqrt{\log r}\log n)$ span with high probability (whp).
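
A back-of-the-envelope calculation suggests where a $\sqrt{\log r}$ factor in the work bound can come from. This is a sketch under stated assumptions, not the paper's proof. Assume logarithms are base 2, each MSD round consumes $\gamma$ bits of the key at $O(n)$ total work, and subproblems smaller than $2^{c\gamma}$ (for a constant $c$) are finished by a comparison sort costing $O(n' \log n') = O(n'\gamma)$ on $n'$ records. Then there are at most $\log r / \gamma$ rounds, and the total work is

$$W(n) = O\!\left(n \cdot \frac{\log r}{\gamma}\right) + O(n\gamma) = O\!\left(n\left(\frac{\log r}{\gamma} + \gamma\right)\right).$$

The two terms balance at $\gamma = \sqrt{\log r}$, which gives

$$W(n) = O\!\left(n\sqrt{\log r}\right).$$

For example, with 64-bit keys, $\log r = 64$, so $\gamma = 8$ bits per round and at most $64 / 8 = 8$ rounds.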

Figures (36)

  • Figure 1: Heatmap comparing sorting algorithms on $10^9$ records with 32-bit keys and 32-bit values. All numbers are running times relative to the best for each input. Raw data are in the table labeled tab:synthetic. The baseline algorithms are described in the table labeled tab:baseline.
  • Figure 2: An overview of the DTSort approach. Here $r=16$, $\gamma=2$. For simplicity and to save space, the sampling scheme shown in the figure does not exactly follow the one described in the algorithm; keys with 2 or more samples are simply treated as heavy keys.
  • Figure 3: Illustration of the dovetail merging step. The example merges the buckets in MSD zone 01 of the figure labeled fig:sort. A letter subscript distinguishes different records with the same key.
  • Figure 4: (a) and (b): Performance of heavy-key detection. Numbers are running times (lower is better) with or without heavy-key detection. (a) is for 32-bit keys and (b) is for 64-bit keys. (c) and (d): Performance of dovetail merging. Numbers are running times (lower is better) using the dovetail merging algorithm or a baseline merging algorithm. (c) is for 32-bit keys and (d) is for 64-bit keys. (e) and (f): Scalability (higher is better) with a varying number of threads, and running time (lower is better) with varying input sizes, on 32-bit key and 32-bit value pairs for one instance: Zipf-0.8. The full analysis is given in the full version of the paper (dong2024parallelfull, section labeled sec:app-scalability). Discussion is in the section labeled sec:exp-study.
  • Figure 5: Self-speedup with varying thread counts of all tested implementations on Unif-$10^7$.
  • ...and 31 more figures
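
The merging step that Figure 3 illustrates can be approximated by a simplified, out-of-place Python sketch. It assumes the light records of an MSD zone are already sorted by key, that each heavy bucket holds records of a single key, and that no heavy key also appears among the light records. The function name `merge_heavy_into_light` is hypothetical, and this summary does not give the details of the paper's dovetail merge, which may differ (for example, by working in place and in parallel).

```python
from bisect import bisect_left


def merge_heavy_into_light(light_sorted, heavy_buckets):
    """Interleave single-key heavy buckets with light records sorted by key.

    `light_sorted` is a list of (key, value) records sorted by key.
    `heavy_buckets` maps each heavy key to its records in input order.
    Illustrative out-of-place merge, not the paper's algorithm.
    """
    light_keys = [record[0] for record in light_sorted]
    merged = []
    pos = 0
    for key in sorted(heavy_buckets):
        # Binary search for the first light record whose key is not smaller.
        cut = bisect_left(light_keys, key, pos)
        merged.extend(light_sorted[pos:cut])
        merged.extend(heavy_buckets[key])
        pos = cut
    merged.extend(light_sorted[pos:])
    return merged


if __name__ == "__main__":
    light = [(4, "a"), (6, "b"), (6, "c"), (7, "d")]
    heavy = {5: [(5, "x"), (5, "y")]}
    print(merge_heavy_into_light(light, heavy))
```

Each heavy bucket is placed with one binary search instead of being compared record by record, which is why a large number of duplicates adds little to the cost of the merge in this sketch.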

Theorems & Definitions (7)

  • Theorem 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Theorem 4.4
  • Theorem 4.5
  • Theorem 4.6
  • Theorem 4.7