
SquareSort: a cache-oblivious sorting algorithm

Michal Koucký, Josef Matějka

TL;DR

SquareSort is a cache-oblivious sorting algorithm. It arranges the input as a $\sqrt{n}\times\sqrt{n}$ matrix, recursively sorts the columns, applies a SkewTranspose step to distribute the elements into buckets, and then sorts the buckets. The core novelty is SkewTranspose, which partitions the data into buckets using random pivots and reorganizes the elements so that the whole algorithm achieves the asymptotically optimal IO complexity $O\left(\frac{n}{B}\log_{M/B} n\right)$ in expectation under the tall-cache assumption $M \ge B^2$. The authors establish this bound through a recurrence-based analysis combined with probabilistic bounds on the bucket sizes, and they report an experimental comparison in which SquareSort is competitive with std::sort and FunnelSort. The work contributes a conceptually simple cache-oblivious sorting approach, supported by both theoretical and empirical evaluation, and indicates that skew-based distribution sorting is practical in hierarchical memory systems.
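The high-level structure described above can be illustrated with a small in-memory sketch. This is not the authors' implementation: the function name `square_sort_sketch`, the `cutoff` parameter, and the pivot-sampling rule are assumptions made for illustration, and the distribution loop is only a plain stand-in for SkewTranspose, whose IO-efficient pointer-advancing details are not given in this summary.

```python
import bisect
import math
import random


def square_sort_sketch(items, cutoff=16, rng=None):
    """In-memory illustration of the column-sort / bucketize / bucket-sort shape.

    Hypothetical sketch: it mirrors the structure only and makes no attempt
    to reproduce the IO behaviour analysed in the paper.
    """
    rng = rng or random.Random(0)
    n = len(items)
    if n <= cutoff:
        return sorted(items)  # small inputs: fall back to a library sort

    # View the input as about sqrt(n) columns of about sqrt(n) items each.
    side = math.isqrt(n - 1) + 1  # ceil(sqrt(n))
    columns = [items[i:i + side] for i in range(0, n, side)]

    # Step 1: recursively sort every column.
    columns = [square_sort_sketch(col, cutoff, rng) for col in columns]

    # Step 2 (stand-in for SkewTranspose): choose random pivots and move
    # every element into the bucket delimited by consecutive pivots.
    pivots = sorted(rng.sample(items, side - 1))
    buckets = [[] for _ in range(len(pivots) + 1)]
    for col in columns:
        for x in col:
            buckets[bisect.bisect_left(pivots, x)].append(x)

    # Step 3: sort each bucket and concatenate the buckets in pivot order.
    out = []
    for bucket in buckets:
        if len(bucket) == n:
            # Only possible when all keys are equal; avoid endless recursion.
            out.extend(sorted(bucket))
        else:
            out.extend(square_sort_sketch(bucket, cutoff, rng))
    return out
```

A call such as `square_sort_sketch(data)` on a Python list `data` returns a new sorted list. Using `bisect_left` places every element in the bucket bounded above by the first pivot that is at least as large, so concatenating the sorted buckets yields a sorted sequence.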

Abstract

In this paper we consider sorting in the cache-oblivious model of Frigo, Leiserson, Prokop, and Ramachandran (1999). We introduce a new, simple sorting algorithm in that model which has asymptotically optimal IO complexity $O(\frac{n}{B} \log_{M/B} n)$, where $n$ is the instance size, $M$ is the size of the cache, and $B$ is the size of a memory block. This matches the complexity of the best known cache-oblivious sorting algorithm, FunnelSort.

Paper Structure

This paper contains 16 sections, 10 theorems, 18 equations, 6 figures, and 3 algorithms.

Key Result

Theorem 1.1

SquareSort of $n$ items uses $O(\frac{n}{B} \log_{M/B} n)$ IOs in expectation over its randomness.
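To get a feel for the size of this bound, the expression inside the $O(\cdot)$ can be evaluated for concrete parameters. The sketch below is purely illustrative: the helper name `sort_io_bound` and the sample values of $n$, $M$, and $B$ (all measured in items) are assumptions, and the constant hidden by the $O$-notation is ignored.

```python
import math


def sort_io_bound(n, M, B):
    """Evaluate (n / B) * log_{M/B}(n), the expression inside the O(.) bound.

    n, M and B are counted in items. The check mirrors the tall-cache
    assumption M >= B^2 mentioned in the summary.
    """
    if M < B * B:
        raise ValueError("tall-cache assumption M >= B^2 is violated")
    return (n / B) * math.log(n, M / B)


# Hypothetical parameters: 2^30 items, a cache of 2^20 items, blocks of 2^6 items.
n, M, B = 2**30, 2**20, 2**6
print(f"bound expression: {sort_io_bound(n, M, B):.3e} block transfers")
print(f"single scan:      {n / B:.3e} block transfers")
```

With these values the logarithmic factor is $\log_{2^{14}} 2^{30} = 30/14 \approx 2.14$, so the expression is only a small multiple of the $n/B$ transfers needed to read the input once.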

Figures (6)

  • Figure 1: An illustration of the SquareSort algorithm.
  • Figure 2: Illustration of a call to $\mathrm{SkewTranspose}$. The pointers $col[i]$ and $buc[j]$ advance during the procedure.
  • Figure 3: Time per item to sort a random permutation (left) and a random binary sequence (right).
  • Figure 4: Time per item to sort a random sequence of elements from the universe of size $n$ (left) and of size $\sqrt{n}$ (right).
  • Figure 5: Time per item to sort a random permutation with different cutoffs.
  • ...and 1 more figure

Theorems & Definitions (19)

  • Theorem 1.1: Informal
  • Lemma 2.1
  • proof
  • Theorem 3.1
  • Proposition 3.2
  • proof
  • Proposition 3.3
  • proof
  • Proposition 3.4
  • proof
  • ...and 9 more