SquareSort: a cache-oblivious sorting algorithm
Michal Koucký, Josef Matějka
TL;DR
SquareSort addresses cache-oblivious sorting in the external memory model by structuring the input as a $\sqrt{n}\times\sqrt{n}$ matrix, recursively sorting columns, applying a SkewTranspose to bucketize elements, and then sorting the buckets. The core novelty is the SkewTranspose, which partitions data into buckets using random pivots and reorganizes elements to achieve the asymptotically optimal IO complexity $O\left(\frac{n}{B}\log_{M/B} n\right)$ under the tall-cache regime $M \ge B^2$. The authors provide a detailed recurrence-based analysis and probabilistic bucket-size bounds to establish the main IO bound, along with an experimental comparison showing competitive performance relative to std::sort and FunnelSort. The work contributes a conceptually simple cache-oblivious sorting approach with supporting theoretical and empirical evaluation, highlighting the practical viability of skew-based distribution-sort techniques in hierarchical memory systems.
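The four steps named above (columns, recursive column sort, pivot-based bucketing, bucket sort) can be illustrated with a small in-memory sketch. This is not the authors' implementation: the function name `square_sort_sketch`, the `base` cutoff, the way pivots are sampled, and the fallback for duplicate-heavy inputs are all assumptions made for illustration. In particular, the bucketing loop below is a plain stand-in for SkewTranspose and makes no attempt to reproduce its memory layout or IO behaviour.

```python
import random
from bisect import bisect_right
from math import isqrt


def square_sort_sketch(a, base=16):
    """Return a sorted copy of `a`. Illustrative only, not IO-efficient."""
    n = len(a)
    if n <= base:  # hypothetical cutoff for small inputs
        return sorted(a)

    k = isqrt(n)  # roughly sqrt(n) columns of roughly sqrt(n) elements

    # Step 1: cut the input into columns and sort each one recursively.
    columns = [square_sort_sketch(a[i:i + k], base) for i in range(0, n, k)]

    # Step 2: draw k - 1 random pivots (assumed sampling scheme).
    pivots = sorted(random.sample(a, k - 1))

    # Step 3: bucketize. Each column is sorted, so the elements that
    # belong to one bucket form a contiguous run inside the column.
    buckets = [[] for _ in range(k)]
    for col in columns:
        start = 0
        for j, p in enumerate(pivots):
            end = bisect_right(col, p, start)  # elements <= p go to bucket j
            buckets[j].extend(col[start:end])
            start = end
        buckets[k - 1].extend(col[start:])  # elements above the last pivot

    # Assumed safeguard: if one bucket received everything (for example,
    # all keys equal), recursing would not shrink the problem.
    if max(len(b) for b in buckets) == n:
        return sorted(a)

    # Step 4: sort each bucket recursively and concatenate in pivot order.
    out = []
    for b in buckets:
        out.extend(square_sort_sketch(b, base))
    return out
```

Calling `square_sort_sketch(data)` on a list of comparable keys returns the same result as `sorted(data)`. The sketch only shows why sorted columns make bucketing cheap: each column splits into at most $\sqrt{n}$ contiguous runs, one per bucket.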
Abstract
In this paper we consider sorting in the cache-oblivious model of Frigo, Leiserson, Prokop, and Ramachandran (1999). We introduce a new, simple sorting algorithm in that model with asymptotically optimal IO complexity $O(\frac{n}{B} \log_{M/B} n)$, where $n$ is the instance size, $M$ is the size of the cache, and $B$ is the size of a memory block. This matches the complexity of the best known cache-oblivious sorting algorithm, FunnelSort.
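To get a feel for the bound, the expression $\frac{n}{B} \log_{M/B} n$ can be evaluated for concrete parameters. The helper below is a hypothetical convenience function, not something from the paper. It drops the constant factor hidden by the $O(\cdot)$ notation and assumes $n$, $M$, and $B$ are all measured in elements. The sample values are chosen only for illustration and happen to satisfy the tall-cache condition $M \ge B^2$.

```python
from math import log2


def sort_io_bound(n, M, B):
    """Evaluate (n / B) * log_{M/B}(n), ignoring constant factors."""
    if not (n > 0 and M > B > 0):
        raise ValueError("need n > 0 and M > B > 0")
    # Change of base: log_{M/B}(n) = log2(n) / log2(M / B).
    return (n / B) * (log2(n) / log2(M / B))


# Illustrative parameters: n = 2^30 elements, M = 2^20, B = 2^10.
n, M, B = 2**30, 2**20, 2**10
scan_cost = n / B                   # block transfers for one pass over the input
sort_cost = sort_io_bound(n, M, B)  # about three passes for these values
```

For these values $\log_{M/B} n = \log_{2^{10}} 2^{30} = 3$, so the bound is about three times the cost of a single scan, $3 \cdot 2^{20}$ block transfers, up to the constant factor.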
