PCF Learned Sort: a Learning Augmented Sort Algorithm with $O(n \log\log n)$ Expected Complexity
Atsuki Sato, Yusuke Matsui
TL;DR
This work addresses the lack of theoretical guarantees for Learned Sort by introducing PCF Learned Sort, a learning-augmented, non-division-based sorter that partitions data using a Piecewise Constant Function (PCF) model of the CDF. It gives both worst-case and expected-time analyses. The worst-case bound is $O(nU(n) + n\log\log n)$ when the algorithm is paired with a general internal sort whose worst-case complexity is $O(nU(n))$. The expected bound is $O(n\log\log n)$ under mild distributional assumptions, with the parameter $\delta=\lfloor n^d\rfloor$ for $0<d<1$. Experiments on synthetic and real datasets show runtime consistent with $O(n\log\log n)$ and robustness to variation in the data distribution. They also highlight the practical trade-off against fully optimized learned sorts that carry no guarantees. The results help explain why Learned Sorts can outperform traditional sorts, while keeping runtime predictable across diverse inputs.
Abstract
Sorting is one of the most fundamental algorithms in computer science. Recently, Learned Sorts, which use machine learning to improve sorting speed, have attracted attention. While existing studies show that Learned Sort is empirically faster than classical sorting algorithms, they do not provide theoretical guarantees about its computational complexity. We propose Piecewise Constant Function (PCF) Learned Sort, a theoretically guaranteed Learned Sort algorithm. We prove that the expected complexity of PCF Learned Sort is $\mathcal{O}(n \log \log n)$ under mild assumptions on the data distribution. We also confirm empirically that PCF Learned Sort has a computational complexity of $\mathcal{O}(n \log \log n)$ on both synthetic and real datasets. This is the first study to theoretically support the empirical success of Learned Sort, and provides evidence for why Learned Sort is fast. The code is available at https://github.com/atsukisato/PCF_Learned_Sort .
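The idea of partitioning with a piecewise-constant CDF model and falling back to a standard sort can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that). The function name `pcf_learned_sort_sketch`, the equal-width intervals, the sample size, the bucket count, the default `d=0.75`, and the `base_case` cutoff are all illustrative assumptions. The sketch also assumes that `delta` acts as a bucket-size threshold above which the model is considered to have failed.

```python
import random


def pcf_learned_sort_sketch(values, d=0.75, base_case=32, rng=None):
    """Illustrative bucket sort driven by a piecewise-constant CDF estimate."""
    rng = rng or random.Random(0)
    n = len(values)
    if n <= base_case:
        return sorted(values)  # small inputs: use the standard sort directly
    lo, hi = min(values), max(values)
    if lo == hi:
        return list(values)  # all keys equal, nothing to do

    # Hypothetical parameter choice: about sqrt(n) intervals, samples, buckets.
    k = max(2, int(n ** 0.5))
    sample = [values[rng.randrange(n)] for _ in range(k)]

    def interval(x):
        # Index of the equal-width interval of [lo, hi] that contains x.
        return min(k - 1, int((x - lo) / (hi - lo) * k))

    # Piecewise-constant CDF: one value per interval, namely the fraction of
    # sampled keys that fall in strictly earlier intervals.
    counts = [0] * k
    for s in sample:
        counts[interval(s)] += 1
    cdf, running = [], 0
    for c in counts:
        cdf.append(running / len(sample))
        running += c

    # The model is non-decreasing in x, so bucket order matches key order.
    buckets = [[] for _ in range(k)]
    for x in values:
        buckets[min(k - 1, int(cdf[interval(x)] * k))].append(x)

    delta = int(n ** d)  # assumed bucket-size threshold, floor(n^d)
    out = []
    for bucket in buckets:
        if len(bucket) > delta:
            out.extend(sorted(bucket))  # model was inaccurate here: fall back
        else:
            out.extend(pcf_learned_sort_sketch(bucket, d, base_case, rng))
    return out
```

Because the model is constant on each interval and non-decreasing across intervals, concatenating the sorted buckets yields a sorted list regardless of how well the model fits. Model quality affects only how often the fallback branch is taken.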
