Tight Bounds for Sorting Under Partial Information
Ivor van der Hoog, Daniel Rutschmann
TL;DR
Sorting under partial information asks to recover a linear extension $L$ of a ground set $X$ given a partial order $P$, using as few linear-oracle queries as possible; the information content is captured by $e(P)$, the number of linear extensions, with the theoretical lower bound $\\log e(P)$. The authors present a subquadratic-time algorithm that, for any constant $c\\ge 1$, preprocesses $P$ in $O(n^{1+1/c})$ time and recovers $L$ with $Θ(c \\log e(P))$ linear-oracle queries in $O(c \\log e(P))$ time, plus a matching lower bound showing this trade-off is tight across preprocessing, queries, and time. The method combines a greedy chain decomposition and Huffman-like chain merging with exponential search, supported by entropy-based arguments on the incomparability graph to bound both upper and lower bounds. The results establish a tight three-way bound for sorting under partial information, offering the first subquadratic preprocessing scheme with provable optimality in query complexity and runtime for all constant trade-offs.
Abstract
Sorting has a natural generalization where the input consists of: (1) a ground set $X$ of size $n$, (2) a partial oracle $O_P$ specifying some fixed partial order $P$ on $X$ and (3) a linear oracle $O_L$ specifying a linear order $L$ that extends $P$. The goal is to recover the linear order $L$ on $X$ using the fewest number of linear oracle queries. In this problem, we measure algorithmic complexity through three metrics: oracle queries to $O_L$, oracle queries to $O_P$, and the time spent. Any algorithm requires worst-case $\log_2 e(P)$ linear oracle queries to recover the linear order on $X$. Kahn and Saks presented the first algorithm that uses $Θ(\log e(P))$ linear oracle queries (using $O(n^2)$ partial oracle queries and exponential time). The state-of-the-art for the general problem is by Cardinal, Fiorini, Joret, Jungers and Munro who at STOC'10 manage to separate the linear and partial oracle queries into a preprocessing and query phase. They can preprocess $P$ using $O(n^2)$ partial oracle queries and $O(n^{2.5})$ time. Then, given $O_L$, they uncover the linear order on $X$ in $Θ(\log e(P))$ linear oracle queries and $O(n + \log e(P))$ time -- which is worst-case optimal in the number of linear oracle queries but not in the time spent. For $c \geq 1$, our algorithm can preprocess $O_P$ using $O(n^{1 + \frac{1}{c}})$ queries and time. Given $O_L$, we uncover $L$ using $Θ(c \log e(P))$ queries and time. We show a matching lower bound, as there exist positive constants $(α, β)$ where for any constant $c \geq 1$, any algorithm that uses at most $α\cdot n^{1 + \frac{1}{c}}$ preprocessing must use worst-case at least $β\cdot c \log e(P)$ linear oracle queries. Thus, we solve the problem of sorting under partial information through an algorithm that is asymptotically tight across all three metrics.
