Tight Bounds for Sorting Under Partial Information

Ivor van der Hoog; Daniel Rutschmann

TL;DR

Sorting under partial information asks to recover a linear extension $L$ of a partial order $P$ on a ground set $X$, using as few linear-oracle queries as possible. The information content is captured by $e(P)$, the number of linear extensions of $P$, which gives the information-theoretic lower bound of $\log e(P)$ queries. The authors present an algorithm that, for any constant $c \ge 1$, preprocesses $P$ in $O(n^{1+1/c})$ time and recovers $L$ with $Θ(c \log e(P))$ linear-oracle queries in $O(c \log e(P))$ time, plus a matching lower bound showing this trade-off is tight across preprocessing, queries, and time. The method combines a greedy chain decomposition and Huffman-like chain merging with exponential search, supported by entropy-based arguments on the incomparability graph that underpin both the upper and the lower bound. The results establish a tight three-way bound for sorting under partial information, offering the first subquadratic preprocessing scheme with provable optimality in query complexity and runtime for all constant trade-offs.
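To make the "chain merging with exponential search" ingredient concrete, here is a minimal sketch of merging two already sorted chains with a galloping (exponential) search, counting comparison queries along the way. It only illustrates the general technique and is not the paper's algorithm; the names `CountingOracle`, `gallop_insert_position`, and `merge_chains` are hypothetical, and elements are assumed to be distinct.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class CountingOracle:
    """Hypothetical stand-in for a linear oracle: answers u < v and counts queries."""

    def __init__(self) -> None:
        self.queries = 0

    def less(self, u, v) -> bool:
        self.queries += 1
        return u < v


def gallop_insert_position(
    chain: Sequence[T], start: int, x: T, less: Callable[[T, T], bool]
) -> int:
    """Return the smallest index i >= start such that chain[i] is not below x.

    Uses exponential search: the cost is logarithmic in the distance
    between `start` and the answer, not in len(chain).
    """
    n = len(chain)
    if start >= n or not less(chain[start], x):
        return start
    lo = start  # invariant: chain[lo] is below x
    step = 1
    while lo + step < n and less(chain[lo + step], x):
        lo += step
        step *= 2
    hi = min(lo + step, n)  # hi == n, or chain[hi] is not below x
    while hi - lo > 1:  # binary search inside the bracket (lo, hi]
        mid = (lo + hi) // 2
        if less(chain[mid], x):
            lo = mid
        else:
            hi = mid
    return hi


def merge_chains(
    a: Sequence[T], b: Sequence[T], less: Callable[[T, T], bool]
) -> List[T]:
    """Merge sorted chains a and b by galloping each element of b into a."""
    out: List[T] = []
    pos = 0
    for x in b:
        nxt = gallop_insert_position(a, pos, x, less)
        out.extend(a[pos:nxt])
        out.append(x)
        pos = nxt
    out.extend(a[pos:])
    return out


oracle = CountingOracle()
merged = merge_chains([1, 3, 5, 7, 9, 11], [4, 10], oracle.less)
print(merged, "queries used:", oracle.queries)
```

The point of galloping is that inserting a short chain into a long one costs a number of queries that depends on how far apart the insertion points are, which is the kind of behaviour an entropy-style bound can charge against.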

Abstract

Sorting has a natural generalization where the input consists of: (1) a ground set $X$ of size $n$, (2) a partial oracle $O_P$ specifying some fixed partial order $P$ on $X$ and (3) a linear oracle $O_L$ specifying a linear order $L$ that extends $P$. The goal is to recover the linear order $L$ on $X$ using the fewest number of linear oracle queries. In this problem, we measure algorithmic complexity through three metrics: oracle queries to $O_L$, oracle queries to $O_P$, and the time spent. Any algorithm requires worst-case $\log_2 e(P)$ linear oracle queries to recover the linear order on $X$. Kahn and Saks presented the first algorithm that uses $Θ(\log e(P))$ linear oracle queries (using $O(n^2)$ partial oracle queries and exponential time). The state-of-the-art for the general problem is by Cardinal, Fiorini, Joret, Jungers and Munro who at STOC'10 manage to separate the linear and partial oracle queries into a preprocessing and query phase. They can preprocess $P$ using $O(n^2)$ partial oracle queries and $O(n^{2.5})$ time. Then, given $O_L$, they uncover the linear order on $X$ in $Θ(\log e(P))$ linear oracle queries and $O(n + \log e(P))$ time -- which is worst-case optimal in the number of linear oracle queries but not in the time spent. For $c \geq 1$, our algorithm can preprocess $O_P$ using $O(n^{1 + \frac{1}{c}})$ queries and time. Given $O_L$, we uncover $L$ using $Θ(c \log e(P))$ queries and time. We show a matching lower bound, as there exist positive constants $(α, β)$ where for any constant $c \geq 1$, any algorithm that uses at most $α\cdot n^{1 + \frac{1}{c}}$ preprocessing must use worst-case at least $β\cdot c \log e(P)$ linear oracle queries. Thus, we solve the problem of sorting under partial information through an algorithm that is asymptotically tight across all three metrics.
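The lower bound of $\log_2 e(P)$ linear oracle queries can be checked by hand on tiny inputs. The sketch below counts linear extensions by brute force over all permutations, which is only feasible for very small ground sets and is not how the paper handles $e(P)$; the function name `count_linear_extensions` and the example poset are illustrative assumptions.

```python
from itertools import permutations
from math import ceil, log2
from typing import Hashable, Iterable, Sequence, Tuple


def count_linear_extensions(
    elements: Sequence[Hashable],
    relations: Iterable[Tuple[Hashable, Hashable]],
) -> int:
    """Count total orders on `elements` that respect every pair (a, b), meaning a < b.

    Brute force over all n! permutations; intended for n of at most about 8.
    """
    relations = list(relations)
    count = 0
    for perm in permutations(elements):
        position = {x: i for i, x in enumerate(perm)}
        if all(position[a] < position[b] for a, b in relations):
            count += 1
    return count


# Example poset (an assumption for illustration): two disjoint chains a < b and c < d.
e_P = count_linear_extensions("abcd", [("a", "b"), ("c", "d")])
print("e(P) =", e_P)
print("queries needed in the worst case: at least", ceil(log2(e_P)))
```

For two disjoint chains of length two, the extensions correspond to the ways of interleaving the chains, so $e(P) = \binom{4}{2} = 6$ and any algorithm needs at least $\lceil \log_2 6 \rceil = 3$ yes/no queries in the worst case.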

Paper Structure (29 sections, 27 theorems, 14 equations, 1 figure, 1 table, 5 algorithms)

Funding.
Introduction
Previous Work.
Related work.
Contribution.
Key Ideas.
Preliminaries
Input and output.
Decision Trees (DT).
Chains and antichains.
Log-extensions, graph entropy, and the incomparability graph.
Finger search trees.
Algorithm description
Algorithmic description.
Analysing the Preprocessing Phase
...and 14 more sections

Key Result

Theorem 1

For every constant $c \ge 1$, there is a deterministic algorithm that first preprocesses $P$ in $O(n^{1+1/c})$ time, and then asks $O(c \log e(P))$ linear oracle queries in $O(c \log e(P))$ time to recover the linear order on $X$, represented as a leaf-linked tree on $X$.
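As a worked reading of the trade-off in Theorem 1, substituting a few values of $c$ into the stated bounds (plain substitution, not an additional result) gives:

$$
\begin{aligned}
c = 1 &: \quad O(n^{2}) \text{ preprocessing}, && O(\log e(P)) \text{ queries and time},\\
c = 2 &: \quad O(n^{3/2}) \text{ preprocessing}, && O(2 \log e(P)) \text{ queries and time},\\
c = 4 &: \quad O(n^{5/4}) \text{ preprocessing}, && O(4 \log e(P)) \text{ queries and time}.
\end{aligned}
$$

Larger $c$ makes preprocessing cheaper while the number of linear oracle queries grows proportionally to $c$; the factor $c$ is kept explicit here only to show the direction of the trade-off.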

Figures (1)

Figure 1: Our family of pairs of partial orders with linear extensions $\{ (P_i, L_i) \}$ can be constructed in two stages. (a) Step 1: partition the vertices $x_i$ for $i \leq n/2$ into $w$ equal-size chains. (b) Step 2: for all $\ell > n/2$, arbitrarily assign $x_\ell$ to lie between any two connected vertices of the previous construction. All $x_\ell$ that get assigned the same pair $(x_{j+kw}, x_{j + (k+1)w})$ are linearly ordered by $\ell$. The corresponding linear order $L_i$ is obtained by adding the red edges.

Theorems & Definitions (55)

Theorem 1
Theorem 2
Theorem 3
Theorem 4
Lemma 1
Definition 1
Lemma 2
Lemma 3: Theorem 1.1 in kahn_entropy_1992, improved to Lemma 4 in cardinal_sorting_2010
Theorem 5: Theorem 2.1 in cardinal_efficient_2010, rephrased as in Theorem 1 in cardinal_sorting_2010
Lemma 4: Theorem 2 in cardinal_sorting_2010
...and 45 more
