Robust-Sorting and Applications to Ulam-Median
Ragesh Jaiswal, Amit Kumar, Jatin Yadav
TL;DR
The paper introduces robust sorting in adversarially corrupted tournaments: using a near-linear number of queries, it outputs an order that, in expectation, agrees with the true order on all but a constant factor times the number of bad elements $|B|$. It develops a pivot-based, triangle-testing algorithm that bounds misalignment via concatenation loss and random sampling, yielding ${\mathbb E}[{\textsf{LCS}}(\pi, \tilde{\pi})] \ge n - (3+\varepsilon)|B|$ in time $\tilde{O}(n)$, and extends these ideas to the Ulam-$k$-Median problem. For Ulam-$k$-Median, the authors combine the robust sorting framework with a random-sampling lemma for Ulam-1-Median and a D-sampling scheme to obtain a linear-time $(2-\delta)$-approximation algorithm that runs in $\tilde{O}(nd\,k^k)$ time and succeeds with high probability. The results bridge robust sorting under adversarial node corruption with classical permutation clustering, and the techniques may apply more broadly to large-scale ranking with noisy, biased, or subjective comparisons.
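To make the guarantee ${\mathbb E}[{\textsf{LCS}}(\pi, \tilde{\pi})] \ge n - (3+\varepsilon)|B|$ concrete, the following is a minimal sketch of how the LCS of two orderings of the same items can be computed. It is not taken from the paper: the function names `lcs_of_permutations` and `misalignment` are hypothetical, and the sketch uses the standard reduction from the LCS of two permutations to a longest increasing subsequence.

```python
from bisect import bisect_left


def lcs_of_permutations(pi, pi_tilde):
    """Length of the longest common subsequence of two orderings of the same items."""
    # Position of each item in the reference order pi.
    pos = {item: i for i, item in enumerate(pi)}
    # Rewrite pi_tilde as positions in pi. A common subsequence of the two
    # orderings is exactly a strictly increasing subsequence of this list.
    seq = [pos[item] for item in pi_tilde]
    # tails[k] holds the smallest possible last value of an increasing
    # subsequence of length k + 1 seen so far (patience sorting).
    tails = []
    for x in seq:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)


def misalignment(pi, pi_tilde):
    """Number of items outside a longest common subsequence: n - LCS."""
    return len(pi) - lcs_of_permutations(pi, pi_tilde)
```

Under this measure, an output order $\tilde{\pi}$ meets the stated bound when `misalignment(pi, pi_tilde)` is at most $(3+\varepsilon)|B|$ in expectation. The same quantity, applied to permutations of $d$ elements, is the usual way to express the Ulam distance.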
Abstract
Sorting is one of the most basic primitives in many algorithms and data analysis tasks. Comparison-based sorting algorithms, like quick-sort and merge-sort, are known to be optimal when the outcome of each comparison is error-free. However, many real-world sorting applications operate in scenarios where the outcome of each comparison can be noisy. In this work, we explore settings where a bounded number of comparisons are potentially corrupted by erroneous agents, resulting in arbitrary, adversarial outcomes. We model the sorting problem as a query-limited tournament graph where edges involving erroneous nodes may yield arbitrary results. Our primary contribution is a randomized algorithm inspired by quick-sort that, in expectation, produces an ordering close to the true total order while only querying $\tilde{O}(n)$ edges. The output ordering is within distance $(3 + \varepsilon)|B|$ of the target order $\pi$, where $B$ is the set of erroneous nodes, balancing the competing objectives of minimizing both query complexity and misalignment with $\pi$. Our algorithm needs to carefully balance two aspects: identify a pivot that partitions the vertex set evenly, and ensure that this partition is "truthful" while querying as few "triangles" in the graph $G$ as possible. Since the nodes in $B$ can potentially hide in an intricate manner, our algorithm requires several technical steps. Additionally, we demonstrate significant implications for the Ulam-$k$-Median problem, a classical clustering problem where the metric is defined on the set of permutations on a set of $d$ elements. Chakraborty, Das, and Krauthgamer gave a $(2-\varepsilon)$-approximation FPT algorithm for this problem, where the running time is super-linear in both $n$ and $d$. We use our robust sorting framework to give the first linear-time $(2-\varepsilon)$-approximation FPT algorithm for this problem.
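The corruption model can be illustrated with a small simulation. The sketch below is not the paper's algorithm: `make_oracle` and `is_cyclic_triangle` are hypothetical names, and a seeded random choice stands in for the adversary, which in the paper's model may answer arbitrarily. It only shows why triangles are informative: three truthful nodes always form a transitive triangle, so a cyclic triangle must contain at least one node of $B$.

```python
import random


def make_oracle(true_rank, bad, seed=0):
    """Build a comparison oracle over a tournament with corrupted nodes.

    true_rank maps each node to its position in the true order.
    bad is the set of erroneous nodes; any edge touching one of them gets
    an arbitrary orientation (here: a seeded coin flip, as an assumption).
    """
    rng = random.Random(seed)
    cache = {}  # orientation of each queried edge, fixed once chosen

    def beats(u, v):
        """Return True if the edge is oriented u -> v (u reported before v)."""
        a, b = min(u, v), max(u, v)
        if (a, b) not in cache:
            if a in bad or b in bad:
                cache[(a, b)] = rng.choice([True, False])
            else:
                cache[(a, b)] = true_rank[a] < true_rank[b]
        a_first = cache[(a, b)]
        return a_first if u == a else not a_first

    return beats


def is_cyclic_triangle(beats, u, v, w):
    """True if the three queried edges form a directed 3-cycle."""
    forward = beats(u, v) and beats(v, w) and beats(w, u)
    backward = beats(v, u) and beats(w, v) and beats(u, w)
    return forward or backward
```

For example, with `beats = make_oracle({i: i for i in range(6)}, bad={2})`, any triple for which `is_cyclic_triangle` returns `True` necessarily includes node `2`. The converse does not hold: a bad node can answer consistently and stay hidden, which is one reason identifying $B$ is not straightforward.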
