Robust-Sorting and Applications to Ulam-Median

Ragesh Jaiswal, Amit Kumar, Jatin Yadav

TL;DR

The paper introduces robust sorting in adversarially corrupted tournaments, using a near-linear number of queries while ensuring that, in expectation, the output order disagrees with the true order on at most a constant multiple of the number of bad elements $|B|$. It develops a pivot-based, triangle-testing algorithm that bounds misalignment via concatenation loss and random sampling, yielding ${\mathbb E}[{\textsf{LCS}}(\pi, \tilde{\pi})] \ge n - (3+\varepsilon)|B|$ in time $\tilde{O}(n)$, and extends these ideas to the Ulam-$k$-Median problem. For Ulam-$k$-Median, the authors combine the robust sorting framework with a random-sampling lemma for Ulam-1-Median and a D-sampling scheme to obtain a linear-time approximation algorithm with factor $(2-\delta)$ that runs in $\tilde{O}(nd\,k^k)$ time and succeeds with high probability. The results connect robust sorting under adversarial node corruption with classical permutation clustering, offering linear-time approaches to ranking with noisy or biased comparisons. The authors suggest the techniques may apply more broadly to ranking with subjective or faulty comparisons in large-scale data.
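To make the guarantee concrete, here is a worked instance with purely illustrative numbers (the values $n = 1000$, $|B| = 10$, and $\varepsilon = 0.1$ are assumptions chosen for the arithmetic, not figures from the paper):

$$
{\mathbb E}[{\textsf{LCS}}(\pi, \tilde{\pi})] \;\ge\; n - (3+\varepsilon)|B| \;=\; 1000 - 3.1 \cdot 10 \;=\; 969 .
$$

In this instance, at most $31$ elements are expected to fall outside the longest common subsequence of the output order $\tilde{\pi}$ and the true order $\pi$.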

Abstract

Sorting is one of the most basic primitives in many algorithms and data analysis tasks. Comparison-based sorting algorithms, like quick-sort and merge-sort, are known to be optimal when the outcome of each comparison is error-free. However, many real-world sorting applications operate in scenarios where the outcome of each comparison can be noisy. In this work, we explore settings where a bounded number of comparisons are potentially corrupted by erroneous agents, resulting in arbitrary, adversarial outcomes. We model the sorting problem as a query-limited tournament graph where edges involving erroneous nodes may yield arbitrary results. Our primary contribution is a randomized algorithm inspired by quick-sort that, in expectation, produces an ordering close to the true total order while only querying $\tilde{O}(n)$ edges. We achieve a distance from the target order $\pi$ within $(3 + \varepsilon)|B|$, where $B$ is the set of erroneous nodes, balancing the competing objectives of minimizing both query complexity and misalignment with $\pi$. Our algorithm needs to carefully balance two aspects: identify a pivot that partitions the vertex set evenly and ensure that this partition is "truthful" and yet query as few "triangles" in the graph $G$ as possible. Since the nodes in $B$ can potentially hide in an intricate manner, our algorithm requires several technical steps. Additionally, we demonstrate significant implications for the Ulam-$k$-Median problem, a classical clustering problem where the metric is defined on the set of permutations on a set of $d$ elements. Chakraborty, Das, and Krauthgamer gave a $(2-\varepsilon)$ FPT approximation algorithm for this problem, where the running time is super-linear in both $n$ and $d$. We use our robust sorting framework to give the first $(2-\varepsilon)$ FPT linear time approximation algorithm for this problem.
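The model described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: the names `make_tournament_oracle` and `lcs_with_true_order` are hypothetical, and the adversary is stood in for by seeded coin flips, whereas a real adversary may choose the corrupted arcs far more cleverly. Because both orderings are permutations of the same elements, their LCS equals the longest increasing subsequence of one ordering's ranks in the other.

```python
import bisect
import random


def lcs_with_true_order(sigma, pi):
    """LCS length of two orderings of the same distinct elements.

    Maps sigma to ranks in pi and computes the longest increasing
    subsequence of those ranks by patience sorting, in O(n log n).
    """
    rank = {v: i for i, v in enumerate(pi)}
    tails = []  # tails[k] = smallest last rank of an increasing run of length k+1
    for v in sigma:
        r = rank[v]
        k = bisect.bisect_left(tails, r)
        if k == len(tails):
            tails.append(r)
        else:
            tails[k] = r
    return len(tails)


def make_tournament_oracle(pi, bad, seed=0):
    """Return has_arc(u, v) for a tournament that follows pi except near `bad`.

    Arcs between two good nodes always agree with pi. Arcs touching a node
    in `bad` get an arbitrary but fixed direction (assumption: a seeded coin
    flip stands in for the adversary).
    """
    rank = {v: i for i, v in enumerate(pi)}
    rng = random.Random(seed)
    cache = {}  # (earlier, later) -> True if the arc goes earlier -> later

    def has_arc(u, v):
        if u == v:
            raise ValueError("a tournament has no self-loops")
        key = (u, v) if rank[u] < rank[v] else (v, u)
        if key not in cache:
            truthful = True
            if u in bad or v in bad:
                truthful = rng.random() < 0.5
            cache[key] = truthful
        forward = cache[key]
        return forward if key == (u, v) else not forward

    return has_arc
```

With these helpers, the misalignment of an output ordering `sigma` can be measured as `len(pi) - lcs_with_true_order(sigma, pi)`, the quantity the abstract bounds by $(3 + \varepsilon)|B|$ in expectation.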

Paper Structure

This paper contains 20 sections, 22 theorems, 45 equations, 1 figure, 1 algorithm.

Key Result

theorem 1.1

Consider an instance of ${\textsf{Robust Sort}}$ given by a tournament graph $G=(V,E)$, where $|V| = n$, and a parameter $\varepsilon > 0$. Suppose $G$ is $b$-imperfect w.r.t. an ordering $\pi$ on $V$. Then, there is an efficient algorithm that queries $O\left(\dfrac{n \log^3 n}{\varepsilon^2

Figures (1)

  • Figure 1: An arc from $x$ to $y$ is a contradiction to the concatenation loss being $M_p$. Thus, the tuple $(x,p,y)$ forms a triangle $x\rightarrow p \rightarrow y \rightarrow x$.
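The triangle in the Figure 1 caption is a directed 3-cycle, which cannot occur among nodes whose arcs all follow a single total order, so at least one of its three nodes must be erroneous. A minimal sketch of the check follows; the helper names `arcs_oracle` and `is_directed_triangle` are hypothetical, and the arc set is a toy stand-in for queries to $G$.

```python
def arcs_oracle(arcs):
    """Build has_arc(u, v) from an explicit collection of directed arcs (toy stand-in for G)."""
    arc_set = set(arcs)

    def has_arc(u, v):
        return (u, v) in arc_set

    return has_arc


def is_directed_triangle(has_arc, x, p, y):
    """True when x -> p, p -> y, and y -> x, i.e. the cycle x -> p -> y -> x."""
    return has_arc(x, p) and has_arc(p, y) and has_arc(y, x)
```

Each call queries three arcs, which is why the abstract stresses querying as few triangles as possible.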

Theorems & Definitions (44)

  • definition: Imperfect representation
  • theorem 1.1
  • theorem 1.2
  • definition: Balanced partition
  • definition: Support and loss of a sequence
  • definition: Concatenation Loss
  • lemma
  • proof
  • Claim 1.3
  • proof
  • ...and 34 more