Robust Max Selection

Trung Dang, Zhiyi Huang

TL;DR

This work introduces a model for algorithm design under adversarial corruption of input data and studies the problem of finding the uncorrupted maximum among $n$ elements, $k$ of which are corrupted. It shows that any correct algorithm must output a set of at least $\min\{n, 2k+1\}$ elements to guarantee that the uncorrupted maximum is included. For algorithms whose output has exactly this size, it gives matching upper and lower bounds of $\Theta(nk)$ comparison queries in the deterministic case, and a randomized two-stage algorithm that succeeds with high probability using $O(n + k\,\operatorname{polylog} k)$ queries, nearly matching the $\Omega(n)$ queries that any randomized algorithm needs to be correct with constant probability. The results establish near-optimal trade-offs between output size and query complexity under adversarial input corruption and outline open directions for tightening the bounds and extending the model to broader selection tasks. These findings advance robust algorithm design for settings where data owners may act adversarially and comparisons yield unreliable information, with potential implications for distributed systems and fault-tolerant data processing.

Abstract

We introduce a new model to study algorithm design under unreliable information, and apply it to the problem of finding the uncorrupted maximum element of a list of $n$ elements, $k$ of which are corrupted. Under our model, algorithms can perform black-box comparison queries between any pair of elements. However, queries involving corrupted elements may return arbitrary output. In particular, corrupted elements need not behave as any consistent values, and may introduce cycles in the elements' ordering. This poses new challenges for designing correct algorithms in this setting. For example, one cannot simply output a single element, as it is impossible to tell the two elements apart in a list containing one corrupted and one uncorrupted element. To ensure correctness, algorithms in this setting must output a set that is guaranteed to include the uncorrupted maximum element. We first show that any algorithm must output a set of size at least $\min\{n, 2k + 1\}$ to ensure that the uncorrupted maximum is contained in the output set. Restricted to algorithms whose output size is exactly $\min\{n, 2k + 1\}$, for deterministic algorithms, we show matching upper and lower bounds of $\Theta(nk)$ comparison queries to produce a set of elements that contains the uncorrupted maximum. On the randomized side, we propose a 2-stage algorithm that, with high probability, uses $O(n + k \operatorname{polylog} k)$ comparison queries to find such a set, almost matching the $\Omega(n)$ queries necessary for any randomized algorithm to obtain a constant probability of being correct.
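
To make the query model concrete, the following is a minimal brute-force sketch, not the paper's $\Theta(nk)$ or randomized algorithm. It compares every pair once and keeps each element that loses at most $k$ comparisons. The uncorrupted maximum can only lose to corrupted elements, so it is always kept; and because any $m$ kept elements play $m(m-1)/2$ comparisons among themselves, some kept element loses at least $(m-1)/2$ of them, which forces $m \le 2k+1$. The names `candidates_with_few_losses` and `make_oracle`, and the particular corruption behaviour in the oracle, are assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Set


def candidates_with_few_losses(
    n: int, k: int, beats: Callable[[int, int], bool]
) -> List[int]:
    """Return every element that loses at most k of its n - 1 comparisons.

    Uses n * (n - 1) / 2 queries, so this is only a baseline.
    """
    losses = [0] * n
    for a, b in combinations(range(n), 2):
        if beats(a, b):
            losses[b] += 1
        else:
            losses[a] += 1
    return [i for i in range(n) if losses[i] <= k]


def make_oracle(values: Sequence[int], corrupted: Set[int]) -> Callable[[int, int], bool]:
    """Hypothetical comparison oracle used only for this example.

    Uncorrupted pairs are compared by value. A corrupted element claims to
    beat every uncorrupted element; between two corrupted elements the lower
    index wins. Any other behaviour would be equally allowed by the model.
    """
    def beats(a: int, b: int) -> bool:
        if a in corrupted and b in corrupted:
            return a < b
        if a in corrupted:
            return True
        if b in corrupted:
            return False
        return values[a] > values[b]

    return beats


# Six elements, one corrupted (index 2). The uncorrupted maximum is index 1.
values = [5, 9, 1, 7, 3, 8]
k = 1
result = candidates_with_few_losses(len(values), k, make_oracle(values, {2}))
```

In this run `result` is `[1, 2]`: the uncorrupted maximum (index 1) and the corrupted element (index 2), which no comparison can tell apart from a genuine maximum.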

Paper Structure

This paper contains 18 sections, 13 theorems, 7 equations, 2 algorithms.

Key Result

Lemma 1

For any $n$ and $k$, there exists an instance where the output set size $|S|$ for any algorithm is at least $\min\{n,2k+1\}$ to ensure that the maximum element is always included.
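
One way to see why $2k+1$ elements can be forced, for the case $n = 2k+1$, is a cyclic instance; this is a sketch of a standard construction and is not necessarily the one used in the paper's proof. Place the elements on a cycle and let each element beat the $k$ elements that follow it. For any element $v$, the $k$ elements that beat $v$ can be declared corrupted, and the remaining $k+1$ elements are then linearly ordered with $v$ on top, so every element is a possible uncorrupted maximum. The function names below are hypothetical.

```python
from typing import Callable


def cyclic_beats(k: int) -> Callable[[int, int], bool]:
    """Comparison results for 2k + 1 elements arranged on a cycle.

    Element a beats element b exactly when b is one of the k elements
    that follow a on the cycle.
    """
    m = 2 * k + 1

    def beats(a: int, b: int) -> bool:
        return 1 <= (b - a) % m <= k

    return beats


def could_be_uncorrupted_max(v: int, k: int) -> bool:
    """Check that the cyclic instance is consistent with v being the
    uncorrupted maximum when the k elements that beat v are corrupted."""
    m = 2 * k + 1
    beats = cyclic_beats(k)
    corrupted = {u for u in range(m) if u != v and beats(u, v)}
    honest = [(v + i) % m for i in range(k + 1)]  # v, v+1, ..., v+k on the cycle
    if len(corrupted) != k or corrupted & set(honest):
        return False
    # The honest elements must form a linear order v > v+1 > ... > v+k.
    return all(
        beats(honest[i], honest[j])
        for i in range(k + 1)
        for j in range(i + 1, k + 1)
    )


# Every one of the 2k + 1 elements is a possible uncorrupted maximum.
all_plausible = all(
    could_be_uncorrupted_max(v, k) for k in range(1, 6) for v in range(2 * k + 1)
)
```

Since all $2k+1$ scenarios produce identical answers to every query, an algorithm that omits any element from its output is wrong in the scenario where that element is the uncorrupted maximum.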

Theorems & Definitions (23)

  • Lemma 1
  • proof : Proof of Lemma 1
  • Lemma 2
  • proof : Proof of Lemma 2
  • Theorem 3
  • proof : Proof of Theorem 3
  • Theorem 4
  • Lemma 5
  • proof : Proof of Lemma 5
  • Theorem 6
  • ...and 13 more