Robust Max Selection
Trung Dang, Zhiyi Huang
TL;DR
This work introduces a model for algorithm design under adversarial corruption of input data and studies the problem of finding the uncorrupted maximum among $n$ elements, $k$ of which are corrupted. It shows that a correct algorithm must output a set containing the uncorrupted maximum, and that this set must have at least $\min\{n, 2k+1\}$ elements. For algorithms whose output size is exactly $\min\{n, 2k+1\}$, it gives tight deterministic and randomized results: matching deterministic upper and lower bounds of $\Theta(nk)$ comparison queries, and a randomized two-stage algorithm that succeeds with high probability using $O(n + k\,\operatorname{polylog} k)$ queries, nearly matching the $\Omega(n)$ queries that any randomized algorithm needs to be correct with constant probability. The results establish near-optimal query complexity at the minimum output size in the presence of adversarial input corruption, and outline open directions for tightening the bounds and extending to broader selection tasks. These findings advance robust algorithm design for settings where data owners may act adversarially and comparisons yield unreliable information, with potential implications for distributed systems and fault-tolerant data processing.
Abstract
We introduce a new model to study algorithm design under unreliable information, and apply it to the problem of finding the uncorrupted maximum element of a list containing $n$ elements, $k$ of which are corrupted. Under our model, algorithms can perform black-box comparison queries between any pair of elements. However, queries involving corrupted elements may return arbitrary answers. In particular, corrupted elements need not behave as any consistent value, and may introduce cycles in the ordering of the elements. This poses new challenges for designing correct algorithms in this setting. For example, an algorithm cannot simply output a single element, because it is impossible to tell which element is uncorrupted in a list containing one corrupted and one uncorrupted element. To ensure correctness, algorithms in this setting must output a set that is guaranteed to include the uncorrupted maximum element. We first show that any algorithm must output a set of size at least $\min\{n, 2k + 1\}$ to ensure that the uncorrupted maximum is contained in the output set. Restricting attention to algorithms whose output size is exactly $\min\{n, 2k + 1\}$, we show matching upper and lower bounds of $\Theta(nk)$ comparison queries for deterministic algorithms to produce a set that contains the uncorrupted maximum. On the randomized side, we propose a 2-stage algorithm that, with high probability, uses $O(n + k \operatorname{polylog} k)$ comparison queries to find such a set, almost matching the $\Omega(n)$ queries necessary for any randomized algorithm to be correct with constant probability.
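To make the query model concrete, the following is a minimal Python sketch of a naive baseline. It is not the paper's $\Theta(nk)$ deterministic algorithm or its 2-stage randomized algorithm. It compares every pair once, using $n(n-1)/2$ queries, and keeps every element that loses at most $k$ comparisons. The uncorrupted maximum can lose only to corrupted elements, so it is always kept. A counting argument bounds the output size: if $S$ is the kept set, the $|S|(|S|-1)/2$ comparisons inside $S$ each produce one loss, and each member has at most $k$ losses, so $|S| \le 2k+1$. The names `make_oracle` and `candidates_by_losses` are hypothetical, and the sketch assumes distinct uncorrupted values and an oracle that answers a repeated query consistently.

```python
import itertools
import random


def make_oracle(values, corrupted, rng):
    """Build a comparison oracle (hypothetical helper, for illustration only).

    compare(i, j) returns True if element i is reported to beat element j.
    Queries between two uncorrupted elements are answered truthfully.
    Any query touching a corrupted element gets an arbitrary answer,
    here simulated with a coin flip, so cycles are possible.
    """
    cache = {}

    def compare(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:
            a, b = key
            if a in corrupted or b in corrupted:
                cache[key] = rng.choice([a, b])  # arbitrary reported winner
            else:
                cache[key] = a if values[a] > values[b] else b
        return cache[key] == i

    return compare


def candidates_by_losses(n, k, compare):
    """Naive baseline: query all pairs, keep elements with at most k losses."""
    losses = [0] * n
    for i, j in itertools.combinations(range(n), 2):
        if compare(i, j):
            losses[j] += 1
        else:
            losses[i] += 1
    return [i for i in range(n) if losses[i] <= k]
```

This baseline uses quadratically many queries, which is what the bounds stated in the abstract improve on; it is included only to illustrate why a set of size at most $2k+1$ can always capture the uncorrupted maximum.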
