Tight Better-Than-Worst-Case Bounds for Element Distinctness and Set Intersection
Ivor van der Hoog, Eva Rotenberg, Daniel Rutschmann
TL;DR
The paper tackles element distinctness and set intersection in the comparison-based model, where the classical worst-case bound $\Omega(n \log n)$ becomes informative only when the input has few duplicates. It introduces a universal optimality framework by encoding input duplication as a graph $G(I)$ (a union of cliques) and proving instance-sensitive lower bounds, alongside adaptive algorithms that match these bounds up to constants. The main results show a tight $\Theta(\log\log n)$-competitive bound for element distinctness and a $\Theta(\log n)$-competitive bound for set intersection, with an accompanying preprocessing variant achieving $O(1)$-competitiveness for fixed input structures. This establishes a clear separation between the two problems under input structure constraints and provides a comprehensive framework for better-than-worst-case analysis in classic combinatorial problems.
Abstract
The element distinctness problem takes as input a list $I$ of $n$ values from a totally ordered universe and the goal is to decide whether $I$ contains any duplicates. It is a well-studied problem with a classical worst-case $Ω(n \log n)$ comparison-based lower bound by Fredman. At first glance, this lower bound appears to rule out any algorithm more efficient than the naive approach of sorting $I$ and comparing adjacent elements. However, upon closer inspection, the $Ω(n \log n)$ bound does not apply if the input has many duplicates. We therefore ask: Are there comparison-based lower bounds for element distinctness that are sensitive to the amount of duplicates in the input? To address this question, we derive instance-specific lower bounds. For any input instance $I$, we represent the combinatorial structure of the duplicates in $I$ by an undirected graph $G(I)$ that connects identical elements. Each such graph $G$ is a union of cliques, and we study algorithms by their worst-case running time over all inputs $I'$ with $G(I') \cong G$. We establish an adversarial lower bound showing that, for any deterministic algorithm $\mathcal{A}$, there exists a graph $G$ and an algorithm $\mathcal{A}'$ that, for all inputs $I$ with $G(I) \cong G$, is a factor $O(\log \log n)$ faster than $\mathcal{A}$. Consequently, no deterministic algorithm can be $o(\log \log n)$-competitive for all graphs $G$. We complement this with an $O(\log \log n)$-competitive deterministic algorithm, thereby obtaining tight bounds for element distinctness that go beyond classical worst-case analysis. We subsequently study the related problem of set intersection. We show that no deterministic set intersection algorithm can be $o(\log n)$-competitive, and provide an $O(\log n)$-competitive deterministic algorithm. This shows a separation between element distinctness and the set intersection problem.
