Tight Better-Than-Worst-Case Bounds for Element Distinctness and Set Intersection

Ivor van der Hoog, Eva Rotenberg, Daniel Rutschmann

TL;DR

The paper studies element distinctness and set intersection in the comparison-based model, where the classical worst-case $\Omega(n \log n)$ lower bound is informative only when the input has few duplicates. It introduces a universal optimality framework that encodes the duplicate structure of an input $I$ as a graph $G(I)$ (a union of cliques), proves instance-sensitive lower bounds relative to that graph, and gives adaptive algorithms whose competitive ratios match those lower bounds. The main results are a tight $\Theta(\log\log n)$ competitive ratio for element distinctness and a tight $\Theta(\log n)$ competitive ratio for set intersection, with an accompanying preprocessing variant achieving $O(1)$-competitiveness for fixed input structures. This separates the two problems and provides a framework for better-than-worst-case analysis of classic combinatorial problems.

Abstract

The element distinctness problem takes as input a list $I$ of $n$ values from a totally ordered universe and the goal is to decide whether $I$ contains any duplicates. It is a well-studied problem with a classical worst-case $\Omega(n \log n)$ comparison-based lower bound by Fredman. At first glance, this lower bound appears to rule out any algorithm more efficient than the naive approach of sorting $I$ and comparing adjacent elements. However, upon closer inspection, the $\Omega(n \log n)$ bound does not apply if the input has many duplicates. We therefore ask: Are there comparison-based lower bounds for element distinctness that are sensitive to the amount of duplicates in the input? To address this question, we derive instance-specific lower bounds. For any input instance $I$, we represent the combinatorial structure of the duplicates in $I$ by an undirected graph $G(I)$ that connects identical elements. Each such graph $G$ is a union of cliques, and we study algorithms by their worst-case running time over all inputs $I'$ with $G(I') \cong G$. We establish an adversarial lower bound showing that, for any deterministic algorithm $\mathcal{A}$, there exists a graph $G$ and an algorithm $\mathcal{A}'$ that, for all inputs $I$ with $G(I) \cong G$, is a factor $O(\log \log n)$ faster than $\mathcal{A}$. Consequently, no deterministic algorithm can be $o(\log \log n)$-competitive for all graphs $G$. We complement this with an $O(\log \log n)$-competitive deterministic algorithm, thereby obtaining tight bounds for element distinctness that go beyond classical worst-case analysis. We subsequently study the related problem of set intersection. We show that no deterministic set intersection algorithm can be $o(\log n)$-competitive, and provide an $O(\log n)$-competitive deterministic algorithm. This shows a separation between element distinctness and the set intersection problem.
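To make the abstract's two basic objects concrete, the following minimal Python sketch shows the naive baseline (sort, then compare adjacent elements) and one way to summarize the graph $G(I)$. It is not the paper's Block Sorting or Median Recursion algorithm. The function names `has_duplicate_by_sorting` and `clique_sizes` are hypothetical, chosen for illustration. The sketch relies on the fact that a disjoint union of cliques is determined up to isomorphism by the multiset of its clique sizes, so a sorted list of sizes stands in for the isomorphism class of $G(I)$.

```python
from itertools import groupby


def has_duplicate_by_sorting(values):
    """Naive baseline: sort, then compare adjacent elements.

    Uses O(n log n) comparisons regardless of how many duplicates exist.
    """
    ordered = sorted(values)
    # After sorting, equal values are adjacent, so one linear scan suffices.
    return any(a == b for a, b in zip(ordered, ordered[1:]))


def clique_sizes(values):
    """Summarize G(I) by its clique sizes, largest first.

    Each maximal run of equal values in sorted order is one clique of G(I);
    a value that occurs once contributes a clique of size 1.
    """
    runs = groupby(sorted(values))
    return sorted((len(list(run)) for _, run in runs), reverse=True)


if __name__ == "__main__":
    instance = [5, 1, 5, 2, 5, 1]
    print(has_duplicate_by_sorting(instance))
    print(clique_sizes(instance))
```

Under this summary, two inputs $I$ and $I'$ satisfy $G(I) \cong G(I')$ exactly when `clique_sizes` returns the same list for both, which is the equivalence class over which the abstract measures worst-case running time.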

Paper Structure

This paper contains 24 sections, 16 theorems, 38 equations, 1 figure, 1 algorithm.

Key Result

Lemma 1

Block Sorting finds a duplicate after $O((C(L)+D(L))\max\{1, \log D(L)\})$ comparisons.

Figures (1)

  • Figure 1: Our algorithms Block Sorting and Median Recursion for element distinctness.

Theorems & Definitions (40)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 1
  • Theorem 2
  • Claim 1
  • proof
  • Claim 2
  • proof
  • ...and 30 more