Deterministic $k$-Median Clustering in Near-Optimal Time

Martín Costa, Ermiya Farokhnejad

TL;DR

This work studies deterministic algorithms for metric $k$-median with near-optimal running time. It combines a hierarchical partitioning framework with a restricted $k$-median approach and a restricted reverse greedy algorithm to achieve an $O(\log(n/k))$-approximation in $\tilde{O}(nk)$ time. This matches the running time of the best randomized algorithm up to polylogarithmic factors, although that algorithm achieves an $O(1)$-approximation. The paper also proves that any deterministic algorithm with this running time must have an approximation ratio of $\Omega(\log n/(\log k + \log \log n))$. A key contribution is the Restricted-$k$-median routine and its efficient deterministic implementation, which enables a bottom-up assembly of a global solution from local, restricted subproblems. The results establish a separation between the deterministic and randomized settings for $k$-median and extend to the $k$-means case.

Abstract

The metric $k$-median problem is a textbook clustering problem. As input, we are given a metric space $V$ of size $n$ and an integer $k$, and our task is to find a subset $S \subseteq V$ of at most $k$ "centers" that minimizes the total distance from each point in $V$ to its nearest center in $S$. Mettu and Plaxton [UAI'02] gave a randomized algorithm for $k$-median that computes an $O(1)$-approximation in $\tilde{O}(nk)$ time. They also showed that any algorithm for this problem with a bounded approximation ratio must have a running time of $\Omega(nk)$. Thus, the running time of their algorithm is optimal up to polylogarithmic factors. For deterministic $k$-median, Guha et al. [FOCS'00] gave an algorithm that computes a $\text{poly}(\log (n/k))$-approximation in $\tilde{O}(nk)$ time, where the degree of the polynomial in the approximation is unspecified. To the best of our knowledge, this remains the state-of-the-art approximation of any deterministic $k$-median algorithm with this running time. This leads us to the following natural question: What is the best approximation of a deterministic $k$-median algorithm with near-optimal running time? We make progress in answering this question by giving a deterministic algorithm that computes an $O(\log(n/k))$-approximation in $\tilde{O}(nk)$ time. We also provide a lower bound showing that any deterministic algorithm with this running time must have an approximation ratio of $\Omega(\log n/(\log k + \log \log n))$, establishing a gap between the randomized and deterministic settings for $k$-median.
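
The objective described in the abstract can be made concrete with a small sketch. The code below is not the paper's algorithm: it only evaluates the $k$-median cost of a candidate center set and finds an optimal set by exhaustive search on a toy instance. The function names `kmedian_cost` and `brute_force_kmedian` and the five-point line metric are assumptions introduced here for illustration.

```python
from itertools import combinations


def kmedian_cost(dist, centers):
    """Total distance from every point to its nearest chosen center."""
    return sum(min(dist[p][c] for c in centers) for p in range(len(dist)))


def brute_force_kmedian(dist, k):
    """Try every set of exactly k centers; exponential, for tiny inputs only.

    Searching sets of size exactly k suffices when k <= n, because adding
    a center can never increase the cost.
    """
    n = len(dist)
    return min(combinations(range(n), k), key=lambda s: kmedian_cost(dist, s))


# Hypothetical toy metric: five points on a line.
positions = [0, 1, 2, 10, 11]
dist = [[abs(a - b) for b in positions] for a in positions]

best = brute_force_kmedian(dist, 2)
best_cost = kmedian_cost(dist, best)
print(best, best_cost)
```

An $\alpha$-approximation algorithm would be required to return a center set whose `kmedian_cost` is at most $\alpha$ times `best_cost`. Note that merely reading the full distance matrix already takes $\Theta(n^2)$ time, so an algorithm running in $\tilde{O}(nk)$ time cannot inspect all distances when $k \ll n$.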

Paper Structure

This paper contains 34 sections, 29 theorems, 60 equations.

Key Result

Theorem 1.1

There is a deterministic algorithm for $k$-median that, given a metric space of size $n$, computes an $O(\log(n/k))$-approximate solution in $\tilde{O}(nk)$ time.
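
To see how this upper bound compares with the lower bound stated in the abstract, consider the regime $k = \log^c n$ for a constant $c \ge 1$. This regime is an assumption chosen here for illustration and is not singled out in the source.

```latex
\begin{align*}
\text{upper bound:}\quad & O(\log(n/k)) \subseteq O(\log n), \\
\text{lower bound:}\quad & \Omega\!\left(\frac{\log n}{\log k + \log\log n}\right)
  = \Omega\!\left(\frac{\log n}{(c+1)\log\log n}\right)
  = \Omega\!\left(\frac{\log n}{\log\log n}\right).
\end{align*}
```

Under this assumption the two bounds differ by a factor of $O(\log\log n)$.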

Theorems & Definitions (54)

  • Theorem 1.1
  • Theorem 1.2
  • Lemma 2.1
  • proof
  • Corollary 2.2
  • proof
  • Lemma 3.1 (Guha et al. [FOCS'00])
  • Theorem 3.2
  • Corollary 3.3
  • Theorem 3.4
  • ...and 44 more