Deterministic $k$-Median Clustering in Near-Optimal Time

Martín Costa; Ermiya Farokhnejad

TL;DR

This work studies deterministic algorithms for metric $k$-median that run in near-optimal time. It combines a hierarchical partitioning framework with a restricted $k$-median subproblem and a restricted reverse greedy algorithm to obtain an $O(\log(n/k))$-approximation in $\tilde{O}(nk)$ time. This matches the running time of the best randomized algorithm up to polylogarithmic factors, although the randomized algorithm achieves an $O(1)$-approximation. The work also proves that any deterministic algorithm with this running time must have an approximation ratio of $\Omega(\log n/(\log k + \log \log n))$. A key contribution is the Restricted-$k$-median routine and its efficient deterministic implementation, which allows a global solution to be assembled bottom-up from local, restricted subproblems. The results establish a separation between the deterministic and randomized settings for $k$-median and extend to the $k$-means case.

Abstract

The metric $k$-median problem is a textbook clustering problem. As input, we are given a metric space $V$ of size $n$ and an integer $k$, and our task is to find a subset $S \subseteq V$ of at most $k$ "centers" that minimizes the total distance from each point in $V$ to its nearest center in $S$. Mettu and Plaxton [UAI'02] gave a randomized algorithm for $k$-median that computes an $O(1)$-approximation in $\tilde O(nk)$ time. They also showed that any algorithm for this problem with a bounded approximation ratio must have a running time of $\Omega(nk)$. Thus, the running time of their algorithm is optimal up to polylogarithmic factors. For deterministic $k$-median, Guha et al. [FOCS'00] gave an algorithm that computes a $\text{poly}(\log (n/k))$-approximation in $\tilde O(nk)$ time, where the degree of the polynomial in the approximation is unspecified. To the best of our knowledge, this remains the state-of-the-art approximation of any deterministic $k$-median algorithm with this running time. This leads us to the following natural question: What is the best approximation of a deterministic $k$-median algorithm with near-optimal running time? We make progress in answering this question by giving a deterministic algorithm that computes an $O(\log(n/k))$-approximation in $\tilde O(nk)$ time. We also provide a lower bound showing that any deterministic algorithm with this running time must have an approximation ratio of $\Omega(\log n/(\log k + \log \log n))$, establishing a gap between the randomized and deterministic settings for $k$-median.
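To make the objective in the abstract concrete, the following minimal sketch evaluates the $k$-median cost of a candidate center set and finds an optimum by exhaustive search on a toy instance. It is not the paper's algorithm: the function names `kmedian_cost` and `brute_force_kmedian`, the toy point set, and the choice of the real line as the metric are all assumptions made for illustration, and the exhaustive search is exponential in $k$.

```python
from itertools import combinations

def kmedian_cost(points, centers, dist):
    """Total distance from each point to its nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)

def brute_force_kmedian(points, k, dist):
    """Try every set of exactly k centers (tiny inputs only).

    Searching sets of size exactly k suffices when k <= len(points),
    because adding a center never increases the cost.
    """
    best = min(combinations(points, k),
               key=lambda S: kmedian_cost(points, S, dist))
    return list(best), kmedian_cost(points, best, dist)

# Hypothetical toy metric: points on the real line, absolute difference.
points = [0, 1, 2, 10, 11, 12]
line_dist = lambda a, b: abs(a - b)

centers, cost = brute_force_kmedian(points, 2, line_dist)
print(centers, cost)  # [1, 11] 4
```

On this instance the two natural clusters each pick their middle point, giving a total cost of 4, whereas a single center at 1 would cost 32. An $\alpha$-approximation algorithm is one that always returns a center set whose cost is at most $\alpha$ times this optimum.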
