An improved approximation algorithm for k-Median
Neal E. Young
TL;DR
The paper studies the (not necessarily metric) k-Median problem in a bicriteria setting: the goal is to output a center set of size at most αk whose cost is no larger than that of the best size-k solution. It introduces a capped-cost function and a two-phase approach that first reduces the capped cost to below 1 and then adds a small number of extra centers to meet the cost bound, all without solving the standard LP. The main theoretical contribution is a polynomial-time α_{kn}-size-approximation algorithm with α_{kn} ≤ 1 + 2 ln(n/k) (and ≤ 2H_Δ), which matches the Set Cover bounds within a factor of two, together with a fast LP-free variant that runs in O(k m log(n/k) log m) time. The paper also gives a dual interpretation via implicit LP dual solutions, compares the result to prior bicriteria methods, and lists open problems on tightening the bounds and extending them to related problems. Overall, the results give efficient bicriteria guarantees for non-metric k-Median that are within a factor of two of the Set Cover bounds, with implications for large-scale clustering and problems similar to facility location.
Abstract
We give a polynomial-time approximation algorithm for the (not necessarily metric) $k$-Median problem. The algorithm is an $α$-size-approximation algorithm for $α< 1 + 2 \ln(n/k)$. That is, it guarantees a solution having size at most $α\times k$, and cost at most the cost of any size-$k$ solution. This is the first polynomial-time approximation algorithm to match the well-known bounds of $H_Δ$ and $1 + \ln(n/k)$ for unweighted Set Cover (a special case) within a constant factor. It matches these bounds within a factor of 2. The algorithm runs in time $O(k m \log(n/k) \log m)$, where $n$ is the number of customers and $m$ is the instance size.
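To make the "within a factor of 2" claim concrete, the following minimal sketch evaluates the two bounds from the abstract, $1 + 2\ln(n/k)$ for this algorithm and $1 + \ln(n/k)$ for unweighted Set Cover, and takes their ratio. The function names and the sample values of $n$ and $k$ are illustrative assumptions, not part of the paper; the code only does the arithmetic and does not implement the algorithm itself.

```python
import math


def k_median_size_bound(n: int, k: int) -> float:
    """Bound 1 + 2 ln(n/k) on the size factor alpha, as stated in the abstract."""
    if not 0 < k <= n:
        raise ValueError("expected 0 < k <= n")
    return 1.0 + 2.0 * math.log(n / k)


def set_cover_bound(n: int, k: int) -> float:
    """Classical bound 1 + ln(n/k) for unweighted Set Cover, as cited in the abstract."""
    if not 0 < k <= n:
        raise ValueError("expected 0 < k <= n")
    return 1.0 + math.log(n / k)


def bound_ratio(n: int, k: int) -> float:
    """Ratio of the two bounds; it is below 2 because (1 + 2L) / (1 + L) < 2 for L >= 0."""
    return k_median_size_bound(n, k) / set_cover_bound(n, k)


if __name__ == "__main__":
    # Hypothetical instance sizes chosen only for illustration.
    for n, k in [(100, 10), (1000, 10), (10**6, 100)]:
        alpha_bound = k_median_size_bound(n, k)
        print(
            f"n={n}, k={k}: alpha < {alpha_bound:.3f}, "
            f"so fewer than {alpha_bound * k:.1f} centers; "
            f"ratio to Set Cover bound = {bound_ratio(n, k):.3f}"
        )
```

The ratio approaches 2 as $n/k$ grows and equals 1 when $n = k$, which is consistent with the abstract's statement that the bounds match within a factor of 2.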
