Parameterized Approximation for Robust Clustering in Discrete Geometric Spaces
Fateme Abbasi, Sandip Banerjee, Jarosław Byrka, Parinya Chalermsook, Ameet Gadekar, Kamyar Khodamoradi, Dániel Marx, Roohani Sharma, Joachim Spoerhase
TL;DR
This work advances parameterized approximation for Robust $(k,z)$-Clustering in geometric spaces in three ways. (i) It surpasses the general-metric barrier with a $3^z(1-\eta_0)$-factor FPT approximation algorithm in discrete high-dimensional Euclidean spaces. (ii) It establishes W[1]-hardness of $(\sqrt{3/2}-o(1))$-approximation for discrete $k$-Center in dimension $\Theta(\log n)$. (iii) It delivers an EPAS for metrics of sub-logarithmic doubling dimension via coresets and ball-decomposition techniques. The core technical contributions include a strengthened assignment/projection lemma leveraging mid-point closure, a coreset framework for doubling metrics, and a hardness reduction from Multi-Colored Independent Set. Together, the results map the FPT-approximation landscape for Robust $(k,z)$-Clustering in geometric spaces. The general-metric barrier can be beaten in high-dimensional Euclidean spaces. Approximation below $\sqrt{3/2}$ remains hard there for FPT algorithms, already for $k$-Center. In low doubling dimension, $(1+\varepsilon)$-approximation is achievable. These settings are relevant to robust and fair clustering of high-dimensional data. The key quantitative bounds are the factor $3^z(1-\eta_0)$ in FPT time, the coreset size $(\tfrac{2^z}{\varepsilon})^{O(d)} kz \log n$, and the EPAS running time $((\tfrac{2^z}{\varepsilon})^d k \log k)^{O(k)}$ for metrics of doubling dimension $d$.
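As a quick numeric illustration of the factor $3^z(1-\eta_0)$, the following uses only the bound $\eta_0 > 0.0006$ stated in the abstract. The choice of $z=1$ and $z=2$, corresponding to the $k$-Median and $k$-Means objectives, is made here purely for illustration.

```latex
\[
3^z(1-\eta_0) \;<\; 3^z(1-0.0006) \;=\; 0.9994\cdot 3^z,
\qquad\text{e.g.}\quad
z=1:\; 0.9994\cdot 3 = 2.9982,
\qquad
z=2:\; 0.9994\cdot 9 = 8.9946.
\]
```

For every $z$, the guarantee therefore sits strictly below $3^z$. The abstract states that the $(3^z+\varepsilon)$ factor is tight under Gap-ETH for general discrete metrics, so this gap is exactly what the geometric setting buys.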
Abstract
We consider the well-studied Robust $(k, z)$-Clustering problem, which generalizes the classic $k$-Median, $k$-Means, and $k$-Center problems. Given a constant $z\ge 1$, the input to Robust $(k, z)$-Clustering is a set $P$ of $n$ weighted points in a metric space $(M,\delta)$ and a positive integer $k$. Further, each point belongs to one or more of $m$ groups $S_1,S_2,\ldots,S_m$. The goal is to find a set $X$ of $k$ centers minimizing $\max_{i \in [m]} \sum_{p \in S_i} w(p)\, \delta(p,X)^z$, where $\delta(p,X)=\min_{x\in X}\delta(p,x)$. This problem arises in robust optimization [Anthony, Goyal, Gupta, Nagarajan, Math. Oper. Res. 2010] and in algorithmic fairness. For polynomial-time algorithms, an approximation factor of $O(\log m/\log\log m)$ is known [Makarychev, Vakilian, COLT 2021], which is tight under a plausible complexity assumption even for line metrics. For FPT algorithms, there is a $(3^z+\varepsilon)$-approximation algorithm, which is tight under Gap-ETH [Goyal, Jaiswal, Inf. Proc. Letters, 2023]. Motivated by these tight lower bounds for general discrete metrics, we focus on \emph{geometric} spaces such as the (discrete) high-dimensional Euclidean setting and metrics of low doubling dimension, which play an important role in data analysis applications. First, for a universal constant $\eta_0 >0.0006$, we devise a $3^z(1-\eta_0)$-factor FPT approximation algorithm for discrete high-dimensional Euclidean spaces, thereby bypassing the lower bound for general metrics. We complement this result by showing that even the special case of $k$-Center in dimension $\Theta(\log n)$ is $(\sqrt{3/2}- o(1))$-hard to approximate for FPT algorithms. Finally, we complete the FPT approximation landscape by designing an FPT $(1+\varepsilon)$-approximation scheme (EPAS) for metrics of sub-logarithmic doubling dimension.
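To make the objective concrete, here is a minimal brute-force sketch in Python. The function names `dist_to_set`, `robust_cost`, and `brute_force` are hypothetical. Candidate centers are assumed to be the input points themselves, a discrete setting chosen only for illustration. The exhaustive search over all $k$-subsets is exponential in $k$, so this is not the FPT algorithm described above. The usage lines show how the special cases arise:

- One group containing all points gives the $k$-Median objective ($z=1$) or the $k$-Means objective ($z=2$).
- Singleton groups $\{p\}$ with unit weights and $z=1$ give $k$-Center, since the maximum then ranges over individual point distances.

```python
import math
from itertools import combinations

# Hypothetical helper names; points are tuples so they can key the weight dict.


def dist_to_set(p, X, delta):
    """delta(p, X): distance from point p to its nearest center in X."""
    return min(delta(p, x) for x in X)


def robust_cost(X, groups, weight, z, delta=math.dist):
    """Robust (k, z) objective: the worst group's weighted sum of delta(p, X)^z."""
    return max(
        sum(weight[p] * dist_to_set(p, X, delta) ** z for p in S)
        for S in groups
    )


def brute_force(candidates, k, groups, weight, z, delta=math.dist):
    """Try every k-subset of candidate centers; exponential in k, illustration only."""
    best = min(
        combinations(candidates, k),
        key=lambda X: robust_cost(X, groups, weight, z, delta),
    )
    return best, robust_cost(best, groups, weight, z, delta)


# Four points on a line (Euclidean distance via math.dist), unit weights.
P = [(0.0,), (1.0,), (10.0,), (11.0,)]
w = {p: 1.0 for p in P}

# One group with all points, z = 1: the k-Median objective.
print(brute_force(P, 2, [P], w, z=1))
# Singleton groups, z = 1: the maximum is over single points, i.e. k-Center.
print(brute_force(P, 2, [[p] for p in P], w, z=1))
# Two groups, one center: only the worse group's cost counts.
print(brute_force(P, 1, [P[:2], P[2:]], w, z=1))
```

Passing a different `delta` function lets the same sketch evaluate the objective in any finite metric, matching the general $(M,\delta)$ setting.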
