Parameterized Approximation for Robust Clustering in Discrete Geometric Spaces

Fateme Abbasi, Sandip Banerjee, Jarosław Byrka, Parinya Chalermsook, Ameet Gadekar, Kamyar Khodamoradi, Dániel Marx, Roohani Sharma, Joachim Spoerhase

TL;DR

This work advances parameterized approximation for Robust $(k,z)$-Clustering in geometric spaces by (i) surpassing the general-metric barrier with a $3^z(1-\eta_0)$-factor FPT approximation algorithm in discrete high-dimensional Euclidean spaces, (ii) establishing a W[1]-hardness result for discrete $k$-Center in dimension $\Theta(\log n)$, and (iii) delivering an EPAS for metrics of sub-logarithmic doubling dimension via coresets and ball-decomposition techniques. The core technical contributions are a strengthened assignment/projection lemma that leverages mid-point closure, a coreset framework for doubling metrics, and a hardness construction from Multi-Colored Independent Set. Together, the results map the FPT-approximation landscape for Robust $(k,z)$-Clustering in geometric spaces, giving both improved algorithms and matching hardness boundaries, with relevance to robust and fair clustering of high-dimensional data. Key quantities include the approximation factor $3^z(1-\eta_0)$ achieved in FPT time, the coreset size $(\tfrac{2^z}{\varepsilon})^{O(d)} kz \log n$, and the EPAS runtime $((\tfrac{2^z}{\varepsilon})^d k \log k)^{O(k)}$ in doubling metrics.

Abstract

We consider the well-studied Robust $(k, z)$-Clustering problem, which generalizes the classic $k$-Median, $k$-Means, and $k$-Center problems. Given a constant $z\ge 1$, the input to Robust $(k, z)$-Clustering is a set $P$ of $n$ weighted points in a metric space $(M,\delta)$ and a positive integer $k$. Further, each point belongs to one (or more) of $m$ different groups $S_1,S_2,\ldots,S_m$. Our goal is to find a set $X$ of $k$ centers such that $\max_{i \in [m]} \sum_{p \in S_i} w(p) \delta(p,X)^z$ is minimized. This problem arises in the domains of robust optimization [Anthony, Goyal, Gupta, Nagarajan, Math. Oper. Res. 2010] and in algorithmic fairness. For polynomial-time computation, an approximation factor of $O(\log m/\log\log m)$ is known [Makarychev, Vakilian, COLT 2021], which is tight under a plausible complexity assumption even in line metrics. For FPT time, there is a $(3^z+\varepsilon)$-approximation algorithm, which is tight under GAP-ETH [Goyal, Jaiswal, Inf. Proc. Letters, 2023]. Motivated by the tight lower bounds for general discrete metrics, we focus on geometric spaces such as the (discrete) high-dimensional Euclidean setting and metrics of low doubling dimension, which play an important role in data analysis applications. First, for a universal constant $\eta_0 >0.0006$, we devise a $3^z(1-\eta_{0})$-factor FPT approximation algorithm for discrete high-dimensional Euclidean spaces, thereby bypassing the lower bound for general metrics. We complement this result by showing that even the special case of $k$-Center in dimension $\Theta(\log n)$ is $(\sqrt{3/2}- o(1))$-hard to approximate for FPT algorithms. Finally, we complete the FPT approximation landscape by designing an FPT $(1+\varepsilon)$-approximation scheme (EPAS) for metrics of sub-logarithmic doubling dimension.
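To make the objective concrete, the following minimal sketch evaluates $\max_{i \in [m]} \sum_{p \in S_i} w(p)\,\delta(p,X)^z$ for a fixed set of centers in Euclidean space. It only computes the cost of a given solution and is not the paper's algorithm; the function name `robust_kz_cost`, the index-based group encoding, and the toy data are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def robust_kz_cost(points, weights, groups, centers, z):
    """Return max over groups S_i of sum_{p in S_i} w(p) * delta(p, X)^z.

    points  : list of coordinate tuples
    weights : list of non-negative weights, one per point
    groups  : list of groups, each a list of point indices (groups may overlap)
    centers : list of coordinate tuples (the candidate solution X)
    z       : exponent (z=1 resembles k-Median, z=2 resembles k-Means)
    """
    def dist_to_centers(p):
        # delta(p, X): distance from p to its nearest center
        return min(dist(p, c) for c in centers)

    return max(
        sum(weights[i] * dist_to_centers(points[i]) ** z for i in group)
        for group in groups
    )


# Hypothetical toy instance on a line embedded in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (6.0, 0.0)]
weights = [1.0, 1.0, 1.0, 2.0]
groups = [[0, 1], [2, 3], [1, 2]]      # point 1 and point 2 belong to two groups
centers = [(0.0, 0.0), (4.0, 0.0)]     # k = 2, chosen from the input points

cost_z1 = robust_kz_cost(points, weights, groups, centers, z=1)
cost_z2 = robust_kz_cost(points, weights, groups, centers, z=2)
print(cost_z1, cost_z2)
```

In this toy instance the second group dominates for both exponents, because its weight-2 point lies at distance 2 from the nearest center.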

Paper Structure

This paper contains 10 sections, 16 theorems, 26 equations, 4 figures, and 2 algorithms.

Key Result

Theorem 1.1

There exists a universal constant $\eta_0 >0.0006$ such that for any constant positive integer $z$, there is a factor $3^z(1-\eta_0)$ FPT approximation algorithm for Robust $(k,z)$-Clustering in discrete Euclidean space ${\mathbb R}^d$ that runs in time $2^{\mathcal{O}\left(k \log k\right)} \textsf{p

Figures (4)

  • Figure 1: This example shows that the projection lemma is tight even for the $1$-dimensional Euclidean space. Let $o=0$ be the optimum facility, located at the origin and serving client $p=1/2$. Let $b'=1$ be the facility in $B$ that serves $p$, and let $b=\sigma(o)=-1$ be the facility in $B$ nearest to $o$. We have $\textsf{OPT}=1/2$, which also equals the cost of $B$. However, $\delta(p,\sigma(o))=3/2=2\times \delta(p,o) + 1\times \delta(p,b')$. Combining multiple such examples in orthogonal directions, all sharing facility $b$, shows that the approximation ratio of the algorithm of Goyal and Jaiswal approaches $3$ in the discrete Euclidean space.
  • Figure 2: The midpoint of $b$ and $b''$ is shown as a red dot; $\|(b+b'')/2 - o\| \leq \frac{\alpha}{2}$, and thus $\|\sigma(o)-o\|\leq\alpha$.
  • Figure 3: The dashed black circle depicts ${\sf ball}(o,1)$, while the dashed gray circles represent ${\sf ball}(o, 1-\omega)$ and ${\sf ball}(o,1+\omega)$. Regions $R_1$, $R_2$, $R_3$, and $R_4$ are outlined with green, yellow, purple, and blue borders respectively.
  • Figure 5: An illustration of $D$ and the projection points of $p$, $q$ and $b'$.
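
The tight instance described in the Figure 1 caption can be checked numerically. The sketch below plugs in the caption's values ($o=0$, $p=1/2$, $b'=1$, $b=\sigma(o)=-1$) and confirms that the bound $\delta(p,\sigma(o)) \le 2\,\delta(p,o) + \delta(p,b')$ holds with equality. The variable names are illustrative, and the tie between $b$ and $b'$ as nearest facility to $o$ is resolved towards $b=-1$, as the caption does.

```python
# Values taken from the Figure 1 caption (1-dimensional Euclidean space).
o = 0.0          # optimum facility
p = 0.5          # client served by o
b_prime = 1.0    # facility in B that serves p
sigma_o = -1.0   # facility in B nearest to o (tie with b_prime, broken towards -1)
B = [b_prime, sigma_o]


def delta(x, y):
    """Distance on the real line."""
    return abs(x - y)


# sigma_o is indeed a nearest facility of B to o (both are at distance 1).
nearest_to_o = min(delta(o, b) for b in B)

opt_cost = delta(p, o)                               # cost of p in the optimum
cost_in_B = min(delta(p, b) for b in B)              # cost of p when served by B
lhs = delta(p, sigma_o)                              # reassigning p to sigma(o)
rhs = 2 * delta(p, o) + 1 * delta(p, b_prime)        # projection-lemma bound

print(opt_cost, cost_in_B, lhs, rhs)
```

Since both the optimum and $B$ pay $1/2$ for $p$ while the reassigned cost is $3/2$, this single client already exhibits the factor $3$ that the caption attributes to the combined construction.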

Theorems & Definitions (30)

  • Theorem 1.1: High-Dimensional Euclidean Space
  • Theorem 1.2: Hardness in Discrete Euclidean Space
  • Theorem 1.3: EPAS for Doubling Metric of Sub-Logarithmic Dimension
  • Lemma 2.1: Projection Lemma
  • Theorem 3.1: High-Dimensional Euclidean Space
  • Lemma 3.1: Projection Lemma
  • proof
  • Lemma 3.1: Assignment Lemma
  • Definition 3.1: Displacement Ratio
  • Claim 3.1
  • ...and 20 more