Capacitated Fair-Range Clustering: Hardness and Approximation Algorithms

Ameet Gadekar, Suhas Thejaswi

TL;DR

This work analyzes capacitated fair-range clustering, where the center set must satisfy both capacity and per-group fair-range constraints while minimizing $k$-median or $k$-means cost. It establishes strong inapproximability, proving that even when fair-range feasibility is easy to obtain, no non-trivial polynomial-time approximation exists unless $\mathsf{P}=\mathsf{NP}$, with stronger Gap-ETH-based bounds ruling out $n^{o(k)}$-time algorithms; these results hold even on tree metrics and for logarithmic numbers of groups. In the practically relevant regime of a constant number of groups, the paper provides two algorithmic tracks: (i) polynomial-time $O(\log k)$ (for CFR$k$Med) and $O(\log^2 k)$ (for CFR$k$Means) approximations achieved by tree-embedding and exact tree DP, and (ii) $\textsf{FPT}$-time approximations with factors $3+\varepsilon$ and $9+\varepsilon$ for CFR$k$Med and CFR$k$Means, respectively, using a leader-guessing framework, coresets, and reductions to disjoint-group subproblems. These results match the best-known guarantees for capacitated clustering without fair-range constraints and settle prior open questions in the area. Together, they delineate a sharp boundary between intractable general instances and tractable, practically relevant cases, guiding future fair clustering research and applications.

Abstract

Capacitated fair-range $k$-clustering generalizes classical $k$-clustering by incorporating both capacity constraints and demographic fairness. In this setting, each facility has a capacity limit and may belong to one or more demographic groups. The task is to select $k$ facilities as centers and assign each client to a center such that: ($a$) no center exceeds its capacity, ($b$) the number of centers selected from each group lies within specified lower and upper bounds (fair-range constraints), and ($c$) the clustering cost (e.g., $k$-median or $k$-means) is minimized. Prior work by Thejaswi et al. (KDD 2022) showed that satisfying fair-range constraints is NP-hard, making the problem inapproximable to any polynomial factor. We strengthen this result by showing that inapproximability persists even when the fair-range constraints are trivially satisfiable, highlighting the intrinsic computational complexity of the clustering task itself. Assuming standard complexity conjectures, we show that no non-trivial approximation is possible without exhaustively enumerating all $k$-subsets of the facility set. Notably, our inapproximability results hold even on tree metrics and when the number of groups is logarithmic in the size of the facility set. In light of these strong inapproximability results, we focus on a more practical setting where the number of groups is constant. In this regime, we design two approximation algorithms: ($i$) a polynomial-time algorithm achieving $O(\log k)$- and $O(\log^2 k)$-approximation for the $k$-median and $k$-means objectives, respectively, and ($ii$) a fixed-parameter tractable algorithm parameterized by $k$, achieving $(3+\varepsilon)$- and $(9+\varepsilon)$-approximation, respectively. These results match the best-known approximation guarantees for capacitated clustering without fair-range constraints and resolve an open question posed by Zang et al. (NeurIPS 2024).
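To make constraints ($a$) to ($c$) concrete, the following is a minimal sketch of the exhaustive baseline the abstract alludes to: enumerate every $k$-subset of facilities, discard those violating the fair-range bounds, and find the cheapest capacity-respecting assignment by brute force. It is not the paper's algorithm. The function names, the line-metric toy instance, and the group labels `red` and `blue` are assumptions made for illustration, and the search is only practical for very small inputs.

```python
from itertools import combinations, product

def is_fair(centers, groups, lower, upper):
    """Fair-range check: each group g contributes between lower[g] and upper[g] centers."""
    chosen = set(centers)
    return all(lower[g] <= len(chosen & members) <= upper[g]
               for g, members in groups.items())

def best_assignment_cost(clients, centers, dist, capacity, power=1):
    """Cheapest assignment respecting capacities, by exhaustive search.

    power=1 gives the k-median cost, power=2 the k-means cost.
    Returns None when no capacity-respecting assignment exists.
    """
    best = None
    for choice in product(centers, repeat=len(clients)):
        if any(choice.count(f) > capacity[f] for f in centers):
            continue  # some center is over capacity
        cost = sum(dist(c, f) ** power for c, f in zip(clients, choice))
        if best is None or cost < best:
            best = cost
    return best

def brute_force_cfr(clients, facilities, dist, capacity,
                    groups, lower, upper, k, power=1):
    """Try all k-subsets of facilities; keep the cheapest feasible one."""
    best_cost, best_centers = None, None
    for centers in combinations(facilities, k):
        if not is_fair(centers, groups, lower, upper):
            continue
        cost = best_assignment_cost(clients, centers, dist, capacity, power)
        if cost is not None and (best_cost is None or cost < best_cost):
            best_cost, best_centers = cost, centers
    return best_cost, best_centers

# Hypothetical toy instance on a line: two clients near 0, two near 10.
pos = {"c0": 0, "c1": 1, "c2": 10, "c3": 11, "a": 0, "b": 1, "c": 10, "d": 11}
dist = lambda u, v: abs(pos[u] - pos[v])
clients = ["c0", "c1", "c2", "c3"]
facilities = ["a", "b", "c", "d"]
capacity = {f: 2 for f in facilities}
groups = {"red": {"a", "b"}, "blue": {"c", "d"}}

# Exactly one center per group.
balanced = brute_force_cfr(clients, facilities, dist, capacity, groups,
                           {"red": 1, "blue": 1}, {"red": 1, "blue": 1}, k=2)
# Both centers forced into the red group.
red_only = brute_force_cfr(clients, facilities, dist, capacity, groups,
                           {"red": 2, "blue": 0}, {"red": 2, "blue": 0}, k=2)
print(balanced, red_only)
```

On this toy instance the balanced bounds yield a $k$-median cost of 2, while forcing both centers into the red group raises the cost to 20, which illustrates how the fair-range bounds alone can change the optimal center set.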

Paper Structure

This paper contains 25 sections, 30 theorems, 20 equations, 5 figures.

Key Result

Theorem 4.1

There is no polynomial-time algorithm that can approximate $\textsc{FR$k$Med}^{\mathcal{O}}$ (or $\textsc{FR$k$Means}^{\mathcal{O}}$) to any polynomial factor, unless $\mathsf{P} = \mathsf{NP}$. The hardness holds even on tree metrics.

Figures (5)

  • Figure 1: Overview of our algorithm for Theorem \ref{thm:polyapx}: squares represent facilities, (gray) circles represent clients, and (black) hexagons are dummy nodes introduced in the tree embedding. Colors (red, blue, and green) indicate facility groups. Panel (a) shows the original instance $\mathcal{I}$ of $\textsc{CFR$k$Med}$ in a general metric space $d$. Panel (b) depicts the transformed instance $\mathcal{I}'$ in the $k$-clique-star metric $d'$ obtained from Lemma \ref{lemma:cliquestaremb} using an $\eta$-approximate solution $S$ that treats $\mathcal{I}$ as a vanilla $k$-median instance; $S$ is highlighted with a shaded area. Panel (c) illustrates the instance $\mathcal{I}''$ in the tree metric $d''$ obtained from Lemma \ref{lemma:treemetric}, with the tree embedding of $S$ again highlighted with a shaded area.
  • Figure 2: An illustration of the clique-star metric $d'$ for a client $c$. Facilities are represented as squares and clients as circles. The shaded area highlights the cluster centers $S$ selected by the $\eta$-approximation algorithm for vanilla $k$-median. In this example, client $c$ is assigned to facility $s_c \in S$, while in the optimal solution it is served by facility $o_c$, which is connected to center $s_{o_c} \in S$ in $d'$. Our goal (Claim \ref{cl:optind1}) is to bound the rerouting cost $d'(c, o_c)$ in terms of distances in the original metric space $d$.
  • Figure 3: An illustration of the dynamic programming algorithm. Clients are represented as circles, facilities as squares, and internal (dummy) nodes as hexagons. The color of each facility (red, blue, green) indicates its group membership. In the dynamic programming step to compute $T(e, \vec{\kappa}, b)$, we find the minimum cost over all decompositions of the subtree solutions connected via edges $e^\ell$ and $e^r$, such that $\vec{\kappa}^\ell + \vec{\kappa}^r = \vec{\kappa}$ for $\vec{\kappa}^\ell, \vec{\kappa}^r \in [k]^t$ and $b^\ell + b^r = b$. There is an additional cost for re-routing $|b|$ clients, which is $d''(e) \cdot |b|$, where $d''(e)$ is the length of edge $e$ in the tree metric $d''$.
  • Figure 4: An illustration for bounding the approximation factor of the $\textsf{FPT}$ algorithm for $\textsc{OPG-WC$k$Med}^\emptyset$. Here, $\ell_i^*$ is the leader of cluster $i \in [k]$, $f_i$ is the facility closest to $\ell_i^*$ within the guessed radius, and $f_i^*$ is the facility that (partially) serves the client $c$ in the optimal solution $S^*=\{f_i^*\}_{i \in [k]}$. If $\mu(c, f_i^*)$ denotes the fraction of $c$ served by $f_i^*$, we aim to bound the term $\mu(c, f_i^*) \cdot d(c, f_i)$ in terms of $\mu(c, f_i^*) \cdot d(c, f_i^*)$. The former corresponds to the cost incurred in our approximate solution, while the latter corresponds to the cost incurred in the optimal solution if $c$ were to be (partially) served by $f_i^*$.
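
The combination step described in the caption of Figure 3 can be written as a single recurrence. This is a reconstruction from the caption alone, so it should be read as a sketch: the base cases at the leaves, the handling of a facility or client located at the node below $e$, and the sign convention for $b$ (the net number of clients routed across edge $e$) are not stated in the caption and are left unspecified here.

$$
T(e, \vec{\kappa}, b) \;=\; d''(e) \cdot |b| \;+\; \min_{\substack{\vec{\kappa}^\ell + \vec{\kappa}^r = \vec{\kappa} \\ b^\ell + b^r = b}} \Big( T(e^\ell, \vec{\kappa}^\ell, b^\ell) + T(e^r, \vec{\kappa}^r, b^r) \Big)
$$

Here $\vec{\kappa} \in [k]^t$ records how many centers are opened from each of the $t$ groups in the subtree below $e$, so the fair-range bounds can be checked on the vector reached at the root.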

Theorems & Definitions (44)

  • Definition 3.1: The capacitated fair-range $k$-median (and $k$-means) problem
  • Theorem 4.1
  • Theorem 4.2: Informal version of Theorem \ref{thm:hard:np2}
  • Theorem 4.3
  • Remark 4.4
  • Theorem 5.1
  • Lemma 5.1
  • Lemma 5.1
  • Lemma 5.1
  • Theorem 5.2
  • ...and 34 more