Capacitated Fair-Range Clustering: Hardness and Approximation Algorithms
Ameet Gadekar, Suhas Thejaswi
TL;DR
This work analyzes capacitated fair-range clustering, where the center set must satisfy both capacity and per-group fair-range constraints while minimizing $k$-median or $k$-means cost. It establishes strong inapproximability, proving that even when fair-range feasibility is easy to obtain, no non-trivial polynomial-time approximation exists unless $\mathsf{P}=\mathsf{NP}$, with stronger Gap-ETH-based bounds ruling out $n^{o(k)}$-time algorithms; these results hold even on tree metrics and for logarithmic numbers of groups. In the practically relevant regime of a constant number of groups, the paper provides two algorithmic tracks: (i) polynomial-time $O(\log k)$ (for CFR$k$Med) and $O(\log^2 k)$ (for CFR$k$Means) approximations achieved by tree-embedding and exact tree DP, and (ii) $\textsf{FPT}$-time approximations with factors $3+\varepsilon$ and $9+\varepsilon$ for CFR$k$Med and CFR$k$Means, respectively, using a leader-guessing framework, coresets, and reductions to disjoint-group subproblems. These results match the best-known guarantees for capacitated clustering without fair-range constraints and settle prior open questions in the area. Together, they delineate a sharp boundary between intractable general instances and tractable, practically relevant cases, guiding future fair clustering research and applications.
Abstract
Capacitated fair-range $k$-clustering generalizes classical $k$-clustering by incorporating both capacity constraints and demographic fairness. In this setting, each facility has a capacity limit and may belong to one or more demographic groups. The task is to select $k$ facilities as centers and assign each client to a center such that: ($a$) no center exceeds its capacity, ($b$) the number of centers selected from each group lies within specified lower and upper bounds (fair-range constraints), and ($c$) the clustering cost (e.g., $k$-median or $k$-means) is minimized. Prior work by Thejaswi et al. (KDD 2022) showed that satisfying fair-range constraints is NP-hard, making the problem inapproximable to within any polynomial factor. We strengthen this result by showing that inapproximability persists even when the fair-range constraints are trivially satisfiable, highlighting the intrinsic computational complexity of the clustering task itself. Assuming standard complexity conjectures, we show that no non-trivial approximation is possible without exhaustively enumerating all $k$-subsets of the facility set. Notably, our inapproximability results hold even on tree metrics and when the number of groups is logarithmic in the size of the facility set. In light of these strong inapproximability results, we focus on a more practical setting where the number of groups is constant. In this regime, we design two approximation algorithms: ($i$) polynomial-time $O(\log k)$- and $O(\log^2 k)$-approximation algorithms for the $k$-median and $k$-means objectives, respectively, and ($ii$) fixed-parameter tractable algorithms parameterized by $k$, achieving $(3+\varepsilon)$- and $(9+\varepsilon)$-approximations, respectively. These results match the best-known approximation guarantees for capacitated clustering without fair-range constraints and resolve an open question posed by Zang et al. (NeurIPS 2024).
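
To make conditions ($a$)-($c$) concrete, the following is a minimal Python sketch of a checker that validates a candidate solution and returns its cost. It is an illustration only, not one of the paper's algorithms. The function name `evaluate_solution`, its argument layout, the use of integer indices for clients, facilities, and groups, and the reading of "select $k$ facilities" as exactly $k$ centers are all assumptions made for this example.

```python
from collections import Counter


def evaluate_solution(dist, capacity, groups, lower, upper, k,
                      centers, assignment, power=1):
    """Return the clustering cost of a candidate solution, or None if infeasible.

    All names and the data layout are hypothetical, chosen for illustration:
      dist[c][f]    distance from client c to facility f
      capacity[f]   maximum number of clients facility f may serve
      groups[f]     set of group ids that facility f belongs to
      lower[g], upper[g]  fair-range bounds on the number of centers from group g
      centers       set of selected facility indices
      assignment[c] facility index serving client c
      power         1 for the k-median cost, 2 for the k-means cost
    """
    # Assumption: exactly k centers must be selected.
    if len(centers) != k:
        return None

    # (b) Fair-range constraints: count selected centers per group.
    per_group = Counter(g for f in centers for g in groups[f])
    for g in lower:
        if not (lower[g] <= per_group[g] <= upper[g]):
            return None

    # (a) Capacity constraints: clients may only use selected centers,
    # and no center may serve more clients than its capacity.
    load = Counter(assignment)
    for f, served in load.items():
        if f not in centers or served > capacity[f]:
            return None

    # (c) Clustering cost: sum of distances raised to the given power.
    return sum(dist[c][f] ** power for c, f in enumerate(assignment))


# Toy instance: 3 clients, 3 facilities, 2 groups (all values are made up).
dist = [[1, 5, 9], [2, 1, 7], [8, 2, 1]]
capacity = [2, 1, 1]
groups = [{0}, {1}, {0, 1}]
lower = {0: 1, 1: 1}
upper = {0: 2, 1: 1}

median_cost = evaluate_solution(dist, capacity, groups, lower, upper, 2,
                                {0, 1}, [0, 0, 1], power=1)
means_cost = evaluate_solution(dist, capacity, groups, lower, upper, 2,
                               {0, 1}, [0, 0, 1], power=2)
```

The checker only verifies a given solution; finding a good one is the hard part addressed by the hardness and approximation results above. Because a facility may belong to several groups, a single center can count toward more than one group's range, as facility 2 does in the toy instance.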
