A Polynomial-Time Approximation for Pairwise Fair $k$-Median Clustering
Sayan Bandyapadhyay, Eden Chlamtáč, Zachary Friggstad, Mahya Jamshidian, Yury Makarychev, Ali Vakilian
TL;DR
This work tackles Pairwise Fair $k$-Median clustering with $\ell$ groups and a uniform balance parameter $t$: the goal is to minimize the $k$-median cost while strictly satisfying the fairness constraints. The authors present the first polynomial-time $O(k^2 \cdot \ell \cdot t)$-approximation that does not violate the fairness constraints. The algorithm takes centers from a standard $k$-median approximation, solves a linear-programming relaxation to obtain a near-fair assignment, and then applies a careful rounding step followed by a controlled reassignment phase. On the hardness side, they show that for disjoint groups the problem is almost as hard as Soft Uniform Capacitated $k$-Median (so a markedly better approximation would imply a breakthrough for that problem), and that for overlapping groups it is NP-hard to approximate within $n^{1-\varepsilon}$. The reductions are from capacitated problems and hypergraph 2-colorability. Experiments on real datasets show that the cost of enforcing fairness grows with the number of clusters but stays well below the theoretical worst-case bounds, with runtime largely driven by the initial $k$-median step.
Abstract
In this work, we study pairwise fair clustering with $\ell \ge 2$ groups, where for every cluster $C$ and every group $i \in [\ell]$, the number of points in $C$ from group $i$ must be at most $t$ times the number of points in $C$ from any other group $j \in [\ell]$, for a given integer $t$. To the best of our knowledge, when $\ell > 2$, only bi-criteria approximation and exponential-time algorithms follow for this problem from prior work on fair clustering. Focusing on the $\ell > 2$ case, we design the first polynomial-time $O(k^2\cdot \ell \cdot t)$-approximation for this problem with $k$-median cost that does not violate the fairness constraints. We complement our algorithmic result with hardness of approximation results, which show that our problem, even when $\ell=2$, is almost as hard as the popular uniform capacitated $k$-median, for which no polynomial-time algorithm with an approximation factor of $o(\log k)$ is known.
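The fairness constraint defined above can be checked directly for a given clustering. The following is a minimal sketch of such a check, not the paper's algorithm. It assumes disjoint groups, with each point carrying a single integer group label in $0, \dots, \ell-1$; the function names `is_cluster_fair` and `is_clustering_fair` are hypothetical and chosen only for illustration. It relies on the observation that requiring $|C \cap P_i| \le t \cdot |C \cap P_j|$ for every pair $i, j$ is equivalent to requiring that the largest group count in $C$ be at most $t$ times the smallest.

```python
from collections import Counter


def is_cluster_fair(group_labels, num_groups, t):
    """Check the pairwise fairness constraint for one cluster.

    group_labels: group label (0..num_groups-1) of each point in the cluster.
    Assumes disjoint groups, i.e. exactly one label per point.
    """
    counts = Counter(group_labels)
    sizes = [counts.get(g, 0) for g in range(num_groups)]
    # All pairs (i, j) satisfy size_i <= t * size_j
    # exactly when max(sizes) <= t * min(sizes).
    return max(sizes) <= t * min(sizes)


def is_clustering_fair(clusters, num_groups, t):
    """Check the constraint for every cluster in a clustering.

    clusters: list of clusters, each a list of group labels.
    """
    return all(is_cluster_fair(c, num_groups, t) for c in clusters)


if __name__ == "__main__":
    # Two groups, balance parameter t = 2.
    print(is_cluster_fair([0, 0, 1], num_groups=2, t=2))     # counts (2, 1)
    print(is_cluster_fair([0, 0, 0, 1], num_groups=2, t=2))  # counts (3, 1)
    print(is_cluster_fair([0, 0], num_groups=2, t=2))        # counts (2, 0)
```

Under this definition, a non-empty cluster that misses some group entirely is never fair, since the missing group's count of zero bounds every other group's count by zero.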
