A Scalable Algorithm for Individually Fair K-means Clustering
MohammadHossein Bateni, Vincent Cohen-Addad, Alessandro Epasto, Silvio Lattanzi
TL;DR
This work tackles scalable, individually fair clustering under per-point radius constraints by introducing ConstrainedLocalSearch++, a fast local-search algorithm with seeding and anchor-zone mechanisms. It achieves a bicriteria $(O(1), 6)$-approximation, that is, a constant-factor approximation of the $k$-means cost while exceeding each point's radius constraint by a factor of at most 6, and runs in $\tilde{O}(nd + nk^2)$ time. The algorithm is both theoretically grounded and empirically validated, demonstrating substantial speedups and often lower costs than previous methods on large real-world datasets. These results make individually fair clustering practical at scale and open avenues for tighter theoretical bounds and generalizations to broader objective functions.
Abstract
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. Given $n$ points $P$ in a metric space, let $\delta(x)$ for $x \in P$ be the radius of the smallest ball around $x$ containing at least $n / k$ points. A clustering is then called individually fair if, for each $x \in P$, it has a center within distance $\delta(x)$ of $x$. While good approximation algorithms are known for this problem, no efficient practical algorithms with good theoretical guarantees have been presented. We design the first fast local-search algorithm that runs in $\tilde{O}(nk^2)$ time and obtains a bicriteria $(O(1), 6)$ approximation. Then we show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
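
To make the definition of $\delta(x)$ and the fairness condition concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper: it only computes the radii and checks whether a given set of centers is fair. The names `fair_radii`, `is_individually_fair`, and `beta` are hypothetical, the metric is assumed to be Euclidean, and the ball around $x$ is assumed to count $x$ itself and to need $\lceil n/k \rceil$ points. The pairwise-distance matrix takes $O(n^2)$ time and memory, so this is suitable only for small inputs.

```python
import math

import numpy as np


def fair_radii(points: np.ndarray, k: int) -> np.ndarray:
    """Return delta(x) for every row x of `points` (shape n x d).

    Assumption: the ball around x counts x itself, and "at least n/k
    points" is read as ceil(n/k) points.
    """
    n = points.shape[0]
    m = math.ceil(n / k)
    # Brute-force n x n Euclidean distance matrix (O(n^2) memory).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # The m-th smallest entry of each row; the row includes distance 0 to x.
    return np.partition(dists, m - 1, axis=1)[:, m - 1]


def is_individually_fair(
    points: np.ndarray, centers: np.ndarray, k: int, beta: float = 1.0
) -> bool:
    """Check that every point has a center within beta * delta(x).

    beta = 1 is exact individual fairness; beta = 6 corresponds to the
    radius relaxation in a bicriteria (O(1), 6) guarantee.
    """
    radii = fair_radii(points, k)
    diffs = points[:, None, :] - centers[None, :, :]
    nearest = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    # Small tolerance guards against floating-point rounding.
    return bool(np.all(nearest <= beta * radii + 1e-12))


if __name__ == "__main__":
    # Two well-separated groups of three points on a line, k = 2.
    pts = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
    print(fair_radii(pts, k=2))
    print(is_individually_fair(pts, np.array([[1.0], [11.0]]), k=2))
    print(is_individually_fair(pts, np.array([[0.0], [1.0]]), k=2))
```

In this toy instance, placing one center in each group satisfies every radius constraint, whereas placing both centers in the left group leaves the right group unserved even under the relaxed factor `beta=6`.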
