Relax and Merge: A Simple Yet Effective Framework for Solving Fair $k$-Means and $k$-sparse Wasserstein Barycenter Problems
Shihong Song, Guanlin Mo, Qingyuan Yang, Hu Ding
TL;DR
The paper addresses fair clustering under $(\alpha,\beta)$-fairness constraints for $k$-means and explores the related $k$-sparse Wasserstein Barycenter ($k$-WB) problem in Euclidean space. It introduces the Relax and Merge framework. The framework uses an $\epsilon$-approximate centroid set to construct a relaxed candidate center set, solves a fair LP to obtain a fractional assignment, and then merges the result with a vanilla $k$-means step to obtain a final set of $k$ centers with provable approximation guarantees. The key contributions are: (i) a $(1+4\rho+O(\epsilon))$-approximation for fair $k$-means (with only a slight violation of the fairness constraints) and for $k$-WB, where $\rho$ is the approximation ratio of the vanilla $k$-means algorithm used, which becomes $(5+O(\epsilon))$ when a PTAS for vanilla $k$-means is used; (ii) a $(2+6\rho)$-approximation for strictly fair $k$-means with no violation; and (iii) experiments showing substantially lower clustering cost than baselines. These results advance both the theoretical guarantees and the practical performance for fair clustering and transport-based barycenter problems in low-dimensional spaces.
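To make the "fair LP" step concrete, here is a minimal sketch of a fair fractional assignment LP over a fixed set of candidate centers. It assumes the common $(\alpha,\beta)$ convention, in which the group-$h$ share of the mass assigned to every center must lie in $[\beta_h, \alpha_h]$. The function name `fair_fractional_assignment`, its argument layout, and the use of SciPy's `linprog` with the HiGHS solver are illustrative assumptions, not the authors' implementation. The sketch does not build the $\epsilon$-approximate centroid set or perform the merge step, and it uses dense constraint matrices, so it is only suitable for small inputs.

```python
import numpy as np
from scipy.optimize import linprog


def fair_fractional_assignment(points, groups, centers, alpha, beta):
    """Fractionally assign points to fixed candidate centers under fairness bounds.

    Hypothetical helper (not from the paper). x[i, j] >= 0 is the fraction of
    point i sent to center j; for every center j and group h, the group-h share
    of the mass at j must lie in [beta[h], alpha[h]].
    Returns (x, cost), where cost is the fractional k-means cost.
    """
    n, m = len(points), len(centers)
    # Squared Euclidean distances, flattened so that x[i, j] is entry i * m + j.
    dist2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).ravel()

    # Each point is fully assigned: sum_j x[i, j] = 1.
    a_eq = np.zeros((n, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0
    b_eq = np.ones(n)

    # Fairness rows for each center j and group h (dense; fine for small inputs):
    #   sum_{i in h} x[i, j] - alpha_h * sum_i x[i, j] <= 0   (upper bound)
    #   beta_h * sum_i x[i, j] - sum_{i in h} x[i, j] <= 0    (lower bound)
    rows = []
    for j in range(m):
        for h in np.unique(groups):
            in_h = (groups == h).astype(float)
            upper = np.zeros(n * m)
            lower = np.zeros(n * m)
            upper[j::m] = in_h - alpha[h]  # entries x[0, j], x[1, j], ...
            lower[j::m] = beta[h] - in_h
            rows.extend([upper, lower])

    res = linprog(dist2, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise ValueError(f"fair LP failed: {res.message}")
    return res.x.reshape(n, m), res.fun
```

As a small check, take two group-0 points at $x=0$, two group-1 points at $x=10$, and candidate centers at 0, 5 and 10. With $\alpha_h=\beta_h=0.5$ (exact balance), every center must receive equal mass from both groups, so the optimal LP cost is 100 (all mass at the center at 5). With $\alpha_h=1$ and $\beta_h=0$ (no effective constraint), the cost drops to 0.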
Abstract
The fairness of clustering algorithms has gained widespread attention across various areas, including machine learning. In this paper, we study fair $k$-means clustering in Euclidean space. Given a dataset comprising several groups, the fairness constraint requires that each cluster contain a proportion of points from each group within specified lower and upper bounds. Because of these fairness constraints, determining the optimal locations of the $k$ centers is challenging. We propose a novel "Relax and Merge" framework that returns a $(1+4\rho+O(\epsilon))$-approximate solution, where $\rho$ is the approximation ratio of an off-the-shelf vanilla $k$-means algorithm and $\epsilon$ can be an arbitrarily small positive number. If equipped with a PTAS for $k$-means, our solution achieves an approximation ratio of $(5+O(\epsilon))$ with only a slight violation of the fairness constraints, which improves the current state-of-the-art approximation guarantee. Furthermore, using our framework, we also obtain a $(1+4\rho+O(\epsilon))$-approximate solution for the $k$-sparse Wasserstein Barycenter problem, a fundamental optimization problem in optimal transport, and a $(2+6\rho)$-approximate solution for strictly fair $k$-means clustering with no violation; both improve on the current state-of-the-art methods. In addition, the empirical results demonstrate that our proposed algorithm significantly outperforms baseline approaches in terms of clustering cost.
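For concreteness, the fairness constraint and the PTAS specialization can be written out explicitly. The notation here is an assumption following the common $(\alpha,\beta)$-fairness convention rather than something defined in this section. The dataset $P$ is partitioned into groups $P_1,\dots,P_\ell$, and $C_1,\dots,C_k$ are the clusters. The lower and upper proportion bounds for group $h$ are $\beta_h \le \alpha_h$. Substituting $\rho = 1+\epsilon$ for a PTAS then recovers the $(5+O(\epsilon))$ ratio stated in the abstract.

$$
\beta_h \, |C_j| \;\le\; |C_j \cap P_h| \;\le\; \alpha_h \, |C_j|, \qquad j = 1,\dots,k,\;\; h = 1,\dots,\ell,
$$

$$
\rho = 1+\epsilon \;\Longrightarrow\; 1 + 4\rho + O(\epsilon) = 1 + 4(1+\epsilon) + O(\epsilon) = 5 + 4\epsilon + O(\epsilon) = 5 + O(\epsilon).
$$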
