Fair Clustering: Critique, Caveats, and Future Directions
John Dickerson, Seyed A. Esmaeili, Jamie Morgenstern, Claire Jie Zhang
TL;DR
This paper critiques the fair clustering literature by highlighting missing utility characterizations and potentially harmful downstream welfare effects. It contrasts OR facility-location and ML clustering perspectives, formalizes CM, SF, and EQ notions, and introduces the price of fairness as $PoF = \frac{\text{Cost of Optimal Solution Satisfying Constraint}}{\text{Cost of Optimal Agnostic Solution}}$. Through illustrative examples, it shows that enforcing fairness can increase distances or degrade welfare unequally across groups, and it discusses unintended consequences in ML pipelines, such as effects on outlier detection and per-cluster modeling. It concludes with a path toward more impactful research, advocating welfare-centered formulations, real-world data, long-term analyses, standards, and stakeholder engagement.
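The price-of-fairness ratio above can be made concrete with a minimal brute-force sketch. Everything in it is an assumption chosen for illustration rather than taken from the paper: the four points on a line, the two color groups, the k-median objective with centers restricted to input points, and the balance-style constraint (every non-empty cluster holds equally many red and blue points). The function names are hypothetical.

```python
from itertools import combinations, product


def clustering_cost(points, centers, assignment):
    # k-median style cost: sum of distances from each point to its assigned center.
    return sum(abs(p - centers[a]) for p, a in zip(points, assignment))


def is_balanced(colors, assignment, k):
    # Assumed fairness constraint: every non-empty cluster has as many
    # "red" points as "blue" points.
    for j in range(k):
        members = [c for c, a in zip(colors, assignment) if a == j]
        if members and members.count("red") != members.count("blue"):
            return False
    return True


def optimal_cost(points, colors, k, constrained):
    # Exhaustive search over all center sets and all assignments.
    # Only feasible for tiny instances; used here purely for illustration.
    best = float("inf")
    for centers in combinations(points, k):
        for assignment in product(range(k), repeat=len(points)):
            if constrained and not is_balanced(colors, assignment, k):
                continue
            best = min(best, clustering_cost(points, centers, assignment))
    return best


# Toy instance (hypothetical): two red points close together,
# two blue points close together, the groups far apart.
points = [0, 1, 10, 11]
colors = ["red", "red", "blue", "blue"]

agnostic = optimal_cost(points, colors, 2, constrained=False)
fair = optimal_cost(points, colors, 2, constrained=True)
pof = fair / agnostic

print(agnostic, fair, pof)  # 2 20 10.0
```

On this instance the agnostic optimum groups each color with itself at cost 2, while the balance constraint forces every cluster to span both groups at cost 20, giving a price of fairness of 10. The example only shows how the ratio is computed; it says nothing about how large the ratio is on realistic data.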
Abstract
Clustering is a fundamental problem in machine learning and operations research. Because fairness considerations have become paramount in algorithm design, fairness in clustering has received significant attention from the research community. The literature on fair clustering has produced a collection of interesting fairness notions and elaborate algorithms. In this paper, we take a critical view of fair clustering, identifying a collection of ignored issues such as the lack of a clear utility characterization and the difficulty of accounting for the downstream effects of a fair clustering algorithm in machine learning settings. We also present examples where applying a fair clustering algorithm can have significant negative impacts on social welfare. We end by identifying a collection of steps that would lead towards more impactful research in fair clustering.
