Linear Programming based Approximation to Individually Fair k-Clustering with Outliers
Binita Maity, Shrutimoy Das, Anirban Dasgupta
TL;DR
The paper addresses individually fair $k$-clustering in the presence of outliers. It formulates a linear program (LP) that identifies outliers and assigns inliers to centers while respecting a fairness constraint based on the fair radius $r(v)$, defined as the distance from $v$ to its $\frac{n}{k}$-th nearest neighbor. The authors introduce IFXO, which solves the LP, applies an OutRound procedure to determine the outliers, and then uses FairRound to compute fair centers for the inliers. This yields a provable $12$-approximation for $k$-means and a $24$-approximation for $k$-median relative to the LP optimum, while rounding relaxes the fairness parameter by a factor of $2$. The authors also prove that the LP cost after rounding the outliers is at most three times the original LP cost. Experiments on real datasets show reductions in both clustering cost and the maximum fair radius compared to baselines, even when the data contains outliers. The work integrates outlier detection into the fairness-constrained objective, offers scalable rounding methods with provable guarantees, and outlines future work on bounding detected outliers and improving scalability.
Abstract
Individual fairness guarantees are often desirable, but they become hard to formalize when the dataset contains outliers. Here, we investigate the problem of developing an individually fair $k$-means clustering algorithm for datasets that contain outliers. That is, given $n$ points and $k$ centers, we require that every point that is not an outlier has a center among its $\frac{n}{k}$ nearest neighbours. While a few recent works have studied individually fair clustering, this is the first work to explore the problem in the presence of outliers for $k$-means clustering. For this purpose, we define and solve a linear program (LP) that helps us identify the outliers. We exclude these outliers from the dataset and apply a rounding algorithm that computes the $k$ centers such that the fairness constraint of the remaining points is satisfied. We provide theoretical guarantees that our method approximates both the fair radius and the clustering cost. Finally, we demonstrate our techniques empirically on real-world datasets.
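To make the fairness condition concrete, the following minimal Python sketch computes a fair radius for every point and checks whether a given set of centers is individually fair for the non-outlier points. It is an illustration only, not the LP or the rounding procedures of the paper. The function names `fair_radii` and `is_individually_fair`, the relaxation parameter `alpha`, and the toy data are hypothetical. The sketch also assumes that each point counts as its own first neighbour, that $\frac{n}{k}$ is rounded up, and that fair radii are computed over all $n$ points before outliers are removed; the paper's exact conventions may differ.

```python
import math
import numpy as np


def fair_radii(points, k):
    """Distance from each point to its ceil(n/k)-th nearest neighbour.

    Assumption: a point counts as its own first neighbour (distance 0).
    """
    n = len(points)
    m = math.ceil(n / k)
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # n x n pairwise distances
    # After sorting each row, column 0 is the point itself.
    return np.sort(dists, axis=1)[:, m - 1]


def is_individually_fair(points, centers, k, outliers=(), alpha=1.0):
    """Check d(v, nearest center) <= alpha * r(v) for every inlier v.

    Assumption: fair radii are computed on the full dataset, and
    `outliers` holds the indices of points exempt from the constraint.
    """
    radii = fair_radii(points, k)
    to_centers = points[:, None, :] - centers[None, :, :]
    nearest = np.sqrt((to_centers ** 2).sum(axis=-1)).min(axis=1)
    inlier = np.ones(len(points), dtype=bool)
    inlier[np.asarray(list(outliers), dtype=int)] = False
    # Small tolerance guards against floating-point round-off.
    return bool(np.all(nearest[inlier] <= alpha * radii[inlier] + 1e-12))


# Toy 1-D data: two groups of three points, n = 6, k = 2.
points = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
good_centers = np.array([[1.0], [11.0]])
bad_centers = np.array([[0.0], [1.0]])

print(fair_radii(points, 2))
print(is_individually_fair(points, good_centers, 2))
print(is_individually_fair(points, bad_centers, 2))
print(is_individually_fair(points, bad_centers, 2, outliers=[3, 4, 5]))
```

In the toy data, placing both centers in the left group violates the constraint for the right group unless those three points are declared outliers, which is the interaction between fairness and outlier removal that the abstract describes.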
