Linear Programming based Approximation to Individually Fair k-Clustering with Outliers

Binita Maity, Shrutimoy Das, Anirban Dasgupta

TL;DR

The paper addresses individually fair $k$-clustering in the presence of outliers by formulating a linear program (LP) that identifies outliers and assigns inliers to centers while respecting a fairness constraint based on the fair radius $r(v)$, defined as the distance to the $\frac{n}{k}$-th nearest neighbor. The authors introduce IFXO, which solves the LP, applies an OutRound procedure to determine outliers, and then uses FairRound to compute fair centers for inliers, achieving a provable $12$-approximation for $k$-means and a $24$-approximation for $k$-median relative to the LP optimum, with a $2$-factor increase in the fairness parameter during rounding. They provide theoretical guarantees that the LP cost after rounding outliers is at most thrice the original LP cost, and empirical validation on real datasets demonstrates reductions in both clustering cost and the maximum fair radius compared to baselines, even when the data contains outliers. The work advances practical fair clustering by integrating outlier detection into the fairness-constrained objective and offers scalable rounding methods with provable guarantees, along with directions for future work on bounding detected outliers and improving scalability.

Abstract

Individual fairness guarantees are often desirable, but they become hard to formalize when the dataset contains outliers. Here, we investigate the problem of developing an individually fair $k$-means clustering algorithm for datasets that contain outliers. That is, given $n$ points and $k$ centers, we require that every point that is not an outlier has a center within its $\frac{n}{k}$ nearest neighbours. While a few recent works have looked into individually fair clustering, this is the first work that explores the problem in the presence of outliers for $k$-means clustering. For this purpose, we define and solve a linear program (LP) that helps us identify the outliers. We exclude these outliers from the dataset and apply a rounding algorithm that computes the $k$ centers such that the fairness constraint of the remaining points is satisfied. We also prove that our method approximates both the fair radius and the clustering cost within guaranteed factors, and we demonstrate our techniques empirically on real-world datasets.
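
The fairness condition described above can be illustrated with a minimal Python sketch. It assumes Euclidean distance, counts each point as its own first neighbour, and rounds $\frac{n}{k}$ up to an integer; none of these conventions is fixed by the text above. The function names `fair_radius` and `fairness_violations`, and the parameter `alpha` (a relaxation factor on the fair radius), are hypothetical and used only for illustration.

```python
import math

def fair_radius(points, k):
    """Distance from each point to its ceil(n/k)-th nearest neighbour.

    Assumption: a point counts as its own first neighbour (distance 0).
    """
    n = len(points)
    m = math.ceil(n / k)
    radii = []
    for p in points:
        # Sorted distances from p to every point, including p itself.
        dists = sorted(math.dist(p, q) for q in points)
        radii.append(dists[m - 1])
    return radii

def fairness_violations(points, centers, k, alpha=1.0):
    """Indices of points whose nearest center is farther than alpha * r(v)."""
    radii = fair_radius(points, k)
    violated = []
    for i, (p, r) in enumerate(zip(points, radii)):
        nearest = min(math.dist(p, c) for c in centers)
        if nearest > alpha * r:
            violated.append(i)
    return violated

# Three nearby points and one far-away point on a line, with k = 2.
points = [(0.0,), (1.0,), (2.0,), (10.0,)]
radii = fair_radius(points, 2)
violations = fairness_violations(points, [(0.0,), (1.0,)], 2)
```

In this toy instance the far-away point is the only one whose fairness constraint fails for the centers `(0.0,)` and `(1.0,)`, which is the kind of point an outlier-aware formulation would be allowed to exclude.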

Paper Structure

This paper contains 10 sections, 2 theorems, 11 equations, 2 figures, 4 tables, 2 algorithms.

Key Result

Theorem

Suppose the optimal cost of the LP is $LP_{\alpha=1}(x^*, y^*, z^*)$. Running the OutRound algorithm with $(x^*, y^*, z^*)$ as input and threshold $\tau = 0$ results in the cost $LP_{\alpha=2}(x', y', z')$, where $z' = \{\mathbf{1}[z^{*}[v] > \tau] : v \in X\}$. Then $LP_{\alpha=2}(x', y', z') \leq 3\, LP_{\alpha=1}(x^*, y^*, z^*)$. Consequently, since the FairRound algorithm in [negahbani2021better] gives a $4$-approximate solution for individually fair $k$ …
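
The thresholding rule for $z'$ in the statement above can be sketched in a few lines of Python. This covers only the indicator step $\mathbf{1}[z^{*}[v] > \tau]$, not the full OutRound algorithm, whose remaining steps are not given here. The function name `threshold_indicator` and the dictionary representation of $z^*$ are assumptions made for illustration.

```python
def threshold_indicator(z_star, tau=0.0):
    """Round fractional LP values z*[v] to 0/1 via the rule 1[z*[v] > tau].

    z_star: dict mapping each point v to its fractional LP value (hypothetical layout).
    """
    return {v: int(value > tau) for v, value in z_star.items()}

# Hypothetical fractional values for three points.
z_star = {"a": 0.0, "b": 0.3, "c": 1.0}
z_rounded = threshold_indicator(z_star, tau=0.0)
```

With $\tau = 0$, every point with a strictly positive fractional value is rounded to $1$, and only points with value exactly $0$ stay at $0$.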

Figures (2)

  • Figure 1: Individually fair $2$-clustering in the presence of an outlier.
  • Figure 2: Maximum Fairness Radius for the Bank dataset for different number of clusters.

Theorems & Definitions (7)

  • Definition: Fair radius $r(\cdot)$
  • Definition: Fair $(p,k)$-clustering
  • Definition: $(\alpha, k, m)$-fair clustering excluding outliers
  • Theorem
  • Lemma
  • Proof
  • Proof