A Polynomial-Time Approximation for Pairwise Fair $k$-Median Clustering

Sayan Bandyapadhyay, Eden Chlamtáč, Zachary Friggstad, Mahya Jamshidian, Yury Makarychev, Ali Vakilian

TL;DR

This work tackles Pairwise Fair $k$-Median clustering with $\ell$ groups and a uniform balance parameter $t$, aiming to minimize the $k$-median cost while strictly satisfying the fairness constraints. The authors present the first polynomial-time $O(k^2 \cdot \ell \cdot t)$-approximation that does not violate fairness. The algorithm takes centers from a standard $k$-median approximation, solves a linear-programming relaxation to obtain a near-fair assignment, and then applies a careful rounding step followed by a controlled reassignment phase. On the hardness side, they show that for disjoint groups the problem is almost as hard as Soft Uniform Capacitated $k$-Median (for which no polynomial-time $o(\log k)$-approximation is known), and that for overlapping groups it is NP-hard to approximate within $n^{1-\varepsilon}$; the reductions are from capacitated problems and hypergraph 2-colorability. Experiments on real datasets show that the cost of enforcing fairness grows with the number of clusters but stays well below the theoretical worst-case bounds, with runtime dominated by the initial $k$-median step.

Abstract

In this work, we study pairwise fair clustering with $\ell \ge 2$ groups, where for every cluster $C$ and every group $i \in [\ell]$, the number of points in $C$ from group $i$ must be at most $t$ times the number of points in $C$ from any other group $j \in [\ell]$, for a given integer $t$. To the best of our knowledge, only bi-criteria approximation and exponential-time algorithms follow for this problem from the prior work on fair clustering problems when $\ell > 2$. In our work, focusing on the $\ell > 2$ case, we design the first polynomial-time $O(k^2\cdot \ell \cdot t)$-approximation for this problem with $k$-median cost that does not violate the fairness constraints. We complement our algorithmic result by providing hardness of approximation results, which show that our problem even when $\ell=2$ is almost as hard as the popular uniform capacitated $k$-median, for which no polynomial-time algorithm with an approximation factor of $o(\log k)$ is known.
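The pairwise fairness condition described above can be checked directly on a candidate clustering. The following is a minimal sketch, not the authors' code: the function name `is_pairwise_fair`, the representation of a cluster as a list of group labels, and the example counts are all assumptions made for illustration, and groups are assumed to be disjoint (each point carries exactly one label).

```python
from collections import Counter


def is_pairwise_fair(clusters, groups, t):
    """Return True if every cluster satisfies the pairwise fairness condition.

    clusters: list of clusters, each a list of group labels (one per point).
    groups:   list of all group labels (hypothetical representation).
    t:        balance parameter; each group count must be at most t times
              every other group count in the same cluster.
    """
    for cluster in clusters:
        counts = Counter(cluster)
        sizes = [counts.get(g, 0) for g in groups]
        # Checking all pairs (i, j) is equivalent to comparing the largest
        # group count against t times the smallest group count.
        if max(sizes) > t * min(sizes):
            return False
    return True


groups = ["red", "blue", "green"]

# Hypothetical cluster with 4 red, 1 blue, 2 green points: 4 > 2 * 1, so it is unfair for t = 2.
unfair = [["red"] * 4 + ["blue"] * 1 + ["green"] * 2]

# Hypothetical cluster with 4 red, 2 blue, 2 green points: 4 <= 2 * 2, so it is fair for t = 2.
fair = [["red"] * 4 + ["blue"] * 2 + ["green"] * 2]

result_unfair = is_pairwise_fair(unfair, groups, t=2)
result_fair = is_pairwise_fair(fair, groups, t=2)
```

Note that under this condition a non-empty cluster that misses some group entirely is never fair, since the missing group has count 0 and any positive count exceeds $t \cdot 0$.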

Paper Structure

This paper contains 15 sections, 8 theorems, 6 equations, 3 figures, 1 algorithm.

Key Result

Theorem 1

There is a polynomial-time $O(k^2\cdot \ell \cdot t)$-approximation algorithm for Pairwise Fair $k$-Median.
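
One way to write out the problem behind this guarantee, based on the definition in the abstract, is sketched below. The notation is an assumption and may differ from the paper's: $P$ is the point set, $P_1, \dots, P_\ell$ are the groups, $F$ is a candidate center set, $d$ is the metric, and $\varphi$ assigns each point to an open center.

$$
\min_{S \subseteq F,\ |S| \le k,\ \varphi : P \to S} \ \sum_{p \in P} d(p, \varphi(p))
\quad \text{s.t.} \quad
|\varphi^{-1}(c) \cap P_i| \le t \cdot |\varphi^{-1}(c) \cap P_j|
\quad \forall c \in S,\ \forall i, j \in [\ell].
$$

Under this reading, Theorem 1 says that a solution $(S, \varphi)$ satisfying every constraint exactly can be computed in polynomial time with cost at most $O(k^2 \cdot \ell \cdot t) \cdot \mathrm{OPT}$, where $\mathrm{OPT}$ denotes the optimal value of the program above.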

Figures (3)

  • Figure 1: (a) A fair representational clustering for three groups red (r, disk), blue (b, square), and green (g, cross). Here $\alpha_r=1/3,\beta_r=2/3,\alpha_b=1/6,\beta_b=1/3,\alpha_g=1/6,\beta_g=1/3$. $C_1,C_2,C_3$ are the clusters. (b) A pairwise fair clustering with $t=2$ for the same dataset. The clustering in (a) is not pairwise fair for $t=2$, as $C_1$ contains 4 red (disk) points, but only 1 blue (square).
  • Figure 2: Cost comparison across multiple datasets with varying target numbers of clusters ($k$).
  • Figure 3: Runtime comparison across multiple datasets with varying numbers of clusters ($k$).

Theorems & Definitions (24)

  • Definition 1: Fair Representational Clustering
  • Definition 2: Pairwise Fair Clustering
  • Theorem 1
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • proof
  • Lemma 3
  • ...and 14 more