Proportionally Representative Clustering

Haris Aziz, Barton E. Lee, Sean Morota Chu, Jeremy Vollen

TL;DR

This work proposes a new axiom, "proportionally representative fairness" (PRF), designed for clustering problems in which the selection of centroids reflects both the distribution of data points and how tightly they are clustered together.

Abstract

In recent years, there has been a surge in effort to formalize notions of fairness in machine learning. We focus on centroid clustering, one of the fundamental tasks in unsupervised machine learning. We propose a new axiom, "proportionally representative fairness" (PRF), designed for clustering problems in which the selection of centroids reflects both the distribution of data points and how tightly they are clustered together. Our fairness concept is not satisfied by existing fair clustering algorithms. We design efficient algorithms that achieve PRF for both unconstrained and discrete clustering problems. Our algorithm for the unconstrained setting is also the first known polynomial-time approximation algorithm for the well-studied Proportional Fairness (PF) axiom. Our algorithm for the discrete setting matches the best known approximation factor for PF.

Paper Structure

This paper contains 15 sections, 9 theorems, 4 equations, 9 figures, 3 tables, 2 algorithms.

Key Result

Proposition 1

For metric spaces, if an outcome $X\subseteq \mathcal{M}$ with $|X|=k$ satisfies PRF for Unconstrained Clustering, then $X$ is $\frac{3+\sqrt{17}}{2}$-approximate PF, and there exists an instance for which this bound is tight.
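
To make the constant in Proposition 1 concrete: $\frac{3+\sqrt{17}}{2}$ is the positive root of $\rho^2 - 3\rho - 2 = 0$, roughly 3.56. The sketch below evaluates it and adds a brute-force check of $\rho$-approximate PF. It rests on assumptions not stated in this summary: it takes the usual reading of Definitions 1 and 3, under which an outcome fails if at least $\lceil n/k \rceil$ agents would each be more than a factor $\rho$ closer to some alternative center, and it searches only a finite candidate set, whereas in the unconstrained setting the alternative center ranges over all of $\mathcal{M}$. The names `RHO_PRF` and `is_rho_approx_pf` are hypothetical and are not the authors' code.

```python
import math
from typing import Callable, Hashable, Iterable, Sequence

# Constant from Proposition 1: (3 + sqrt(17)) / 2, about 3.56.
RHO_PRF = (3 + math.sqrt(17)) / 2


def is_rho_approx_pf(
    agents: Sequence[Hashable],
    centers: Iterable[Hashable],
    candidates: Iterable[Hashable],
    k: int,
    dist: Callable[[Hashable, Hashable], float],
    rho: float = 1.0,
) -> bool:
    """Brute-force check of rho-approximate PF over a finite candidate set.

    Assumed definition (not spelled out in this summary): the outcome fails
    if some candidate y attracts at least ceil(n / k) agents i with
    rho * dist(i, y) < dist(i, nearest chosen center).
    """
    chosen = list(centers)
    threshold = math.ceil(len(agents) / k)
    # Distance from each agent to its nearest chosen center.
    nearest = {i: min(dist(i, x) for x in chosen) for i in agents}
    for y in candidates:
        deviators = sum(1 for i in agents if rho * dist(i, y) < nearest[i])
        if deviators >= threshold:
            return False  # a large enough coalition prefers y
    return True


def line_dist(a: float, b: float) -> float:
    """Distance on the real line, used only for the toy example below."""
    return abs(a - b)


# Toy instance: three agents near 0, one agent at 10, and k = 2 centers.
points = [0.0, 0.1, 0.2, 10.0]
far_centers = [10.0, 11.0]       # ignores the cluster near 0
balanced_centers = [0.1, 10.0]   # one center per group
ignores_cluster = is_rho_approx_pf(points, far_centers, points, 2, line_dist)
serves_both = is_rho_approx_pf(points, balanced_centers, points, 2, line_dist)
still_fails = is_rho_approx_pf(points, far_centers, points, 2, line_dist, RHO_PRF)
```

On this toy instance the outcome that ignores the cluster near 0 fails the check even at the relaxed factor `RHO_PRF`, while placing one center in each group passes exact PF. The check costs one pass over agents per candidate, which is adequate for small illustrative instances only.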

Figures (9)

  • Figure 1: An example instance with 6 agents and $k=3$ for which a PF outcome does not exist.
  • Figure 2: Some of the requirements of PRF for the instance in Example \ref{example:PFnott}.
  • Figure 3: The k-means solution may not satisfy PRF.
  • Figure F.4: Buddy dataset: MSD to closest 1, k/2, and k centroids.
  • Figure F.5: Seeds dataset: MSD to closest 1, k/2, and k centroids.
  • ...and 4 more figures

Theorems & Definitions (31)

  • Definition 1: Proportional Fairness [CFLM19a]
  • Definition 2: Unanimous Proportionality (UP)
  • Example 1: Proportional fairness and core fairness
  • Definition 3: $\rho$-approximate Proportional Fairness
  • Example 2
  • Definition 4: Proportionally Representative Fairness (PRF) for Unconstrained Clustering
  • Example 3: Requirements of PRF
  • Proposition 1
  • Proof
  • Example 4: $k$-means does not satisfy PRF
  • ...and 21 more