
Proportional Fairness in Non-Centroid Clustering

Ioannis Caragiannis, Evi Micha, Nisarg Shah

TL;DR

This work revisits the recently developed framework of proportionally fair clustering and extends it to non-centroid clustering. It designs a new algorithm, GreedyCohesiveClustering, that achieves fully justified representation (FJR), a relaxation of the core, exactly under arbitrary loss functions, and shows that the efficient GreedyCapture algorithm achieves a constant-factor approximation of FJR.

Abstract

We revisit the recently developed framework of proportionally fair clustering, where the goal is to provide group fairness guarantees that become stronger for groups of data points (agents) that are large and cohesive. Prior work applies this framework to centroid clustering, where the loss of an agent is its distance to the centroid assigned to its cluster. We expand the framework to non-centroid clustering, where the loss of an agent is a function of the other agents in its cluster, by adapting two proportional fairness criteria -- the core and its relaxation, fully justified representation (FJR) -- to this setting. We show that the core can be approximated only under structured loss functions, and even then, the best approximation we are able to establish, using an adaptation of the GreedyCapture algorithm developed for centroid clustering [Chen et al., 2019; Micha and Shah, 2020], is unappealing for a natural loss function. In contrast, we design a new (inefficient) algorithm, GreedyCohesiveClustering, which achieves the relaxation FJR exactly under arbitrary loss functions, and show that the efficient GreedyCapture algorithm achieves a constant approximation of FJR. We also design an efficient auditing algorithm, which estimates the FJR approximation of any given clustering solution up to a constant factor. Our experiments on real data suggest that traditional clustering algorithms are highly unfair, whereas GreedyCapture is considerably fairer and incurs only a modest loss in common clustering objectives.
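The abstract describes losses that depend on an agent's cluster-mates and a greedy, ball-growing algorithm. The Python sketch below is illustrative only and rests on assumptions not stated on this page: agents are points in Euclidean space; `max_loss` and `avg_loss` are two example loss functions of our own choosing; `greedy_capture` is a simplified GreedyCapture-style heuristic (repeatedly take the smallest ball, centered at a remaining agent, that contains ⌈n/k⌉ remaining agents, and turn those agents into a cluster), not the authors' exact algorithm; and `fjr_ratio` computes the FJR approximation of a clustering by exhaustive search, which is exponential in n and is not the paper's efficient auditing algorithm. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Sequence

Point = Sequence[float]
# A loss maps (agent i, the cluster containing i, all points) to a non-negative number.
Loss = Callable[[int, Sequence[int], Sequence[Point]], float]


def max_loss(i: int, cluster: Sequence[int], pts: Sequence[Point]) -> float:
    """Assumed example loss: distance from i to the farthest member of its cluster."""
    return max(math.dist(pts[i], pts[j]) for j in cluster)


def avg_loss(i: int, cluster: Sequence[int], pts: Sequence[Point]) -> float:
    """Assumed example loss: mean distance from i to all members (itself included)."""
    return sum(math.dist(pts[i], pts[j]) for j in cluster) / len(cluster)


def greedy_capture(pts: Sequence[Point], k: int) -> list[list[int]]:
    """Hypothetical, simplified GreedyCapture-style heuristic (ties broken by index)."""
    n = len(pts)
    t = math.ceil(n / k)  # a group of size >= n/k has size >= ceil(n/k)
    remaining = set(range(n))
    clusters: list[list[int]] = []
    while remaining:
        if len(remaining) <= t:
            # Too few agents left to fill another ball: they form the last cluster.
            clusters.append(sorted(remaining))
            break
        best_radius, best_group = math.inf, []
        for c in sorted(remaining):
            # The t remaining agents closest to candidate center c.
            nearest = sorted(remaining, key=lambda j: (math.dist(pts[c], pts[j]), j))[:t]
            radius = math.dist(pts[c], pts[nearest[-1]])
            if radius < best_radius:
                best_radius, best_group = radius, nearest
        clusters.append(sorted(best_group))
        remaining -= set(best_group)
    return clusters


def fjr_ratio(pts: Sequence[Point], clusters: list[list[int]], k: int, loss: Loss) -> float:
    """Hypothetical brute-force FJR audit (exponential in n; tiny instances only).

    Under the assumed definition, a group S with |S| >= n/k blocks at level alpha iff
    alpha * loss_i(S) < min_{j in S} loss_j(C(j)) for every i in S, so the clustering
    is alpha-FJR exactly when alpha >= the value returned here.
    """
    n = len(pts)
    t = math.ceil(n / k)
    own = {i: loss(i, c, pts) for c in clusters for i in c}  # loss in own cluster
    worst = 0.0
    for size in range(t, n + 1):
        for group in combinations(range(n), size):
            m = min(own[j] for j in group)
            dev = max(loss(i, group, pts) for i in group)
            if dev > 0:
                worst = max(worst, m / dev)
            elif m > 0:
                return math.inf  # S is strictly better for all its members at any alpha
    return worst


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
    C = greedy_capture(pts, k=2)
    print(C)  # [[0, 1, 2], [3, 4, 5]]
    print(fjr_ratio(pts, C, 2, max_loss))  # 0.5
    print(fjr_ratio(pts, [[0, 1, 3], [2, 4, 5]], 2, max_loss))  # 4.5
```

On two well-separated triples, the heuristic recovers the triples and, under the assumed max-distance loss, scores 0.5 (so it satisfies FJR with α = 1), while the mixed clustering scores 4.5. Exhaustive search was chosen only for transparency; it is practical for roughly a dozen agents at most.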

Paper Structure

This paper contains 33 sections, 12 theorems, 13 equations, 5 figures, 1 table, 2 algorithms.

Key Result

theorem 1

For arbitrary losses, there exists an instance in which no $\alpha$-core clustering exists for any finite $\alpha$.
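For reference, the two criteria named in the abstract can be stated as follows. This is an assumed formalization in the style of the proportional fairness literature, not a verbatim copy of Definitions 1 and 2: N is the set of n agents, k the number of clusters, C(i) the cluster containing agent i, and ℓ_i(S) the loss of agent i when its cluster is S.

```latex
\begin{aligned}
&\text{$\alpha$-core:} && \nexists\, S \subseteq N \text{ with } |S| \ge n/k \text{ such that } \alpha\,\ell_i(S) < \ell_i\big(C(i)\big) \ \ \forall i \in S,\\
&\text{$\alpha$-FJR:}  && \nexists\, S \subseteq N \text{ with } |S| \ge n/k \text{ such that } \alpha\,\ell_i(S) < \min_{j \in S} \ell_j\big(C(j)\big) \ \ \forall i \in S.
\end{aligned}
```

Because \min_{j \in S} \ell_j(C(j)) \le \ell_i(C(i)) for every i \in S, any group that violates α-FJR also violates the α-core, so every α-core clustering is α-FJR. This is the sense in which FJR relaxes the core, and why Theorem 1's impossibility for the core does not rule out exact FJR.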

Figures (5)

  • Figure 1: Census Income Dataset
  • Figure 2: The instance used to show the lower bounds in Theorem [thm:core-ub] and Lemma [lem:gcsub-gccsub].
  • Figure 3: Diabetes dataset
  • Figure 4: Remaining figures for the Census Income Dataset
  • Figure 5: Iris dataset

Theorems & Definitions (28)

  • definition 1: $\alpha$-Core
  • theorem 1
  • theorem 2
  • theorem 3
  • definition 2: $\alpha$-Fully Justified Representation ($\alpha$-FJR)
  • proposition 1
  • proof
  • definition 3
  • theorem 4
  • proof
  • ...and 18 more