FPT Approximations for Fair $k$-Min-Sum-Radii

Lena Carta, Lukas Drexler, Annika Hennes, Clemens Rösner, Melanie Schmidt

TL;DR

An FPT $(6+\epsilon)$-approximation is designed for $k$-MSR under the general fairness notion, and it is improved to a $(3+\epsilon)$-approximation for the special 1:1 case.

Abstract

We consider the $k$-min-sum-radii ($k$-MSR) clustering problem with fairness constraints. The $k$-min-sum-radii problem is a mixture of the classical $k$-center and $k$-median problems. We are given a set of points $P$ in a metric space and a number $k$ and aim to partition the points into $k$ clusters, each of the clusters having one designated center. The objective to minimize is the sum of the radii of the $k$ clusters (where in $k$-center we would only consider the maximum radius and in $k$-median we would consider the sum of the individual points' costs). Various notions of fair clustering have been introduced lately, and we follow the definitions due to Chierichetti, Kumar, Lattanzi and Vassilvitskii [NeurIPS 2017] which demand that cluster compositions shall follow the proportions of the input point set with respect to some given sensitive attribute. For the easier case where the sensitive attribute only has two possible values and each is equally frequent in the input, the aim is to compute a clustering where all clusters have a 1:1 ratio with respect to this attribute. We call this the 1:1 case. There has been a surge of FPT-approximation algorithms for the $k$-MSR problem lately, solving the problem both in the unconstrained case and in several constrained problem variants. We add to this research area by designing an FPT $(6+\epsilon)$-approximation that works for $k$-MSR under the mentioned general fairness notion. For the special 1:1 case, we improve our algorithm to achieve a $(3+\epsilon)$-approximation.
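
To make the difference between the three objectives concrete, the following is a minimal sketch that evaluates all of them on one fixed clustering. It assumes points in the Euclidean plane; the function name `cluster_costs` and the toy data are illustrative and do not come from the paper.

```python
from math import dist


def cluster_costs(clusters):
    """Evaluate a clustering under three objectives.

    clusters: list of (center, points) pairs, coordinates given as tuples.
    Returns (k_center_cost, k_msr_cost, k_median_cost) for Euclidean distance.
    """
    # Radius of a cluster: distance from its center to its farthest point.
    radii = [max(dist(center, p) for p in points) for center, points in clusters]
    k_center = max(radii)  # k-center: only the largest radius counts
    k_msr = sum(radii)     # k-MSR: the sum of all radii
    # k-median: the sum of every point's distance to its own center
    k_median = sum(dist(center, p) for center, points in clusters for p in points)
    return k_center, k_msr, k_median


# Hypothetical toy instance with k = 2 clusters.
clusters = [
    ((0.0, 0.0), [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]),
    ((10.0, 0.0), [(10.0, 0.0), (10.0, 2.0)]),
]
costs = cluster_costs(clusters)
print(costs)
```

On this instance the radii are 5 and 2, so the $k$-center cost is 5, the $k$-MSR cost is 7, and the $k$-median cost is 8.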

Paper Structure (12 sections, 15 theorems, 8 equations, 5 figures, 1 table, 4 algorithms)

Key Result

Lemma 3

Running Algorithm \ref{alg:gonzalez} with $d'$ and $c_1, \ldots, c_{\ell}$ already fixed yields a $2$-approximation for the $k$-center completion problem with input $(P,d,k, c_1, \ldots, c_{\ell},r_1,\ldots,r_{\ell})$.
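
The paper's algorithm is not reproduced on this page, so the following is only a rough sketch of the idea behind the lemma: a farthest-first traversal in the style of Gonzalez, started from the already fixed centers and run on the shortened distance $d'$. It assumes Euclidean points, assumes that $d'$ to a fixed center is the distance minus that center's radius (clamped at zero, which is an assumption of this sketch), and gives new centers radius 0. The names `d_prime` and `complete_centers` are hypothetical.

```python
from math import dist


def d_prime(p, center, radius):
    """Distance from p to a center, shortened by that center's radius.

    Assumption of this sketch: the value is clamped at 0 inside the ball.
    """
    return max(0.0, dist(p, center) - radius)


def complete_centers(points, k, fixed):
    """Farthest-first completion of a partial center set.

    fixed: list of (center, radius) pairs that are already chosen.
    Returns (centers, value): centers is a list of k (center, radius) pairs,
    value is the largest d' distance from any point to its nearest center.
    """
    centers = list(fixed)
    if not centers:
        # No fixed center given: start from an arbitrary point.
        centers.append((points[0], 0.0))

    def d_to_centers(p):
        return min(d_prime(p, c, r) for c, r in centers)

    while len(centers) < k:
        # Greedy step: open the point that is currently farthest under d'.
        farthest = max(points, key=d_to_centers)
        centers.append((farthest, 0.0))
    return centers, max(d_to_centers(p) for p in points)


# Hypothetical instance: one fixed center with radius 1, and k = 2.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (9.0, 0.0)]
centers, value = complete_centers(points, 2, [((0.0, 0.0), 1.0)])
print(centers, value)
```

Here the point at distance 9 is opened as the second center, and the point at distance 5 remains at $d'$-distance 4 from both centers.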

Figures (5)

  • Figure 1: Anything can happen for $k$-MSR: The cost of a micro clustering with $\mu=4$ compared to the macro clustering with $k=1$. All pictures use shortest path metrics, the edges have unit weights.
  • Figure 2: An instance of a 3-center completion problem. The centers $\hat{c}_1$ and $\hat{c}_2$ with corresponding radii $\hat{r}_1 = 1$ and $\hat{r}_2 = 0.5$ are already given. The underlying distances are given by $d(p,\hat{c}_1) = 1.5$, $d(p,\hat{c}_2) = 1$, $d(p,\bar{c}_3)=\sqrt{2}$. In $d'$, all distances to one of the centers $\hat{c}_1,\hat{c}_2$ are shortened by the respective radius $\hat{r}_1,\hat{r}_2$. Dotted parts indicate the segments that do not contribute to the distance $d'$. For example, $d'(p,\hat{c}_1) = 0.5$, $d'(p,\hat{c}_2) = 0.5$. However, distances not involving $\hat{c}_1$ or $\hat{c}_2$ as one of the end points stay the same, i.e., $d'(p,\bar{c}_3)=d(p,\bar{c}_3)$. Originally, the point $p$ is closer to $\bar{c}_3$ than to $\hat{c}_1$. But under $d'$, $p$ is closer to $\hat{c}_1$. This example also shows that the distance $d'$ does not fulfill the triangle inequality: While $d'(p,\bar{c}_3)=\sqrt{2}$, the detour via $\hat{c}_2$ is shorter: $d'(p,\hat{c}_2) + d(\hat{c}_2,\bar{c}_3) = 1$.
  • Figure 3: An instance of a $k$-min-sum-radii problem with exact fairness constraint with two colors and a blue:orange ratio of 2:1. The larger dots indicate centers and the gray lines indicate the radii output by Alg. \ref{alg:centers-and-radii}. The black circles show the induced balls $B(\hat{c}_i, \hat{r}_i)$. The black lines between points represent the edges of the induced access graph. Note that the balls themselves are not necessarily fair, but every connected component is.
  • Figure 4: A fair $k$-min-sum-radii instance with two colors and equal proportions. Left: Output of Alg. \ref{alg:centers-and-radii} as balls $B(\hat{c}_i,\hat{r}_i)$ for $i=1,2,3$ and the graph edges between any fair pair of points that have access to the same center $\hat{c}$. Right: The corresponding flow network. All edges have capacity 1.
  • Figure 5: An example run of Alg. \ref{alg:centers-and-radii}. The input is a set of points of which one-third are blue and two-thirds are orange. The green circles indicate the optimal clusters with centers $c_1^*,c_2^*,c_3^*$. Black squares indicate the centers $\{\hat{c}_1,\ldots,\hat{c}_i,\bar{c}_{i+1},\ldots,\bar{c}_k\}$ of the $k$-center completion solution output by Alg. \ref{alg:gonzalez} on given centers $\{\hat{c}_1,\ldots,\hat{c}_i\}$. The arrows represent the guess of the assignment of the next optimal center. Here, we assume that the algorithm guesses correctly.
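
The exact fairness constraint that appears in the captions of Figures 3 to 5 can be checked mechanically. The sketch below assumes a clustering is given as lists of color labels and tests whether every cluster has the same color proportions as the whole input; the function name `is_exactly_fair` and the example data are illustrative, not taken from the paper.

```python
from collections import Counter


def is_exactly_fair(clusters):
    """Check whether each cluster mirrors the global color proportions.

    clusters: list of clusters, each a list of color labels.
    """
    total = Counter(color for cluster in clusters for color in cluster)
    n = sum(total.values())
    for cluster in clusters:
        counts = Counter(cluster)
        # counts[c] / len(cluster) == total[c] / n, compared by
        # cross-multiplication to avoid floating-point division.
        if any(counts[c] * n != total[c] * len(cluster) for c in total):
            return False
    return True


# Hypothetical instances with a blue:orange ratio of 2:1 overall.
fair = [["blue", "blue", "orange"], ["blue"] * 4 + ["orange"] * 2]
unfair = [["blue", "blue"], ["blue", "blue", "orange", "orange"]]
print(is_exactly_fair(fair), is_exactly_fair(unfair))
```

The second clustering has the right overall ratio but fails the check, because its first cluster contains no orange point.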

Theorems & Definitions (18)

  • Definition 1
  • Definition 2
  • Lemma 3
  • Corollary 3
  • Theorem 4
  • Definition 5: Guessing correctly
  • Lemma 6
  • Lemma 7
  • Lemma 8
  • Lemma 9
  • ...and 8 more