
Diversity-aware clustering: Computational Complexity and Approximation Algorithms

Suhas Thejaswi, Ameet Gadekar, Bruno Ordozgoiti, Aristides Gionis

TL;DR

The paper addresses diversity-aware clustering where centers must represent intersecting groups under per-group lower and upper bounds while minimizing $k$-Median, $k$-Means, or $k$-Supplier objectives. It establishes strong hardness results, including NP-hardness and inapproximability, and studies parameterized complexity, showing W[2]-hardness and SETH-based lower bounds. The authors develop $\textsf{FPT}$-time approximation frameworks for Div-$k$-Median and Div-$k$-Means with ratios $1+\frac{2}{e}+\epsilon$ and $1+\frac{8}{e}+\epsilon$, respectively, and a $5$-approximation for Div-$k$-Supplier, using coresets, constraint-pattern enumeration, and matroid-based reductions; these results are tight under Gap-ETH for the first two problems. They also adapt the findings to fair clustering with disjoint groups and discuss practical limitations, potential extensions to online fairness, and open problems in the domain. Overall, the work significantly advances algorithmic understanding of intersectional diversity constraints in clustering and provides concrete algorithms with provable guarantees under strong complexity assumptions.

Abstract

In this work, we study diversity-aware clustering problems where the data points are associated with multiple attributes, resulting in intersecting groups. A clustering solution must ensure that the number of cluster centers chosen from each group lies within the range defined by that group's lower and upper bound thresholds, while simultaneously minimizing the clustering objective, which can be $k$-median, $k$-means, or $k$-supplier. We study the computational complexity of the proposed problems, offering insights into their NP-hardness, polynomial-time inapproximability, and fixed-parameter intractability. We present parameterized approximation algorithms with approximation ratios $1+\frac{2}{e}+\epsilon \approx 1.736$, $1+\frac{8}{e}+\epsilon \approx 3.943$, and $5$ for diversity-aware $k$-median, diversity-aware $k$-means, and diversity-aware $k$-supplier, respectively. Assuming Gap-ETH, the approximation ratios are tight for the diversity-aware $k$-median and diversity-aware $k$-means problems. Our results imply the same approximation factors for their respective fair variants with disjoint groups -- fair $k$-median, fair $k$-means, and fair $k$-supplier -- with lower bound requirements.
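
The constraint described above can be illustrated with a minimal Python sketch. It is not the paper's parameterized algorithm: it enumerates every set of $k$ centers by brute force, which takes exponential time. It also assumes that candidate centers are the data points themselves and that distances are Euclidean. The names `satisfies_bounds`, `k_median_cost`, and `brute_force_div_k_median`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def satisfies_bounds(centers, groups, lower, upper):
    """Return True if every group contributes between lower[g] and upper[g] centers."""
    chosen = set(centers)
    return all(
        lower[g] <= len(chosen & members) <= upper[g]
        for g, members in groups.items()
    )


def k_median_cost(points, centers):
    """Sum, over all points, of the distance to the nearest chosen center."""
    return sum(min(dist(p, points[c]) for c in centers) for p in points)


def brute_force_div_k_median(points, groups, lower, upper, k):
    """Try every k-subset of point indices (assumption: centers are data points).

    Returns (cost, centers) for the cheapest feasible subset, or None if no
    subset satisfies the per-group bounds.
    """
    best = None
    for centers in combinations(range(len(points)), k):
        if not satisfies_bounds(centers, groups, lower, upper):
            continue
        cost = k_median_cost(points, centers)
        if best is None or cost < best[0]:
            best = (cost, centers)
    return best


# Toy instance: point 1 belongs to both group A and group B (intersecting groups).
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
groups = {"A": {0, 1}, "B": {1, 2}, "C": {3}}
lower = {"A": 1, "B": 1, "C": 1}
upper = {"A": 1, "B": 2, "C": 1}

result = brute_force_div_k_median(points, groups, lower, upper, k=2)
print(result)
```

In this toy instance group C forces point 3 to be a center, so the single remaining center has to cover both A and B; only point 1, which lies in their intersection, can do so. Tightening the bounds further, for example setting `upper["C"] = 0` while keeping `lower["C"] = 1`, leaves no feasible set and the function returns `None`.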

Paper Structure (16 sections, 19 theorems, 17 equations, 1 figure, 1 table, 3 algorithms)


Key Result

Lemma 4.1

Problem $\textsc{Div-}(\vec{\alpha},\vec{\beta})\textsc{-Sat}$ is $\mathsf{NP}$-hard.

Figures (1)

  • Figure 1: An illustration of facility selection in the $\textsf{FPT}$ algorithm for solving a $k\textsc{-Med-}k\textsc{-PM}$ instance.

Theorems & Definitions (25)

  • Lemma 4.1
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • Proposition 4.1
  • Corollary 4.1
  • Proposition 4.2
  • Proposition 4.3
  • Definition 1: Coreset
  • Theorem 5.1: Cohen-Addad et al. [cohen2021new], Theorem 1
  • ...and 15 more