Dependent randomized rounding for clustering and partition systems with knapsack constraints

David G. Harris, Thomas Pensyl, Aravind Srinivasan, Khoa Trinh

TL;DR

This work develops a robust framework for clustering under knapsack constraints with fairness considerations by introducing Knapsack-Partition Rounding (KPR), a dependent-rounding technique that preserves multiple knapsack budgets and a partition constraint while maintaining strong negative-correlation-like properties. The authors prove a Samuels–Feige-type concentration bound for sums of unbounded, negatively associated variables, enabling additive pseudo-approximation guarantees that complement exact knapsack feasibility. They apply KPR to obtain new pseudo-approximation results for knapsack median and knapsack center, including single- and multi-knapsack settings, with additive and multiplicative guarantees and near-fair distance behavior. The methods yield improved theoretical guarantees and practical implications for fair clustering and facility location, offering a toolkit to balance efficiency, constraint satisfaction, and equitable representation.
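
KPR itself is only described at a high level here. As a hedged illustration of the underlying idea of dependent rounding, and explicitly not KPR (which additionally preserves knapsack budgets), the sketch below implements classic pairwise dependent rounding (Srinivasan, 2001): each step shifts mass between two fractional coordinates so that their sum is unchanged and each marginal stays equal to its fractional value. Running it separately inside each group of a partition keeps the per-group count fixed, which is the flavor of a partition constraint. The function names `dependent_round` and `round_within_groups` are hypothetical.

```python
import random


def dependent_round(x, rng, eps=1e-9):
    """Pairwise dependent rounding of a fractional vector x in [0,1]^n.

    Each step moves mass between two fractional coordinates so that their
    sum is unchanged and at least one becomes 0 or 1; the random choice is
    weighted so that every marginal E[X_i] equals x_i. If sum(x) is an
    integer, the output has exactly sum(x) ones.
    """
    x = list(x)
    while True:
        frac = [i for i, v in enumerate(x) if eps < v < 1 - eps]
        if len(frac) < 2:
            break
        i, j = frac[0], frac[1]
        alpha = min(1 - x[i], x[j])  # largest shift of mass from j to i
        beta = min(x[i], 1 - x[j])   # largest shift of mass from i to j
        # Probabilities beta/(alpha+beta) and alpha/(alpha+beta) make the
        # expected change of x[i] (and of x[j]) exactly zero.
        if rng.random() < beta / (alpha + beta):
            x[i] += alpha
            x[j] -= alpha
        else:
            x[i] -= beta
            x[j] += beta
    # Snap values that are 0 or 1 up to floating-point error.
    return [1 if v > 0.5 else 0 for v in x]


def round_within_groups(x, groups, rng):
    """Round each group (a partition of the indices) independently.

    If the fractional mass of every group is an integer, the number of
    selected items in each group is preserved exactly, so a per-group
    (partition) constraint survives rounding in this sketch.
    """
    out = [0] * len(x)
    for group in groups:
        rounded = dependent_round([x[i] for i in group], rng)
        for i, v in zip(group, rounded):
            out[i] = v
    return out
```

For example, with `x = [0.5, 0.5, 0.25, 0.75, 1.0, 0.0]` and groups `[[0, 1], [2, 3, 4, 5]]`, every call selects exactly one item from the first group and two from the second, while each index is selected with probability close to its fractional value over many runs.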

Abstract

Clustering problems are fundamental to unsupervised learning. There is an increased emphasis on fairness in machine learning and AI; one representative notion of fairness is that no single demographic group should be over-represented among the cluster centers. Problems with this fairness requirement, as well as much more general clustering problems, can be formulated using "knapsack" and "partition" constraints. We develop new randomized algorithms targeting such problems, and study two in particular: multi-knapsack median and multi-knapsack center. Our rounding algorithms give new approximation and pseudo-approximation algorithms for these problems. One key technical tool, which may be of independent interest, is a new tail bound, analogous to that of Feige (2006), for sums of random variables with unbounded variances. Such bounds can be useful for inferring properties of large networks from few samples.
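
For context on the Feige (2006) bound the abstract refers to: for independent nonnegative random variables $X_i$ with $\mathbb{E}[X_i] \leq 1$ and $X = \sum_i X_i$, it states that $\Pr[X < \mathbb{E}[X] + \delta] \geq \min(\delta/(1+\delta), 1/13)$ for every $\delta > 0$, with no assumption on the variances. The simulation below is a hedged sanity check of that classical statement for independent variables only; it does not test the paper's new bound, which extends to negatively associated variables. The heavy-tailed test distribution and the function names `feige_lower_bound` and `empirical_prob` are illustrative assumptions.

```python
import random


def feige_lower_bound(delta):
    # Feige (2006): for independent nonnegative X_i with E[X_i] <= 1 and
    # X = sum X_i, Pr[X < E[X] + delta] >= min(delta / (1 + delta), 1/13).
    return min(delta / (1 + delta), 1 / 13)


def empirical_prob(n, p, delta, trials, seed=0):
    """Estimate Pr[X < E[X] + delta] for X a sum of n independent copies
    of a variable equal to 1/p with probability p and 0 otherwise.

    Each copy has mean exactly 1 and variance 1/p - 1, which grows
    without bound as p -> 0.
    """
    rng = random.Random(seed)
    mean = n  # each X_i has mean 1
    hits = 0
    for _ in range(trials):
        x = sum(1 / p for _ in range(n) if rng.random() < p)
        if x < mean + delta:
            hits += 1
    return hits / trials
```

With `n = 10`, `p = 0.05` and `delta = 1`, the event $X < 11$ occurs exactly when no variable takes its large value, so the true probability is $0.95^{10} \approx 0.599$, comfortably above the guaranteed $1/13 \approx 0.077$.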

Paper Structure

This paper contains 30 sections, 48 theorems, 109 equations, 8 algorithms.

Key Result

Theorem 1.2

Let $\gamma, \epsilon \in (0,1)$. For single-knapsack median, there is a polynomial-time algorithm to obtain an $O(1/\gamma)$-additive pseudo-solution $\mathcal{S}$ with $\text{cost}(\mathcal{S}) \leq (1+\sqrt{3}+\gamma) \cdot \text{OPT} \leq 2.733 \cdot \text{OPT}$ and an algorithm with $n^{O(\epsi
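
The numerical constant in Theorem 1.2 can be checked directly. Since $1+\sqrt{3} \approx 2.73205$, the second inequality $(1+\sqrt{3}+\gamma) \cdot \text{OPT} \leq 2.733 \cdot \text{OPT}$ holds only when $\gamma$ is small:

```latex
1 + \sqrt{3} \approx 2.73205, \qquad
1 + \sqrt{3} + \gamma \le 2.733
\iff \gamma \le 2.733 - (1 + \sqrt{3}) \approx 9.49 \times 10^{-4}.
```

Because the pseudo-solution is $O(1/\gamma)$-additive, choosing $\gamma$ this small makes the additive term a large (but constant) number of extra items.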

Theorems & Definitions (88)

  • Definition 1.1: $q$-additive pseudo-approximation
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4
  • Theorem 1.5
  • Definition 2.1: Negatively associated random variables
  • Theorem 2.2
  • Corollary 2.3
  • Proposition 2.4
  • Theorem 3.1
  • ...and 78 more
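
For reference, the standard definition of negative association (Joag-Dev and Proschan, 1983), which Definition 2.1 presumably follows up to phrasing, is stated below; the paper's exact formulation may differ.

```latex
X_1,\dots,X_n \text{ are negatively associated if, for all disjoint } I, J \subseteq \{1,\dots,n\}
\text{ and all coordinatewise nondecreasing } f, g \text{ (with finite expectations):}
\mathbb{E}\big[f(X_i : i \in I)\, g(X_j : j \in J)\big]
  \le \mathbb{E}\big[f(X_i : i \in I)\big]\, \mathbb{E}\big[g(X_j : j \in J)\big].
```

Taking $I = \{i\}$, $J = \{j\}$ and $f, g$ the identity gives $\operatorname{Cov}(X_i, X_j) \leq 0$, so negative association implies pairwise negative correlation.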