Fair Center Clustering in Sliding Windows

Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Francesco Visonà

TL;DR

The paper tackles fair center clustering in the sliding-window streaming model, introducing the first space-efficient algorithm for general metrics that respects color-based capacity constraints. It builds a coreset-based framework with a grid of radius guesses and maintains validation and coreset point sets to ensure fairness while bounding memory; the Query step invokes a strong sequential fair-center algorithm on the coreset to achieve an $(\alpha+\varepsilon)$-approximation, with space $O\left(k^2 \frac{\log \Delta}{\log(1+\beta)} (c/\varepsilon)^{D_{W_t}}\right)$ and update/query times polynomial in $m$ and $k$, independent of window size. A constant-factor variant removes the dependence on the doubling dimension at the cost of a constant-factor degradation in approximation, both backed by rigorous analysis. Experiments on real and synthetic data validate the approach, showing substantial memory and speed advantages over full-window baselines while maintaining competitive solution quality, and demonstrating robustness to window length and data dimensionality. The work advances practical fair clustering for streaming data and opens avenues for robust fair center extensions and broader matroid-constrained clustering in sliding windows.
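The "grid of radius guesses" and the $\frac{\log \Delta}{\log(1+\beta)}$ factor in the space bound can be illustrated with a geometric progression of candidate radii. The following is a minimal sketch, assuming the guesses take the form $(1+\beta)^i$ and span the range between a minimum and a maximum pairwise distance; the function name `radius_guesses` and the parameters `d_min` and `d_max` are hypothetical, and the paper's exact construction may differ.

```python
import math

def radius_guesses(d_min, d_max, beta):
    """Return a geometric grid of guesses (1 + beta)^i covering [d_min, d_max].

    Assumption: d_min and d_max are the smallest and largest pairwise
    distances of interest, so d_max / d_min plays the role of the aspect
    ratio Delta. This is an illustrative sketch, not the paper's code.
    """
    if d_min <= 0 or d_max < d_min or beta <= 0:
        raise ValueError("need 0 < d_min <= d_max and beta > 0")
    base = 1.0 + beta
    lo = math.floor(math.log(d_min) / math.log(base))  # largest power <= d_min
    hi = math.ceil(math.log(d_max) / math.log(base))   # smallest power >= d_max
    return [base ** i for i in range(lo, hi + 1)]

guesses = radius_guesses(0.5, 100.0, 0.5)
print(len(guesses), guesses[0], guesses[-1])
```

The number of guesses grows as roughly $\log(d_{max}/d_{min}) / \log(1+\beta)$, so a smaller $\beta$ gives a finer grid at the cost of more guesses to maintain.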

Abstract

The $k$-center problem requires the selection of $k$ points (centers) from a given metric pointset $W$ so as to minimize the maximum distance of any point of $W$ from its closest center. This paper focuses on a fair variant of the problem, known as fair center, where each input point belongs to some category and each category may contribute only a limited number of points to the center set. We present the first space-efficient streaming algorithm for fair center in general metrics under the sliding window model. At any time $t$, the algorithm is able to provide a solution for the current window whose quality is almost as good as the one guaranteed by the best polynomial-time sequential algorithms run on the entire window, and it exhibits space and time requirements independent of the window size. Our theoretical results are backed by an extensive set of experiments on both real-world and synthetic datasets, which provide evidence of the practical viability of the algorithm.
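The objective and the fairness constraint described in the abstract can be made concrete with a small sketch. It assumes points in Euclidean space (the paper covers general metrics), and the helper names `clustering_radius` and `is_fair`, as well as the per-category cap dictionary, are hypothetical choices made for illustration only.

```python
import math
from collections import Counter

def clustering_radius(points, centers, dist=math.dist):
    """Maximum, over all points, of the distance to the closest center."""
    return max(min(dist(p, c) for c in centers) for p in points)

def is_fair(center_categories, caps):
    """True if no category contributes more centers than its cap allows.

    Categories missing from `caps` are treated as having cap 0
    (an assumption of this sketch).
    """
    counts = Counter(center_categories)
    return all(n <= caps.get(cat, 0) for cat, n in counts.items())

# Four points on a line, two categories, at most one center per category.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
caps = {"red": 1, "blue": 1}

centers = [(0.0, 0.0), (5.0, 0.0)]
print(clustering_radius(points, centers))  # radius of this candidate solution
print(is_fair(["red", "blue"], caps))      # one center per category
print(is_fair(["red", "red"], caps))       # two red centers exceed the cap
```

A fair center solution is a center set that passes the `is_fair` check while making `clustering_radius` as small as possible.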

Paper Structure

This paper contains 21 sections, 8 theorems, 8 equations, 5 figures, 3 algorithms.

Key Result

Lemma 1

For every $\gamma \in \Gamma$ and $t>0$, the following properties hold after the execution of the Update procedure for the point arrived at time $t$:

Figures (5)

  • Figure 1: Approximation ratio (top) and memory (bottom, in number of points) for varying $\delta$.
  • Figure 2: Running time in milliseconds of update (top) and query (bottom) for varying $\delta$. The scale of query time is logarithmic.
  • Figure 3: Memory (top, in number of points) and running time of query (bottom, in milliseconds) at varying window sizes. The scale of query time is logarithmic.
  • Figure 4: Query time (left, in milliseconds) and memory (right, in number of points) with respect to the dimensionality, on the blobs datasets.
  • Figure 5: Query time and memory (resp. left and right) with respect to the dimensionality on the rotated datasets. The scale of query time is logarithmic.

Theorems & Definitions (13)

  • Lemma 1
  • proof
  • Lemma 2
  • Lemma 3
  • proof
  • Theorem 1
  • proof
  • Corollary 1
  • Theorem 2
  • proof
  • ...and 3 more