Fair Clustering in the Sliding Window Model

Vincent Cohen-Addad, Shaofeng H. -C. Jiang, Qiaoyuan Yang, Yubo Zhang, Samson Zhou

TL;DR

The paper studies fair clustering in the sliding window streaming model and establishes a sharp separation from insertion-only streams: any sliding window algorithm that achieves any multiplicative approximation, or additive error $\frac{\Delta}{2}-1$, for fair $(k,z)$-clustering must use $\Omega(n)$ space, whereas efficient algorithms exist for insertion-only streams. On the positive side, if the fairness constraint is relaxed by a small slack, a $(1+\varepsilon)$-approximation is achievable in sublinear space: the algorithm outputs a $((1-\varepsilon)\alpha, (1+\varepsilon)\beta)$-fair clustering whose cost is within a factor $(1+\varepsilon)$ of the optimum on the sliding window. The approach builds online assignment-preserving coresets (via RingCoreset and a Meyerson sketch) and combines them with a merge-and-reduce framework to obtain the sublinear-space guarantee. Empirical evaluations on multiple real datasets complement the theory. The work thus identifies a fundamental limit of exact fairness under recency constraints and gives a sublinear-space method for settings where a small fairness slack is acceptable.

Abstract

We study streaming algorithms for proportionally fair clustering, a notion originally suggested by Chierichetti et al. (2017), in the sliding window model. We show that although there exist efficient streaming algorithms in the insertion-only model, surprisingly no algorithm can achieve a finite multiplicative ratio without violating the fairness constraint in the sliding window model. Hence, the problem of fair clustering exhibits a rare separation between the insertion-only streaming model and the sliding window model. On the other hand, we show that if the fairness constraint is relaxed by a multiplicative $(1+\varepsilon)$ factor, there exists a $(1 + \varepsilon)$-approximate sliding window algorithm that uses $\text{poly}(k\varepsilon^{-1}\log n)$ space. This achieves essentially the best parameters (up to the degree of the polynomial) given the aforementioned lower bound. We also conduct a number of empirical evaluations on real datasets to complement our theoretical results.
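
To make the relaxed fairness constraint concrete, the following is a minimal sketch of a checker, not the paper's algorithm. It assumes the standard proportional-fairness condition that, in every cluster, the fraction of points from group $g$ lies in $[\alpha_g, \beta_g]$, and relaxes it to $[(1-\varepsilon)\alpha_g, (1+\varepsilon)\beta_g]$. The function name `is_fair` and the dictionary-based inputs are hypothetical choices made for illustration.

```python
from collections import Counter


def is_fair(assignment, groups, alpha, beta, eps=0.0):
    """Check ((1-eps)*alpha, (1+eps)*beta)-fairness of a clustering.

    assignment[i] is the cluster id of point i, groups[i] is its group label.
    alpha[g] and beta[g] are the assumed lower and upper bounds on the
    fraction of group g inside every cluster; eps is the fairness slack.
    """
    cluster_sizes = Counter(assignment)
    pair_counts = Counter(zip(assignment, groups))
    for cluster, size in cluster_sizes.items():
        for g in alpha:
            count = pair_counts[(cluster, g)]
            lower = (1 - eps) * alpha[g] * size
            upper = (1 + eps) * beta[g] * size
            if count < lower or count > upper:
                return False
    return True


# Two clusters of ten points each; the second one is slightly unbalanced.
assignment = [0] * 10 + [1] * 10
groups = ["a"] * 5 + ["b"] * 5 + ["a"] * 6 + ["b"] * 4
alpha = {"a": 0.45, "b": 0.45}
beta = {"a": 0.55, "b": 0.55}

exact = is_fair(assignment, groups, alpha, beta)             # no slack
relaxed = is_fair(assignment, groups, alpha, beta, eps=0.2)  # with slack
```

In this toy instance the second cluster violates the exact constraint (6 of 10 points are from group `a`, above the bound of 5.5) but satisfies the version relaxed with $\varepsilon = 0.2$, which is the kind of slack the positive result permits.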

Paper Structure

This paper contains 13 sections, 24 theorems, 56 equations, 2 figures, 1 table, 3 algorithms.

Key Result

Theorem 1.1

Any algorithm for fair $(k,z)$-clustering in the sliding window model that either achieves any multiplicative approximation or additive $\frac{\Delta}{2}-1$ error, with probability at least $\frac{2}{3}$, must use $\Omega(n)$ space.
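
For intuition about the objective in Theorem 1.1, here is a minimal sketch of the trivial linear-space baseline: store the last $n$ points of the stream and evaluate the $(k,z)$-clustering cost $\sum_{p} \operatorname{dist}(p, C)^z$ on them. It assumes Euclidean points and nearest-center assignment, so it computes the unconstrained cost, which is only a lower bound on the cost of a fair assignment to the same centers. The name `window_cost` and its parameters are hypothetical.

```python
from collections import deque
import math


def window_cost(stream, centers, window_size, z=1):
    """Unconstrained (k, z)-clustering cost on the last `window_size` points.

    Stores the whole window explicitly, so it uses space linear in the
    window size. Each point is charged dist(point, nearest center) ** z.
    """
    window = deque(maxlen=window_size)  # older points are evicted automatically
    for point in stream:
        window.append(point)
    return sum(min(math.dist(p, c) for c in centers) ** z for p in window)


stream = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
centers = [(0.0, 0.0)]
cost_median = window_cost(stream, centers, window_size=2, z=1)  # k-median style
cost_means = window_cost(stream, centers, window_size=2, z=2)   # k-means style
```

Only the two most recent points, `(1.0, 0.0)` and `(2.0, 0.0)`, contribute to the cost; the far point `(10.0, 0.0)` has already expired from the window.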

Figures (2)

  • Figure 1: Fair $k$-Median cost curves for all datasets.
  • Figure 2: Running time of our algorithm and the Borassi baseline across all datasets.

Theorems & Definitions (48)

  • Theorem 1.1
  • Theorem 1.2
  • Definition 2.1: $(\alpha,\beta)$-fair clustering
  • Definition 2.2: Assignment constraints
  • Definition 2.3: $(k,z)$-clustering with assignment constraints
  • Definition 2.4: Assignment-preserving coreset
  • Definition 3.1: Online assignment-preserving coreset
  • Lemma 3.2
  • Lemma 3.2
  • Proof of the online coreset lemma (`lemma:online_coreset`)
  • ...and 38 more