Fair Clustering in the Sliding Window Model

Vincent Cohen-Addad; Shaofeng H. -C. Jiang; Qiaoyuan Yang; Yubo Zhang; Samson Zhou

TL;DR

The paper investigates fair clustering in the sliding window streaming model and reveals a sharp separation from insertion-only streams: any algorithm attaining a finite multiplicative or additive fair-clustering guarantee in the sliding window model requires linear space. It then gives a positive result under a relaxed fairness constraint: using online assignment-preserving coresets and a merge-and-reduce framework, a sublinear-space algorithm produces a $((1-\varepsilon)\alpha, (1+\varepsilon)\beta)$-fair clustering whose cost is within a factor $(1+\varepsilon)$ of the optimum on the sliding window, backed by a robust coreset guarantee. The approach builds assignment-preserving coresets (via RingCoreset and a Meyerson sketch) and combines them with a streaming reduction, yielding sublinear-space guarantees and a practical pipeline for real data, complemented by empirical evaluations on multiple datasets. The work highlights fundamental limits of fairness under recency constraints and offers a viable sublinear-space method when a small fairness slack is acceptable, with practical implications for real-time fair clustering in dynamic data streams.

Abstract

We study streaming algorithms for proportionally fair clustering, a notion originally suggested by Chierichetti et al. (2017), in the sliding window model. We show that although there exist efficient streaming algorithms in the insertion-only model, surprisingly no algorithm can achieve a finite multiplicative ratio without violating the fairness constraint in the sliding window model. Hence, the problem of fair clustering is a rare separation between the insertion-only streaming model and the sliding window model. On the other hand, we show that if the fairness constraint is relaxed by a multiplicative $(1+\varepsilon)$ factor, there exists a $(1+\varepsilon)$-approximate sliding window algorithm that uses $\text{poly}(k\varepsilon^{-1}\log n)$ space. This achieves essentially the best parameters (up to the degree of the polynomial) given the aforementioned lower bound. We also perform a number of empirical evaluations on real datasets to complement our theoretical results.
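To make the relaxed fairness constraint concrete, the following is a minimal sketch of a check for it, not the paper's algorithm. It assumes the common convention for proportional fairness: in every cluster, the fraction of points from each group $g$ must lie in $[\alpha_g, \beta_g]$, and the relaxation widens this to $[(1-\varepsilon)\alpha_g, (1+\varepsilon)\beta_g]$. The function name `is_relaxed_fair` and its parameters are hypothetical and chosen for illustration only.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def is_relaxed_fair(
    labels: Sequence[int],
    groups: Sequence[Hashable],
    alpha: Dict[Hashable, float],
    beta: Dict[Hashable, float],
    eps: float = 0.0,
) -> bool:
    """Check the (assumed) relaxed proportional-fairness condition.

    For every cluster C and every group g listed in `alpha`, require
    (1 - eps) * alpha[g] <= |C ∩ g| / |C| <= (1 + eps) * beta[g].
    `alpha` and `beta` are assumed to share the same keys.
    """
    if len(labels) != len(groups):
        raise ValueError("labels and groups must have the same length")

    cluster_sizes = Counter(labels)             # |C| for each cluster
    pair_counts = Counter(zip(labels, groups))  # |C ∩ g| for each pair

    for cluster, size in cluster_sizes.items():
        for g in alpha:
            frac = pair_counts[(cluster, g)] / size
            if frac < (1 - eps) * alpha[g] or frac > (1 + eps) * beta[g]:
                return False
    return True


# Illustrative data: two clusters of four points, two groups "a" and "b".
labels = [0, 0, 0, 0, 1, 1, 1, 1]
groups = ["a", "a", "b", "b", "a", "b", "b", "b"]
alpha = {"a": 0.3, "b": 0.3}
beta = {"a": 0.7, "b": 0.7}

strict = is_relaxed_fair(labels, groups, alpha, beta, eps=0.0)
relaxed = is_relaxed_fair(labels, groups, alpha, beta, eps=0.2)
```

In this toy input, cluster 1 holds groups "a" and "b" in proportions 0.25 and 0.75, so it violates the exact bounds $[0.3, 0.7]$ but satisfies the relaxed bounds $[0.24, 0.84]$ at $\varepsilon = 0.2$. Note that this check reads every point in the window, so it uses linear space; it only illustrates the constraint, not the sublinear-space coreset construction.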
