Fair Clustering in the Sliding Window Model
Vincent Cohen-Addad, Shaofeng H.-C. Jiang, Qiaoyuan Yang, Yubo Zhang, Samson Zhou
TL;DR
The paper studies fair clustering in the sliding window streaming model and shows a sharp separation from insertion-only streams: any sliding window algorithm that attains a finite multiplicative or additive fair-clustering guarantee requires linear space. It then gives a positive result under a relaxed fairness constraint: using online assignment-preserving coresets and a merge-and-reduce framework, a sublinear-space algorithm produces a $((1-\varepsilon)\alpha, (1+\varepsilon)\beta)$-fair clustering whose cost is within a $(1+\varepsilon)$ factor of the optimum on the sliding window, with a robust coreset guarantee. The approach combines assignment-preserving coresets (via RingCoreset and the Meyerson sketch) with a streaming reduction to obtain the sublinear-space guarantee and a practical pipeline for real data, and it is complemented by empirical evaluations on multiple datasets. The work highlights fundamental limits of fairness under recency constraints and offers a sublinear-space method when a small fairness slack is acceptable, with implications for real-time fair clustering on dynamic data streams.
Abstract
We study streaming algorithms for proportionally fair clustering, a notion originally suggested by Chierichetti et al. (2017), in the sliding window model. We show that although there exist efficient streaming algorithms in the insertion-only model, surprisingly no algorithm can achieve a finite multiplicative ratio without violating the fairness constraint in the sliding window model. Hence, the problem of fair clustering is a rare separation between the insertion-only streaming model and the sliding window model. On the other hand, we show that if the fairness constraint is relaxed by a multiplicative $(1+\varepsilon)$ factor, there exists a $(1 + \varepsilon)$-approximate sliding window algorithm that uses $\text{poly}(k\varepsilon^{-1}\log n)$ space. This achieves essentially the best parameters (up to the degree of the polynomial) given the aforementioned lower bound. We also conduct a number of empirical evaluations on real datasets to complement our theoretical results.
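To make the relaxed fairness constraint concrete, the following minimal Python sketch checks whether a given clustering is $((1-\varepsilon)\alpha, (1+\varepsilon)\beta)$-fair. It assumes the common formulation in which every group's share of every non-empty cluster must lie in $[\alpha, \beta]$, with a single pair $(\alpha, \beta)$ shared by all groups; the paper's exact definition may use per-group bounds. The function name `is_fair_with_slack` and its parameters are hypothetical and are not taken from the paper.

```python
from collections import Counter


def is_fair_with_slack(labels, groups, alpha, beta, eps=0.0):
    """Check ((1-eps)*alpha, (1+eps)*beta)-fairness of a clustering.

    labels[i] is the cluster of point i and groups[i] is its group.
    Assumption (not from the paper): one (alpha, beta) pair for all groups,
    and each group's fraction in each non-empty cluster must lie in
    [(1-eps)*alpha, (1+eps)*beta].
    """
    lo, hi = (1 - eps) * alpha, (1 + eps) * beta
    cluster_sizes = Counter(labels)
    pair_counts = Counter(zip(labels, groups))
    all_groups = set(groups)
    for cluster, size in cluster_sizes.items():
        for group in all_groups:
            # Counter returns 0 for (cluster, group) pairs that never occur.
            fraction = pair_counts[(cluster, group)] / size
            if fraction < lo or fraction > hi:
                return False
    return True


# Toy data: cluster 0 holds groups (a, a, b); cluster 1 holds (a, b, b, b).
labels = [0, 0, 0, 1, 1, 1, 1]
groups = ["a", "a", "b", "a", "b", "b", "b"]
strict = is_fair_with_slack(labels, groups, alpha=0.3, beta=0.7)
relaxed = is_fair_with_slack(labels, groups, alpha=0.3, beta=0.7, eps=0.2)
```

In this toy instance, group `a` makes up only a quarter of cluster 1, so the clustering violates the strict $(0.3, 0.7)$ constraint but satisfies it once a slack of $\varepsilon = 0.2$ is allowed.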
