Improved Streaming Algorithm for Fair $k$-Center Clustering

Longkun Guo; Zeyu Lin; Chaoqi Jia; Chao Chen

TL;DR

The paper tackles fair $k$-center clustering in the streaming model with a two-stage framework. During the stream, it buffers representative points using a $\lambda$-independent center set. After the stream, it selects centers from this reserved subset. The streaming algorithm achieves a 5-approximation with $O(k\log n)$ memory, and the same approach yields a 3-approximation for the offline problem. For semi-structured streams, where the points of each group arrive in batches, it achieves a 3-approximation for $m = 2$ groups and a 4-approximation for general $m$. In the post-streaming stage, the most complicated of the three possible cases (Case (3)) is reduced in polynomial time to a constrained vertex-cover problem on an auxiliary bipartite graph, which preserves the fairness constraints. Experiments on real and simulated datasets show lower clustering cost and faster runtime than the baselines, and the 5-approximation bound is shown to be tight under sublinear memory.
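To make the "buffer, then select" idea concrete, here is a minimal sketch of a streaming buffer for a single radius guess $\lambda$. It is a generic threshold-based construction in the spirit of streaming $k$-center, not the paper's algorithm. The definition it uses is an assumption: pivots are kept pairwise more than $2\lambda$ apart, and each pivot stores the first point of every group arriving within $2\lambda$ of it. The paper's exact definition of a $\lambda$-independent center set, its reserved set, and its handling of unknown $\lambda$ may differ. The class name `ThresholdBuffer` and its methods `insert` and `reserved` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


class ThresholdBuffer:
    """Hypothetical one-pass buffer for a single radius guess `lam`.

    Assumed definition (not taken from the paper): pivots are kept pairwise
    more than 2*lam apart, and each pivot stores the first point of every
    group that arrives within 2*lam of it.
    """

    def __init__(self, k: int, lam: float) -> None:
        self.k = k
        self.lam = lam
        self.pivots: List[Point] = []
        self.reps: List[Dict[int, Point]] = []  # reps[i]: group -> representative near pivot i
        self.failed = False  # True once k+1 pivots exist, i.e. the guess is too small

    def insert(self, p: Point, group: int) -> None:
        if self.failed:
            return  # this guess is abandoned; a larger lam would be used instead
        for i, c in enumerate(self.pivots):
            if math.dist(p, c) <= 2 * self.lam:
                # p is covered by pivot i; store it only if its group is new there
                self.reps[i].setdefault(group, p)
                return
        # p is more than 2*lam from every pivot, so it becomes a new pivot
        self.pivots.append(p)
        self.reps.append({group: p})
        if len(self.pivots) > self.k:
            # k+1 points pairwise > 2*lam apart force any k centers to have radius > lam
            self.failed = True

    def reserved(self) -> List[Tuple[Point, int]]:
        """Flatten the stored representatives into (point, group) pairs."""
        return [(q, g) for r in self.reps for g, q in r.items()]


stream = [((0.0, 0.0), 0), ((0.5, 0.0), 1), ((10.0, 0.0), 0),
          ((10.5, 0.0), 0), ((0.2, 0.1), 1)]
buf = ThresholdBuffer(k=2, lam=1.0)
for p, g in stream:
    buf.insert(p, g)
print(buf.pivots)      # [(0.0, 0.0), (10.0, 0.0)]
print(buf.reserved())  # [((0.0, 0.0), 0), ((0.5, 0.0), 1), ((10.0, 0.0), 0)]
```

While a guess is valid, the buffer holds at most $k$ pivots with one representative per group each, so at most $k \cdot m$ points. Running one buffer per guess in a geometric sequence of $\lambda$ values is a common way to avoid knowing the optimal radius in advance. Whether the paper follows this route to reach $O(k\log n)$ memory is not stated in this section.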

Abstract

In many real-world applications, fairness constraints must be incorporated into the $k$-center clustering problem, where the dataset consists of $m$ demographic groups, each with a specified upper bound on the number of centers to ensure fairness. Focusing on big data scenarios, this paper addresses the problem in a streaming setting, where data points arrive one by one in a continuous stream. Leveraging a structure called the $\lambda$-independent center set, we propose a one-pass streaming algorithm that first computes a reserved set of points during the streaming process. Then, in the post-streaming process, we select centers from the reserved point set by analyzing all three possible cases, transforming the most complicated one into a specially constrained vertex cover problem in an auxiliary graph. Our algorithm achieves a tight approximation ratio of 5 while using $O(k\log n)$ memory. It can also be readily adapted to the offline fair $k$-center problem, achieving a 3-approximation ratio that matches the current state of the art. Furthermore, we extend our approach to a semi-structured data stream, where data points from each group arrive in batches. In this setting, we present a 3-approximation algorithm for $m = 2$ and a 4-approximation algorithm for general $m$. Lastly, we conduct extensive experiments demonstrating that our approaches outperform existing baselines in both clustering cost and runtime efficiency.
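The problem statement above can be made concrete with a small checker that evaluates a candidate solution. This sketch assumes Euclidean points, centers drawn from the labeled data, a total budget of $k$ centers, and a per-group cap `caps[g]`. The helper names `is_feasible` and `kcenter_cost` are hypothetical, and the code only scores a given center set; it does not implement the paper's selection procedure.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Point = Tuple[float, ...]
Labeled = Tuple[Point, int]  # (coordinates, group label)


def is_feasible(centers: Sequence[Labeled], caps: Dict[int, int], k: int) -> bool:
    """At most k centers overall and at most caps[g] centers from group g."""
    if len(centers) > k:
        return False
    counts = Counter(g for _, g in centers)
    return all(n <= caps.get(g, 0) for g, n in counts.items())


def kcenter_cost(points: Sequence[Labeled], centers: Sequence[Labeled]) -> float:
    """k-center objective: the largest distance from any point to its nearest center."""
    if not centers:
        return math.inf
    return max(min(math.dist(p, c) for c, _ in centers) for p, _ in points)


points = [((0.0, 0.0), 0), ((1.0, 0.0), 1), ((10.0, 0.0), 0), ((11.0, 0.0), 1)]
caps = {0: 1, 1: 1}
good = [((0.0, 0.0), 0), ((11.0, 0.0), 1)]
bad = [((0.0, 0.0), 0), ((10.0, 0.0), 0)]
print(is_feasible(good, caps, k=2), kcenter_cost(points, good))  # True 1.0
print(is_feasible(bad, caps, k=2))  # False
```

The `bad` set covers every point within radius 1 as well, but it uses two centers from group 0 and so violates that group's cap; the fairness constraint, not the radius, rules it out.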
