Making Old Things New: A Unified Algorithm for Differentially Private Clustering
Max Dupré la Tour, Monika Henzinger, David Saulpic
TL;DR
This work presents a unified approach to differentially private clustering that applies across several privacy models. Building on a 20-year-old greedy algorithm by Mettu and Plaxton, the authors privatize the procedure through a private generalized summation (and generalized histogram) framework, which yields private $(k,z)$-clustering with both multiplicative and additive error guarantees. The core ideas are dimension reduction, lifting privately computed solutions from the projected space back to the original space, and boosting both the multiplicative approximation and the success probability through bicriteria reductions and the exponential mechanism. The analysis covers centralized DP, local DP, shuffle DP, continual observation, and MPC, giving the first results for continual observation and improving on known results in some of the other models. Overall, the paper offers a versatile blueprint for private clustering that matches almost all previously known bounds and extends naturally to new privacy models.
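To make two of the ingredients named above concrete, here is a minimal Python sketch of the $(k,z)$-clustering cost and of the exponential mechanism used to pick one candidate solution among several. It is an illustration under stated assumptions, not the authors' algorithm: the function names `kz_cost` and `exponential_mechanism`, the `sensitivity` parameter, and the idea of passing an explicit candidate list are all hypothetical. The privacy guarantee of the exponential mechanism holds only if the candidates themselves do not leak the data (for example, because they were produced privately) and if `sensitivity` upper-bounds how much one point can change the cost (for points in a domain of diameter $\Lambda$, $\Lambda^z$ suffices).

```python
import math
import random


def kz_cost(points, centers, z=2):
    """(k, z)-clustering cost: sum over points of (distance to nearest center)^z."""
    return sum(min(math.dist(p, c) for c in centers) ** z for p in points)


def exponential_mechanism(points, candidates, epsilon, sensitivity, z=2, rng=random):
    """Return the index of one candidate center set, sampled with probability
    proportional to exp(-epsilon * cost / (2 * sensitivity)).

    Assumption: `sensitivity` bounds the change in cost caused by adding or
    removing a single point, for every candidate.
    """
    costs = [kz_cost(points, c, z) for c in candidates]
    # Subtracting the minimum cost avoids underflow and leaves the
    # sampling distribution unchanged.
    best = min(costs)
    weights = [math.exp(-epsilon * (c - best) / (2 * sensitivity)) for c in costs]
    return rng.choices(range(len(candidates)), weights=weights, k=1)[0]
```

For points in the unit square with $z = 2$, one would call `exponential_mechanism(points, candidates, epsilon, sensitivity=2.0)`, since the squared diameter of the unit square is 2. Lower-cost candidates are exponentially more likely to be selected, which is how a low-error solution can be chosen privately from a pool of candidates.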
Abstract
As a staple of data analysis and unsupervised learning, the problem of private clustering has been widely studied under various privacy models. Centralized differential privacy is the first of them, and the problem has also been studied in the local and shuffle variations. In each case, the goal is to design an algorithm that privately computes a clustering with the smallest possible error. The study of each variation gave rise to new algorithms, so the landscape of private clustering algorithms is quite intricate. In this paper, we show that a 20-year-old algorithm can be slightly modified to work in any of these models. This provides a unified picture: while matching almost all previously known results, it allows us to improve some of them and to extend the approach to a new privacy model, the continual observation setting, where the input changes over time and the algorithm must output a new solution at each time step.
