Improving the Utility of Differentially Private Clustering through Dynamical Processing

Junyoung Byun, Yujin Choi, Jaewook Lee

TL;DR

This work tackles the utility-privacy trade-off in differentially private clustering by introducing Morse-theory–driven dynamical processing that links Gaussian sub-clusters into complex, nonconvex structures. It builds on DP MoG and DP k-means through a gradient-flow framework that identifies transition equilibrium vectors (TEVs) to connect centers into a hierarchical, DP-preserving graph, enabling any target number of clusters. Theoretical results prove the dynamical processing preserves DP, while experiments across six real datasets show consistent ARI improvements over baseline DP clustering methods, especially when baseline performance is moderate. The approach is versatile, allowing integration with different DP clustering baselines and scalable to various density estimators, with potential extensions to kernel methods and broader hierarchical clustering tasks.

Abstract

This study aims to alleviate the trade-off between utility and privacy of differentially private clustering. Existing works focus on simple methods, which show poor performance for non-convex clusters. To fit complex cluster distributions, we propose sophisticated dynamical processing inspired by Morse theory, with which we hierarchically connect the Gaussian sub-clusters obtained through existing methods. Our theoretical results imply that the proposed dynamical processing introduces little to no additional privacy loss. Experiments show that our framework can improve the clustering performance of existing methods at the same privacy level.

Paper Structure

This paper contains 20 sections, 5 theorems, 15 equations, 7 figures, 2 tables, 4 algorithms.

Key Result

Proposition 5.1

The DPMoG-hard algorithm is $(\epsilon, \delta)$-differentially private.
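
For reference, the guarantee named in this proposition can be written out using the standard definition of $(\epsilon, \delta)$-differential privacy. The symbols $\mathcal{M}$ (the randomized algorithm), $D$ and $D'$ (neighboring datasets differing in one record), and $S$ (a set of outputs) are notation assumed here for illustration and may differ from the paper's own:

$$
\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta \quad \text{for all neighboring } D, D' \text{ and all measurable } S.
$$

Smaller $\epsilon$ and $\delta$ mean the output distribution changes less when a single record changes, which is a stronger privacy guarantee.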

Figures (7)

  • Figure 1: Illustration of Morse theory. (a) A three-dimensional surface plot of a toy example. (b) A level curve of (a). In (b), 'o' marks the stable equilibrium points, 'x' marks the index-1 equilibrium points, and the triangle marks the index-2 equilibrium point. Note that $\boldsymbol{x}_3^1$ is an index-1 equilibrium vector, but not a TEV.
  • Figure 2: Description of the proposed dynamical processing. After obtaining an MoG density from DPClustering, (a) TEVs ('x') between adjacent centers (two numbers in parentheses) are identified, and the weight between two centers is calculated as the density (number at the bottom left) of the corresponding TEV. (b) A dendrogram is drawn based on the weighted graph. The x-axis denotes the index of centers, and the y-axis denotes the weight between the centers.
  • Figure 3: Procedure to find TEVs. Big black points are MoG centers, and the big orange point is the TEV. Small black points with numbers represent $\boldsymbol{m}^t$ at step $t$, and small red points indicate minimum density points on the quadratic string.
  • Figure 4: Clustering results of DPMoG-hard and DPMoG-hard-Morse for real-world datasets. The x-axis indicates the privacy budget $\epsilon$, and the y-axis indicates the ARI score. The dotted lines indicate the performances of the non-private models.
  • Figure 5: Clustering results of DPLloyd and DPLloyd-Morse for real-world datasets. The x-axis indicates the privacy budget $\epsilon$, and the y-axis indicates the ARI score. The dotted lines indicate the performances of the non-private models.
  • ...and 2 more figures
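
The graph-to-dendrogram step described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: it assumes the TEVs and their densities are already computed, and it assumes that center pairs are merged in order of decreasing TEV density until the target number of clusters remains. The function name `merge_centers` and the `(i, j, density)` edge format are hypothetical.

```python
def merge_centers(n_centers, tev_edges, n_clusters):
    """Group MoG centers into n_clusters using TEV densities as edge weights.

    tev_edges: list of (i, j, density) tuples, one per TEV found between
    centers i and j (hypothetical input format).
    Assumption: a higher TEV density means a stronger link, so the
    densest links are merged first.
    Returns one cluster label per center.
    """
    parent = list(range(n_centers))  # union-find forest over centers

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    remaining = n_centers
    for i, j, _density in sorted(tev_edges, key=lambda e: e[2], reverse=True):
        if remaining <= n_clusters:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i
            remaining -= 1

    # Relabel roots as consecutive integers in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n_centers)]


# Toy example: four centers in a chain with a weak link in the middle.
edges = [(0, 1, 0.9), (1, 2, 0.2), (2, 3, 0.7)]
print(merge_centers(4, edges, 2))  # [0, 0, 1, 1]
```

Because the merge only reads the centers and the TEV densities, changing `n_clusters` cuts the same hierarchy at a different level without touching the data again, which matches the TL;DR's claim that any target number of clusters can be produced.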

Theorems & Definitions (16)

  • Definition 3.1
  • Remark 3.2
  • Remark 3.3
  • Remark 3.4
  • Remark 3.5
  • Remark 3.6
  • Remark 3.7
  • Proposition 5.1
  • proof
  • Proposition 5.2
  • ...and 6 more