Breaking Privacy in Federated Clustering: Perfect Input Reconstruction via Temporal Correlations

Guang Yang, Lixia Luo, Qiongxiu Li

TL;DR

The paper shows that disclosing centroids in federated $k$-means leaks significant private information, because cluster assignments are temporally correlated across iterations. It introduces Trajectory-Aware Reconstruction (TAR), which constructs a deduplicated iteration-record matrix $\mathbf{W}^\star$ and applies a reduced row echelon form (RREF) test to decide when the inputs can be reconstructed perfectly from the released cluster sums $\mathbf{C}$. The key contributions are (i) identifying a leakage mechanism that lies outside the assumptions of classical HSSP analyses, (ii) proposing TAR, with a detailed construction procedure and an RREF-based recoverability criterion, and (iii) validating the approach on synthetic data, Iris, and high-dimensional Olivetti faces, where it achieves high success rates even under truncated disclosures. These results establish a fundamental privacy–efficiency trade-off in federated clustering and motivate privacy-preserving alternatives such as differential privacy (DP), secure multi-party computation (SMPC), and homomorphic encryption (HE), which come at a cost in performance.
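To make the algebra behind the RREF criterion concrete, here is a minimal sketch, not the authors' implementation. It assumes the attacker already holds, for every released cluster sum, the binary membership row saying which inputs were summed; how TAR derives those rows from the iteration trajectory is not described in this summary. The names `rref`, `reconstruct`, `W`, `C`, and `X` are illustrative. The idea shown is only this: if the deduplicated membership matrix has a pivot in every column, the linear system $\mathbf{W}\mathbf{X} = \mathbf{C}$ has a unique solution, so the inputs are recovered exactly.

```python
from fractions import Fraction


def rref(rows):
    """Return (RREF of rows, pivot column indices), using exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        if r == len(m):
            break
        # Find a row at or below r with a non-zero entry in column c.
        pr = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        pv = m[r][c]
        m[r] = [v / pv for v in m[r]]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def reconstruct(W, C):
    """Solve W X = C for X if W (after deduplicating rows) has full column rank.

    W: list of binary membership rows, one per released cluster sum (ASSUMED known).
    C: list of the corresponding cluster sums.
    Returns X as a list of rows, or None when some column has no pivot.
    """
    n = len(W[0])
    seen, augmented = set(), []
    for w, c in zip(W, C):
        if tuple(w) not in seen:  # drop repeated membership rows
            seen.add(tuple(w))
            augmented.append(list(w) + list(c))
    reduced, pivots = rref(augmented)
    if pivots[:n] != list(range(n)):
        return None  # under-determined: some input is not pinned down
    return [reduced[i][n:] for i in range(n)]


# Toy data: 4 private 2-D inputs, k = 2 clusters, 3 iterations (hypothetical assignments).
X = [[1, 2], [3, 4], [5, 6], [7, 8]]
W = [
    [1, 1, 0, 0], [0, 0, 1, 1],  # iteration 1
    [1, 0, 0, 0], [0, 1, 1, 1],  # iteration 2
    [1, 1, 1, 0], [0, 0, 0, 1],  # iteration 3
]
# Released cluster sums: each is the sum of the inputs assigned to that cluster.
C = [[sum(w[i] * X[i][j] for i in range(4)) for j in range(2)] for w in W]

recovered = reconstruct(W, C)
single_round = reconstruct(W[:2], C[:2])
```

In this toy case `recovered` equals `X` exactly, while `single_round` is `None`: one iteration alone leaves the system under-determined, and it is the changing assignments across iterations that supply the extra independent equations.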

Abstract

Federated clustering allows multiple parties to discover patterns in distributed data without sharing raw samples. To reduce overhead, many protocols disclose intermediate centroids during training. Although such disclosure is often treated as a harmless efficiency measure, whether it compromises privacy remains an open question. Prior analyses modeled the problem as a Hidden Subset Sum Problem (HSSP) and argued that centroid release may be safe, since classical HSSP attacks fail to recover inputs. We revisit this question and uncover a new leakage mechanism: temporal regularities in $k$-means iterations create exploitable structure that enables perfect input reconstruction. Building on this insight, we propose Trajectory-Aware Reconstruction (TAR), an attack that combines temporal assignment information with algebraic analysis to recover the exact original inputs. Our findings provide the first rigorous evidence, supported by a practical attack, that centroid disclosure in federated clustering significantly compromises privacy, exposing a fundamental tension between privacy and efficiency.

Paper Structure

This paper contains 14 sections, 12 equations, 1 figure, 2 tables, 1 algorithm.

Figures (1)

  • Figure 1: Olivetti Faces: originals (top row) vs. reconstructions (bottom row) obtained from $k$-means clustering. The $\ell_2$ distance between each reconstructed image and its original is 0, i.e., the reconstruction is perfect (exact).

Theorems & Definitions (1)

  • Definition 3.1: Successful Attack