Federated k-Means via Generalized Total Variation Minimization

A. Jung

TL;DR

This work addresses privacy-preserving federated clustering for hard $k$-means by formulating the problem as Generalized Total Variation Minimization (GTVMin) over a device network. Local centroids ${\bf W}^{(i)}$ are coupled via a graph-based penalty that encourages agreement between neighboring devices, enabling decentralized updates without sharing raw data. To solve the GTVMin objective, a distributed non-linear Jacobi algorithm is proposed that combines Lloyd-style local updates with matching of neighboring centroids. For the penalty weight $\alpha=1$, the local update reduces to standard $k$-means on an augmented dataset. The approach yields locally consistent centroid sets within each connected component of the network and offers a privacy-friendly alternative to pooling the data centrally, with potential applicability to networks whose local data distributions are heterogeneous.
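To make the coupling concrete, the GTVMin objective can be sketched as follows. This is our own illustration under stated assumptions, not necessarily the paper's exact formulation: the local loss is the usual $k$-means distortion, $\alpha$ weights the penalty, $\mathcal{E}$ is the edge set of the device network with weights $A_{i,j} > 0$, $\pi_{i,j}$ is a (hypothetical) matching of cluster indices between neighbours $i$ and $j$, and discrepancies are squared Euclidean distances between matched centroids.

$$
\min_{{\bf W}^{(1)},\ldots,{\bf W}^{(n)}} \; \sum_{i \in [n]} \sum_{{\bf x} \in \mathcal{D}^{(i)}} \min_{c \in [k]} \big\| {\bf x} - {\bf w}^{(i,c)} \big\|_2^2 \; + \; \alpha \sum_{\{i,j\} \in \mathcal{E}} A_{i,j} \sum_{c \in [k]} \big\| {\bf w}^{(i,c)} - {\bf w}^{(j,\pi_{i,j}(c))} \big\|_2^2
$$

Holding the neighbours' centroids, the matchings (assumed consistent in both directions), and the set $\mathcal{C}^{(i,c)}$ of local points assigned to cluster $c$ fixed, setting the gradient with respect to ${\bf w}^{(i,c)}$ to zero gives the local update, where $\mathcal{N}^{(i)}$ denotes the neighbours of device $i$:

$$
{\bf w}^{(i,c)} = \frac{\sum_{{\bf x} \in \mathcal{C}^{(i,c)}} {\bf x} \; + \; \alpha \sum_{j \in \mathcal{N}^{(i)}} A_{i,j} \, {\bf w}^{(j,\pi_{i,j}(c))}}{|\mathcal{C}^{(i,c)}| \; + \; \alpha \sum_{j \in \mathcal{N}^{(i)}} A_{i,j}}
$$

For $\alpha = 1$ and unit edge weights, this is the ordinary mean of the local points in cluster $c$ together with the matched neighbour centroids, i.e., a Lloyd step on an augmented dataset, which is consistent with the special case mentioned above.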

Abstract

We consider the problem of federated clustering, where interconnected devices have access to private local datasets and need to jointly cluster the overall dataset without sharing their local datasets. Our focus is on hard clustering based on the k-means principle. We formulate federated k-means clustering as an instance of GTVMin. This formulation naturally lends itself to a federated k-means algorithm where each device updates its local cluster centroids by solving a modified local k-means problem. The modification consists of adding a penalty term that measures the discrepancy between the cluster centroids of neighbouring devices. Our federated k-means algorithm is privacy-friendly as it only requires sharing aggregated information among interconnected devices.
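The modified local k-means problem can be illustrated with a minimal pure-Python sketch of one non-linear Jacobi round. It assumes a penalty built from squared Euclidean distances between matched centroids, weighted by `alpha` and per-edge weights. The helper names (`sq_dist`, `match`, `local_update`, `jacobi_round`) are hypothetical. The nearest-centroid `match` is a simplification that need not be one-to-one; a permutation-based matching (e.g. the Hungarian algorithm) would be a natural alternative. Only centroids, i.e. aggregated information, cross device boundaries.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match(own, other):
    """Hypothetical matching: pair each local centroid with its nearest neighbour centroid."""
    return [min(range(len(other)), key=lambda j: sq_dist(w, other[j])) for w in own]


def local_update(data, own, neighbours, alpha):
    """One Lloyd-style step of the penalised local k-means problem of a single device.

    data:       list of local points (tuples)
    own:        list of the device's k current centroids
    neighbours: list of (edge_weight, neighbour_centroids) pairs
    alpha:      penalty weight (alpha = 0 gives a plain local Lloyd step)
    """
    k, dim = len(own), len(own[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0.0] * k
    # Assignment step: each local point goes to its nearest local centroid.
    for x in data:
        c = min(range(k), key=lambda j: sq_dist(x, own[j]))
        counts[c] += 1.0
        for d in range(dim):
            sums[c][d] += x[d]
    # Penalty step: matched neighbour centroids act as pseudo-points of weight alpha * A_ij.
    for weight, other in neighbours:
        pi = match(own, other)
        for c in range(k):
            counts[c] += alpha * weight
            for d in range(dim):
                sums[c][d] += alpha * weight * other[pi[c]][d]
    # Update step: weighted mean; keep the old centroid if it received no weight at all.
    return [
        tuple(s / counts[c] for s in sums[c]) if counts[c] > 0 else own[c]
        for c in range(k)
    ]


def jacobi_round(datasets, centroids, edges, alpha):
    """Jacobi sweep: all devices update in parallel from the previous round's centroids."""
    new_centroids = []
    for i, data in enumerate(datasets):
        neighbours = []
        for a, b, weight in edges:
            if a == i:
                neighbours.append((weight, centroids[b]))
            elif b == i:
                neighbours.append((weight, centroids[a]))
        new_centroids.append(local_update(data, centroids[i], neighbours, alpha))
    return new_centroids


# Toy example: two devices with 1-D data, k = 2, one edge of weight 1.
datasets = [[(0.0,), (2.0,), (10.0,)], [(0.0,), (12.0,)]]
centroids = [[(1.0,), (9.0,)], [(0.0,), (11.0,)]]
edges = [(0, 1, 1.0)]  # (device i, device j, weight A_ij)
print(jacobi_round(datasets, centroids, edges, alpha=1.0))
# → [[(0.6666666666666666,), (10.5,)], [(0.5,), (10.5,)]]
```

With `alpha=1.0` and unit edge weights, each new centroid is the plain mean of its local points and the matched neighbour centroids, which illustrates the augmented-dataset view. With `alpha=0.0`, the round reduces to independent local Lloyd steps.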

Paper Structure

This paper contains 4 sections, 13 equations, 1 figure, 1 algorithm.

Figures (1)

  • Figure 1: A distributed clustering application with several devices, indexed by $i \in [n]$, that have access to local datasets $\mathcal{D}^{(i)}$. These local datasets constitute the overall dataset $\mathcal{D} = \bigcup_{i \in [n]} \mathcal{D}^{(i)}$. This paper studies federated learning techniques that allow the devices to compute cluster centroids that approximately solve $k$-means clustering for $\mathcal{D}$.
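As a reference point for "approximately solve $k$-means clustering for $\mathcal{D}$", the following minimal sketch evaluates the $k$-means loss of a centroid set on the pooled dataset. Pooling is only meant for simulation or evaluation, since in the federated setting the devices do not share their data. The helper names are hypothetical. The loss is assumed to be the average (rather than summed) squared distance to the nearest centroid, and the union is formed by concatenation, so duplicate points across devices are kept.

```python
def pooled_dataset(local_datasets):
    """Overall dataset D as the concatenation of the local datasets D^(i)."""
    return [x for data in local_datasets for x in data]


def kmeans_loss(points, centroids):
    """Average squared Euclidean distance from each point to its nearest centroid."""
    total = 0.0
    for x in points:
        total += min(sum((a - b) ** 2 for a, b in zip(x, w)) for w in centroids)
    return total / len(points)


local_datasets = [[(0.0,), (2.0,), (10.0,)], [(0.0,), (12.0,)]]
D = pooled_dataset(local_datasets)
print(len(D), kmeans_loss(D, [(1.0,), (11.0,)]))  # → 5 1.0
```

Comparing this pooled loss for the centroids learned at each device against that of a centralized $k$-means run is one way to quantify how closely a federated method approximates the pooled solution.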