ADEPT: Hierarchical Bayes Approach to Personalized Federated Unsupervised Learning

Kaan Ozkara; Bruce Huang; Ruida Zhou; Suhas Diggavi

TL;DR

This work tackles personalized federated unsupervised learning under client data heterogeneity by introducing ADEPT, a hierarchical Bayesian framework that jointly learns a population model and personalized client models. ADEPT yields three algorithms: ADEPT-PCA for linear dimensionality reduction, ADEPT-AE for nonlinear representations, and ADEPT-DGM for personalized diffusion-based generation, each equipped with convergence analyses and adaptivity to heterogeneity. Theoretical results reveal how collaboration can improve performance under heterogeneity and how the balance between local data and global information affects convergence and generalization. Empirical results on synthetic and real data demonstrate substantial sample amplification and improved worst-case client performance, underscoring the practical value of adaptive collaboration for personalized unsupervised FL.

Abstract

Statistical heterogeneity of clients' local data is an important characteristic in federated learning, motivating personalized algorithms tailored to the local data statistics. Though there has been a plethora of algorithms proposed for personalized supervised learning, discovering the structure of local data through personalized unsupervised learning is less explored. We initiate a systematic study of such personalized unsupervised learning by developing algorithms based on optimization criteria inspired by a hierarchical Bayesian statistical framework. We develop adaptive algorithms that discover the balance between using limited local data and collaborative information. We do this in the context of two unsupervised learning tasks: personalized dimensionality reduction and personalized diffusion models. We develop convergence analyses for our adaptive algorithms which illustrate the dependence on problem parameters (e.g., heterogeneity, local sample size). We also develop a theoretical framework for personalized diffusion models, which shows the benefits of collaboration even under heterogeneity. We finally evaluate our proposed algorithms using synthetic and real data, demonstrating the effective sample amplification for personalized tasks, induced through collaboration, despite data heterogeneity.

Paper Structure (43 sections, 16 theorems, 89 equations, 4 figures, 3 tables, 3 algorithms)

Introduction
Problem Formulation
Hierarchical Bayes for personalized learning
Personalized Dimensionality Reduction
Linear Dimensionality Reduction:
Non-linear dimensionality reduction:
Personalized Generation through Diffusion Models
Personalized Federated Dimensionality Reduction
Personalized Adaptive PCA: ADEPT-PCA
Proof Outline for Theorem 3.3
Personalized Adaptive AEs: ADEPT-AE
Proof Outline for Theorem 3.12
Sufficient decrease when $\tau$ divides $t$
Sufficient decrease when $\tau$ does not divide $t$
Personalized Generation through Adaptive Diffusion Models: ADEPT-DGM
...and 28 more sections

Key Result

Lemma 3.2

Let $\omega \in (0,1)$. Suppose the learning rate satisfies $\eta_3 \leq (1-\omega) \frac{2 \xi}{d_\theta^2}$ and the initialization satisfies $\sigma_0 \geq \omega \sqrt{\frac{2 \xi}{d_\theta}}$. Then, for all $t \in [T]$, we have $\sigma_t \geq \omega \sqrt{\frac{2 \xi}{d_\theta}}$.
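
The lemma can be checked numerically with a minimal sketch. The objective used here is an assumption, not taken from this page: the code supposes the $\sigma$-dependent part of the loss has the form $d_\theta \log \sigma + (\xi + D_t)/\sigma^2$, where $D_t \geq 0$ is a hypothetical stand-in for the discrepancy between a client model and the population model. The function names and the range of `dist` are also illustrative.

```python
import math
import random


def sigma_floor(xi, d_theta, omega):
    """Lower bound from the lemma: omega * sqrt(2 * xi / d_theta)."""
    return omega * math.sqrt(2.0 * xi / d_theta)


def max_learning_rate(xi, d_theta, omega):
    """Largest step size allowed by the lemma: (1 - omega) * 2 * xi / d_theta^2."""
    return (1.0 - omega) * 2.0 * xi / d_theta ** 2


def simulate_sigma(xi, d_theta, omega, steps=1000, seed=0):
    """Run gradient descent on sigma under the ASSUMED objective
    d_theta * log(sigma) + (xi + dist) / sigma**2 and return the trajectory."""
    rng = random.Random(seed)
    floor = sigma_floor(xi, d_theta, omega)
    eta = max_learning_rate(xi, d_theta, omega)
    sigma = floor  # smallest initialization the lemma permits
    trajectory = [sigma]
    for _ in range(steps):
        # Hypothetical nonnegative discrepancy term; the range is arbitrary.
        dist = rng.uniform(0.0, 5.0)
        # Derivative of the assumed objective with respect to sigma.
        grad = d_theta / sigma - 2.0 * (xi + dist) / sigma ** 3
        sigma -= eta * grad
        trajectory.append(sigma)
    return floor, trajectory


if __name__ == "__main__":
    floor, trajectory = simulate_sigma(xi=1.0, d_theta=10, omega=0.5)
    print("floor:", floor, "min sigma:", min(trajectory))
```

Under this assumed objective, the iterate increases whenever it is below $\sqrt{2\xi/d_\theta}$, and the step-size cap keeps any decrease from crossing the floor, which is the behavior the lemma describes.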

Figures (4)

Figure 1: Hierarchical Bayesian model of data distribution
Figure 2: Ratio of the reconstruction error of different methods to the true model w.r.t. different values of $\sigma^*$. We have $d=100$, $r=20$, $m=10$, and $n=20$.
Figure 3: Randomly chosen samples from the generated dataset for a client with data from the '0' class (Left: ADEPT-DGM, noise $\sigma=0.024$; Middle: FedAvg + fine-tuning, noise $\sigma=0.028$; Right: Local training, noise $\sigma=0.032$). Models are trained and samples are chosen with the same seed across runs.
Figure 4: Violin plot of KID values of clients.

Theorems & Definitions (34)

Lemma 3.2: A lower bound on $\sigma_t$
Theorem 3.3: Convergence of ADEPT-PCA (Algorithm \ref{algo:pca})
Lemma 3.4: Non-expansiveness of polar retraction [chen2021decentralized]
Lemma 3.5: Lipschitz type inequality [chen2021decentralized]
Lemma 3.6: Lipschitz smoothness and bounded gradients with respect to $\sigma$
Lemma 3.7: Lipschitz smoothness and bounded gradients with respect to $\boldsymbol{U}_i$
Lemma 3.8: Lipschitz smoothness and bounded gradients with respect to $\boldsymbol{V}$
Lemma 3.9: Lipschitz continuity of ${\frac{\partial}{\partial {\sigma}} } f_i^{\mathrm{pca}}(\boldsymbol{U}, \boldsymbol{V}, \sigma)$ with respect to $\boldsymbol{U}, \boldsymbol{V}$
Lemma 3.10: Sufficient Decrease
Theorem 3.12: Convergence of ADEPT-AE (Algorithm \ref{algo:ae-gd})
...and 24 more
