ADEPT: Hierarchical Bayes Approach to Personalized Federated Unsupervised Learning

Kaan Ozkara, Bruce Huang, Ruida Zhou, Suhas Diggavi

TL;DR

This work tackles personalized federated unsupervised learning under client data heterogeneity by introducing ADEPT, a hierarchical Bayesian framework that jointly learns a population model and personalized client models. ADEPT yields three algorithms: ADEPT-PCA for linear dimensionality reduction, ADEPT-AE for nonlinear representations, and ADEPT-DGM for personalized diffusion-based generation, each equipped with convergence analyses and adaptivity to heterogeneity. Theoretical results reveal how collaboration can improve performance under heterogeneity and how the balance between local data and global information affects convergence and generalization. Empirical results on synthetic and real data demonstrate substantial sample amplification and improved worst-case client performance, underscoring the practical value of adaptive collaboration for personalized unsupervised FL.

Abstract

Statistical heterogeneity of clients' local data is an important characteristic in federated learning, motivating personalized algorithms tailored to the local data statistics. Though there has been a plethora of algorithms proposed for personalized supervised learning, discovering the structure of local data through personalized unsupervised learning is less explored. We initiate a systematic study of such personalized unsupervised learning by developing algorithms based on optimization criteria inspired by a hierarchical Bayesian statistical framework. We develop adaptive algorithms that discover the balance between using limited local data and collaborative information. We do this in the context of two unsupervised learning tasks: personalized dimensionality reduction and personalized diffusion models. We develop convergence analyses for our adaptive algorithms which illustrate the dependence on problem parameters (e.g., heterogeneity, local sample size). We also develop a theoretical framework for personalized diffusion models, which shows the benefits of collaboration even under heterogeneity. We finally evaluate our proposed algorithms using synthetic and real data, demonstrating the effective sample amplification for personalized tasks, induced through collaboration, despite data heterogeneity.

Paper Structure

This paper contains 43 sections, 16 theorems, 89 equations, 4 figures, 3 tables, 3 algorithms.

Key Result

Lemma 3.2

Given any $\omega \in (0,1)$, let the learning rate satisfy $\eta_3 \leq (1-\omega) \frac{2 \xi}{d_\theta^2}$ and the initialization satisfy $\sigma_0 \geq \omega \sqrt{\frac{2 \xi}{d_\theta}}$. Then, for all $t \in [T]$, we have $\sigma_t \geq \omega \sqrt{\frac{2 \xi}{d_\theta}}$.
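
The two thresholds in Lemma 3.2 are plain arithmetic in $\omega$, $\xi$, and $d_\theta$, so a small helper can make the trade-off concrete: a larger $\omega$ raises the guaranteed floor on $\sigma_t$ but shrinks the admissible learning rate. The sketch below only evaluates the two expressions from the lemma; the function name `lemma_3_2_thresholds` is hypothetical, and the positivity checks on `xi` and `d_theta` are assumptions not stated in this excerpt.

```python
import math


def lemma_3_2_thresholds(omega: float, xi: float, d_theta: float) -> tuple[float, float]:
    """Evaluate the two bounds that appear in Lemma 3.2.

    Returns (eta_max, sigma_floor), where
      eta_max     = (1 - omega) * 2 * xi / d_theta**2   (largest allowed eta_3)
      sigma_floor = omega * sqrt(2 * xi / d_theta)      (lower bound on sigma_0 and sigma_t)
    """
    if not 0.0 < omega < 1.0:
        raise ValueError("omega must lie in the open interval (0, 1)")
    # Assumption (not stated in the excerpt): xi and d_theta are positive.
    if xi <= 0.0 or d_theta <= 0.0:
        raise ValueError("xi and d_theta are assumed to be positive")

    eta_max = (1.0 - omega) * 2.0 * xi / d_theta**2
    sigma_floor = omega * math.sqrt(2.0 * xi / d_theta)
    return eta_max, sigma_floor


# Illustrative values only; they are not taken from the paper.
eta_max, sigma_floor = lemma_3_2_thresholds(omega=0.5, xi=2.0, d_theta=4.0)
```

Any learning rate at or below `eta_max`, combined with an initialization at or above `sigma_floor`, satisfies the hypotheses of the lemma for the chosen $\omega$.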

Figures (4)

  • Figure 1: Hierarchical Bayesian model of data distribution
  • Figure 2: Ratio of the reconstruction error of different methods to that of the true model, for different values of $\sigma^*$. We have $d=100$, $r=20$, $m=10$, and $n=20$.
  • Figure 3: Randomly chosen samples from the generated dataset for a client with data from the '0' class. Left: ADEPT-DGM, noise $\sigma=0.024$; Middle: FedAvg + fine-tuning, noise $\sigma=0.028$; Right: local training, noise $\sigma=0.032$. Models are trained and samples are chosen with the same seed across runs.
  • Figure 4: Violin plot of KID values of clients.

Theorems & Definitions (34)

  • Lemma 3.2: A lower bound on $\sigma_t$
  • Theorem 3.3: Convergence of ADEPT-PCA
  • Lemma 3.4: Non-expansiveness of polar retraction [chen2021decentralized]
  • Lemma 3.5: Lipschitz type inequality [chen2021decentralized]
  • Lemma 3.6: Lipschitz smoothness and bounded gradients with respect to $\sigma$
  • Lemma 3.7: Lipschitz smoothness and bounded gradients with respect to $\boldsymbol{U}_i$
  • Lemma 3.8: Lipschitz smoothness and bounded gradients with respect to $\boldsymbol{V}$
  • Lemma 3.9: Lipschitz continuity of $\frac{\partial}{\partial \sigma} f_i^{\mathrm{pca}}(\boldsymbol{U}, \boldsymbol{V}, \sigma)$ with respect to $\boldsymbol{U}, \boldsymbol{V}$
  • Lemma 3.10: Sufficient Decrease
  • Theorem 3.12: Convergence of ADEPT-AE
  • ...and 24 more