Parametric Feature Transfer: One-shot Federated Learning with Foundation Models

Mahdi Beitollahi, Alex Bie, Sobhan Hemati, Leo Maxime Brunswic, Xu Li, Xi Chen, Guojun Zhang

TL;DR

FedPFT tackles the high communication burden and data heterogeneity of one-shot federated learning by leveraging foundation-model features. Each client fits a Gaussian mixture model to its class-conditional features and shares only the parametric distributions, allowing the server (or a decentralized chain) to generate synthetic features for training a global classifier head without transmitting raw data. The approach achieves a favorable accuracy-communication frontier across eight datasets and various heterogeneity settings, and it can be extended with differential privacy to provide formal privacy guarantees while preserving utility. The work also demonstrates privacy risks of real feature sharing and offers server-side guarantees on local client accuracy, supported by extensive experiments and theoretical bounds.

Abstract

In one-shot federated learning (FL), clients collaboratively train a global model in a single round of communication. Existing approaches for one-shot FL enhance communication efficiency at the expense of diminished accuracy. This paper introduces FedPFT (Federated Learning with Parametric Feature Transfer), a methodology that harnesses the transferability of foundation models to enhance both accuracy and communication efficiency in one-shot FL. The approach involves transferring per-client parametric models (specifically, Gaussian mixtures) of features extracted from foundation models. Subsequently, each parametric model is employed to generate synthetic features for training a classifier head. Experimental results on eight datasets demonstrate that FedPFT enhances the communication-accuracy frontier in both centralized and decentralized FL scenarios, as well as across diverse data-heterogeneity settings such as covariate shift and task shift, with improvements of up to 20.6%. Additionally, FedPFT adheres to the data minimization principle of FL, as clients do not send real features. We demonstrate that sending real features is vulnerable to potent reconstruction attacks. Moreover, we show that FedPFT is amenable to formal privacy guarantees via differential privacy, demonstrating favourable privacy-accuracy tradeoffs.
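
The pipeline described above (clients fit class-conditional parametric models, the server samples synthetic features and trains a classifier head) can be illustrated with a small sketch. This is not the authors' implementation: it simplifies each Gaussian mixture to a single diagonal-covariance component per class, uses toy 2-D blobs as stand-ins for foundation-model features, and trains a plain softmax head with SGD. All function names (`fit_class_gaussians`, `sample_synthetic`, `train_head`, `predict`, `make_blob`) and all hyperparameters are hypothetical choices made for illustration.

```python
import math
import random
from statistics import fmean, pvariance


def fit_class_gaussians(features, labels):
    """Client side: fit one diagonal Gaussian per class (a 1-component mixture)."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    params = {}
    for y, rows in by_class.items():
        columns = list(zip(*rows))
        mean = [fmean(col) for col in columns]
        # Small floor keeps the variance strictly positive for sampling.
        var = [pvariance(col) + 1e-6 for col in columns]
        params[y] = (len(rows), mean, var)  # only these statistics leave the client
    return params


def sample_synthetic(client_payloads, rng):
    """Server side: draw as many synthetic features per class as each client reported."""
    xs, ys = [], []
    for params in client_payloads:
        for y, (n, mean, var) in params.items():
            for _ in range(n):
                xs.append([rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)])
                ys.append(y)
    return xs, ys


def train_head(xs, ys, num_classes, rng, epochs=20, lr=0.1):
    """Train a linear softmax classifier head with plain SGD."""
    dim = len(xs[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = xs[i], ys[i]
            logits = [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(num_classes)]
            top = max(logits)
            exps = [math.exp(l - top) for l in logits]
            total = sum(exps)
            for c in range(num_classes):
                grad = exps[c] / total - (1.0 if c == y else 0.0)
                b[c] -= lr * grad
                for j in range(dim):
                    W[c][j] -= lr * grad * x[j]
    return W, b


def predict(W, b, x):
    scores = [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(W))]
    return max(range(len(W)), key=scores.__getitem__)


def make_blob(center, n, rng, std=0.5):
    """Toy stand-in for features extracted by a foundation model."""
    return [[rng.gauss(c, std) for c in center] for _ in range(n)]


rng = random.Random(0)
centers = {0: (-2.0, 0.0), 1: (2.0, 0.0), 2: (0.0, 3.0)}
client_classes = [(0, 1), (2,)]  # assumed label-shift split across two clients

client_payloads = []
for classes in client_classes:
    feats, labs = [], []
    for c in classes:
        feats += make_blob(centers[c], 40, rng)
        labs += [c] * 40
    client_payloads.append(fit_class_gaussians(feats, labs))

synthetic_x, synthetic_y = sample_synthetic(client_payloads, rng)
W, b = train_head(synthetic_x, synthetic_y, num_classes=3, rng=rng)

held_out = [(x, c) for c in centers for x in make_blob(centers[c], 50, rng)]
accuracy = sum(predict(W, b, x) == c for x, c in held_out) / len(held_out)
print(f"accuracy on held-out real features: {accuracy:.2f}")
```

In this sketch each client's payload is a per-class count, mean, and variance vector, so its size depends on the feature dimension and number of classes rather than on the number of local samples. A fuller version would fit multi-component mixtures with richer covariance structure.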

Paper Structure (50 sections, 4 theorems, 23 equations, 11 figures, 7 tables, 1 algorithm)

This paper contains 50 sections, 4 theorems, 23 equations, 11 figures, 7 tables, 1 algorithm.

Key Result

Theorem 4.1

Suppose the feature embedding $f$ satisfies $\|f\|_2 \leq 1$. Let $\hat{\boldsymbol\mu}(\cdot)$ and $\hat{\mathbf{\Sigma}}(\cdot)$ be the estimators of the mean and covariance, respectively. Define the Gaussian mechanism where the elements of vector $\Delta\boldsymbol{\mu}$ and matrix $\Delta \boldsymbol \Sigma$ are sampled from independent $\mathcal{N}\left(0,\left(\frac{4}{n_i\epsilon}\sqrt{5 \ln(4/

Figures (11)

  • Figure 1: Comparison of different one-shot FL methods for image classification on Caltech-101 with 50 clients. See Section \ref{sec:comp_sota} for experimental details. FedPFT and DP-FedPFT outperform other one-shot FL methods and are competitive with transmitting real features (Centralized). With more communication budget, multi-round FL (i.e., FedAvg) performs better than one-shot methods.
  • Figure 2: Illustration of FedPFT in centralized FL. Each client fits GMMs to the distributions of extracted features for each class. The GMM parameters are then transmitted to the server, which samples from these distributions to train a classifier head as the global model.
  • Figure 3: Illustration of the FedPFT framework for decentralized FL. Each client updates the received GMM statistics with its local data and transfers them to other clients.
  • Figure 4: FedPFT vs. existing one-shot and multi-round FL methods in the centralized setting on the CIFAR-100 (left) and Caltech-101 (right) datasets. FedPFT ($\mathcal{G}$) and DP-FedPFT ($\hat{\mathcal{G}}$) surpass other one-shot FL methods and are competitive with sending raw features (Centralized).
  • Figure 5: Five clients in a linear topology. Each client updates its received GMM with its local data and sends it to the next client.
  • ...and 6 more figures

Theorems & Definitions (9)

  • Theorem 4.1
  • Theorem 6.1
  • Definition 2.1
  • Lemma 2.2
  • proof
  • Remark 2.3
  • Remark 2.4
  • Theorem 3.1
  • proof