Parametric Feature Transfer: One-shot Federated Learning with Foundation Models
Mahdi Beitollahi, Alex Bie, Sobhan Hemati, Leo Maxime Brunswic, Xu Li, Xi Chen, Guojun Zhang
TL;DR
FedPFT improves the accuracy-communication trade-off of one-shot federated learning, including under data heterogeneity, by leveraging foundation-model features. Each client fits Gaussian mixture models to its class-conditional features and shares only the mixture parameters. The server (or a decentralized chain) then samples synthetic features from these mixtures to train a global classifier head, so neither raw data nor real features leave the client. The approach improves the accuracy-communication frontier across eight datasets and several heterogeneity settings, and it can be combined with differential privacy to provide formal privacy guarantees with favorable privacy-accuracy trade-offs. The work also demonstrates that sharing real features exposes clients to reconstruction attacks, and it offers server-side guarantees on local client accuracy, supported by experiments and theoretical bounds.
Abstract
In one-shot federated learning (FL), clients collaboratively train a global model in a single round of communication. Existing approaches for one-shot FL enhance communication efficiency at the expense of diminished accuracy. This paper introduces FedPFT (Federated Learning with Parametric Feature Transfer), a methodology that harnesses the transferability of foundation models to enhance both accuracy and communication efficiency in one-shot FL. The approach involves transferring per-client parametric models (specifically, Gaussian mixtures) of features extracted from foundation models. Subsequently, each parametric model is employed to generate synthetic features for training a classifier head. Experimental results on eight datasets demonstrate that FedPFT enhances the communication-accuracy frontier in both centralized and decentralized FL scenarios, as well as across diverse data-heterogeneity settings such as covariate shift and task shift, with improvements of up to 20.6%. Additionally, FedPFT adheres to the data minimization principle of FL, as clients do not send real features. We demonstrate that sending real features is vulnerable to potent reconstruction attacks. Moreover, we show that FedPFT is amenable to formal privacy guarantees via differential privacy, demonstrating favourable privacy-accuracy tradeoffs.
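The pipeline described in the abstract (fit a parametric model per class on each client, send only its parameters, sample synthetic features on the server, train a classifier head) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: random vectors stand in for foundation-model features, each class is modeled by a single diagonal-covariance Gaussian (the one-component special case of a Gaussian mixture), the head is a plain softmax layer trained by gradient descent, and all function names are hypothetical.

```python
import numpy as np

def fit_class_gaussians(features, labels):
    """Client side: summarize each class by (count, mean, per-dimension variance)."""
    params = {}
    for c in np.unique(labels):
        x = features[labels == c]
        # Small constant keeps the variance strictly positive.
        params[int(c)] = (len(x), x.mean(axis=0), x.var(axis=0) + 1e-6)
    return params

def sample_synthetic(client_params, rng):
    """Server side: draw synthetic features from every client's class Gaussians."""
    xs, ys = [], []
    for params in client_params:
        for c, (n, mean, var) in params.items():
            xs.append(rng.normal(mean, np.sqrt(var), size=(n, mean.size)))
            ys.append(np.full(n, c))
    return np.concatenate(xs), np.concatenate(ys)

def train_head(x, y, num_classes, lr=0.01, steps=500):
    """Train a linear softmax classifier head with full-batch gradient descent."""
    w = np.zeros((x.shape[1], num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(steps):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(x)
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b

def predict(w, b, x):
    return (x @ w + b).argmax(axis=1)

# Toy setup (assumed): 16-dimensional "features" clustered around class centers.
rng = np.random.default_rng(0)
dim, num_classes = 16, 3
centers = rng.normal(0, 3, size=(num_classes, dim))

def make_client(classes, n_per_class):
    x = np.concatenate(
        [centers[c] + rng.normal(size=(n_per_class, dim)) for c in classes]
    )
    y = np.repeat(classes, n_per_class)
    return x, y

# Two clients with skewed label distributions; neither sees all three classes.
clients = [make_client([0, 1], 200), make_client([1, 2], 200)]

# One-shot communication: each client sends only its Gaussian parameters.
client_params = [fit_class_gaussians(x, y) for x, y in clients]

# The server never sees real features, only samples from the received Gaussians.
syn_x, syn_y = sample_synthetic(client_params, rng)
w, b = train_head(syn_x, syn_y, num_classes)

test_x, test_y = make_client([0, 1, 2], 100)
accuracy = (predict(w, b, test_x) == test_y).mean()
print(f"accuracy of the head trained on synthetic features: {accuracy:.3f}")
```

In this toy setup each client transmits 2 classes × (16 means + 16 variances + 1 count) = 66 numbers, compared with 400 × 16 = 6,400 values if it sent its real features. A multi-component mixture with richer covariance structure would cost more to send but can model the feature distribution more closely.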
