AugFL: Augmenting Federated Learning with Pretrained Models
Sheng Yue, Zerui Qin, Yongheng Deng, Ju Ren, Yaoxue Zhang, Junshan Zhang
TL;DR
This work tackles data scarcity in decentralized federated learning (FL) by leveraging server-hosted pretrained models (PMs) to enable personalized learning across heterogeneous clients. It formalizes PM-aided personalized FL as a regularized federated meta-learning problem and introduces AugFL, an inexact-ADMM algorithm that offloads PM knowledge transfer to the server, achieving $\mathcal{O}(n)$ per-round complexity without exposing the PM. The authors prove convergence to an $\epsilon$-FOSP in $\mathcal{O}(1/\epsilon^2)$ rounds, along with adaptation and knowledge-transfer bounds that quantify how PM information improves performance on a target client and reduces forgetting of the pretraining task. Empirical results on multiple benchmarks show that AugFL consistently outperforms baselines, with gains amplified by larger PMs and effective server-side transfer, which makes it relevant to privacy-preserving, scalable FL with external knowledge sources.
Abstract
Federated Learning (FL) has garnered widespread interest in recent years. However, owing to strict privacy policies or the limited storage capacities of training participants such as IoT devices, its effective deployment is often impeded by the scarcity of training data in practical decentralized learning environments. In this paper, we study enhancing FL with the aid of (large) pre-trained models (PMs), which encapsulate rich general/domain-agnostic knowledge, to alleviate the data requirements of conducting FL from scratch. Specifically, we consider a networked FL system formed by a central server and distributed clients. First, we formulate PM-aided personalized FL as a regularization-based federated meta-learning problem, where clients join forces to learn a meta-model with knowledge transferred from a private PM stored at the server. Then, we develop an inexact-ADMM-based algorithm, AugFL, to optimize the problem without exposing the PM or incurring additional computational costs on local clients. Further, we establish theoretical guarantees for AugFL in terms of communication complexity, adaptation performance, and the benefit of knowledge transfer in general non-convex cases. Extensive experiments corroborate the efficacy and superiority of AugFL over existing baselines.
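For orientation, a regularization-based federated meta-learning objective of the kind described above can be sketched as follows. This is a generic, MAML-style form written under stated assumptions, not necessarily the paper's exact formulation: the symbols $\theta$ (meta-model), $\theta_p$ (PM parameters), $L_i$ (loss of client $i$), $w_i$ (client weights), $\alpha$ (adaptation step size), $\lambda$ (regularization weight), and the regularizer $R$ are all introduced here for illustration. The second line is the usual definition of an $\epsilon$-approximate first-order stationary point ($\epsilon$-FOSP).

```latex
% Assumed generic form; R and all symbols are illustrative, not taken from the paper.
\min_{\theta} \; F(\theta) := \sum_{i=1}^{n} w_i \, L_i\bigl(\theta - \alpha \nabla L_i(\theta)\bigr) + \lambda \, R(\theta, \theta_p)

% Standard definition of an \epsilon-FOSP of F:
\|\nabla F(\theta)\| \le \epsilon
```

Under this reading, the regularizer is the only term that involves the PM, which is consistent with handling knowledge transfer at the server while clients work only with their local losses.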
