FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients
Shangchao Su, Bin Li, Xiangyang Xue
TL;DR
FedRA tackles federated tuning with heterogeneous clients by randomly allocating trainable adapters across model layers, enabling resource-constrained devices to contribute without modifying the backbone. The approach is a four-step process built around a random allocation matrix, local adapter fine-tuning, and amortized aggregation of adapter updates, with optional strategies for layers that no client can train. The authors provide a convergence analysis under standard FL assumptions and report state-of-the-art performance with ViT and MLP-Mixer backbones on DomainNet and NICO++ under feature skew and feature-and-label skew, including extreme and dynamic heterogeneity scenarios. The results highlight FedRA's robustness, simplicity, and applicability to a range of transformer-based architectures and non-IID data distributions, offering a practical path for scalable federated tuning of foundation models.
Abstract
With the increasing availability of foundation models, federated tuning has garnered attention in federated learning: it uses the data and computation resources of multiple clients to collaboratively fine-tune a foundation model. However, real-world federated scenarios often involve many heterogeneous clients with varying computation and communication resources, and many of them cannot support fine-tuning the entire model. In response to this challenge, we propose a novel federated tuning algorithm, FedRA. FedRA is straightforward to implement and can be integrated into any transformer-based model without further modification to the original model. Specifically, in each communication round, FedRA randomly generates an allocation matrix. For resource-constrained clients, it reorganizes a small number of layers from the original model based on the allocation matrix and fine-tunes them using adapters. Subsequently, the server aggregates the updated adapter parameters from the clients into the corresponding layers of the original model according to the current allocation matrix. Notably, FedRA also supports scenarios in which no client can support the entire global model. We conduct experiments on two large-scale image datasets, DomainNet and NICO++, under various non-IID settings. The results demonstrate that FedRA significantly outperforms the compared methods. The source code is available at https://github.com/leondada/FedRA.
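The round structure described in the abstract (draw a random allocation matrix, let each client tune adapters only for its allocated layers, then aggregate per layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_allocation` and `aggregate_adapters`, the weighted-average rule, and the choice to leave unallocated layers unchanged are assumptions made for illustration, and local adapter fine-tuning is omitted entirely.

```python
import random


def random_allocation(client_depths, num_layers, rng):
    """Build a 0/1 allocation matrix: row k marks the layers sent to client k.

    client_depths[k] is how many layers client k can afford (hypothetical input).
    """
    alloc = []
    for depth in client_depths:
        chosen = set(rng.sample(range(num_layers), depth))
        alloc.append([1 if layer in chosen else 0 for layer in range(num_layers)])
    return alloc


def aggregate_adapters(global_adapters, client_adapters, alloc, weights):
    """Average each layer's adapter over the clients that were allocated that layer.

    global_adapters[layer] is a list of adapter parameters for that layer.
    client_adapters[k][layer] is client k's updated parameters, or None if the
    layer was not allocated to client k.
    Assumption: a layer allocated to no client keeps its previous global value.
    """
    new_global = []
    for layer, old in enumerate(global_adapters):
        holders = [k for k in range(len(alloc)) if alloc[k][layer] == 1]
        if not holders:
            new_global.append(list(old))
            continue
        total = sum(weights[k] for k in holders)
        merged = [
            sum(weights[k] * client_adapters[k][layer][i] for k in holders) / total
            for i in range(len(old))
        ]
        new_global.append(merged)
    return new_global


if __name__ == "__main__":
    rng = random.Random(0)
    # Three clients that can hold 4, 2, and 1 of the 4 layers, respectively.
    alloc = random_allocation([4, 2, 1], num_layers=4, rng=rng)
    for row in alloc:
        print(row)
```

In this sketch, each row of the matrix has exactly as many ones as the client's layer budget, so a weak client only ever receives (and returns) adapters for a shallow sub-model. Because the matrix is redrawn every round, every layer's adapter is expected to be trained by some client over time, even when no single client can hold the full model.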
