FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients

Shangchao Su, Bin Li, Xiangyang Xue

TL;DR

FedRA tackles federated tuning with heterogeneous clients by randomly allocating trainable adapters across model layers, so that resource-constrained devices can contribute without any modification to the backbone. The approach is a four-step process built around a random allocation matrix, local adapter fine-tuning, and amortized aggregation of adapter updates, with optional strategies for layers that no client trains in a given round. The authors provide a convergence analysis under standard FL assumptions and report state-of-the-art performance with ViT and MLP-Mixer on DomainNet and NICO++ under feature skew and feature-and-label skew, including extreme and dynamic heterogeneity scenarios. The results highlight FedRA's robustness, simplicity, and applicability to a range of transformer-based architectures and non-IID data distributions, offering a practical path for scalable federated tuning of foundation models.

Abstract

With the increasing availability of Foundation Models, federated tuning has garnered attention in the field of federated learning, utilizing data and computation resources from multiple clients to collaboratively fine-tune foundation models. However, in real-world federated scenarios, there often exist a multitude of heterogeneous clients with varying computation and communication resources, rendering them incapable of supporting the entire model fine-tuning process. In response to this challenge, we propose a novel federated tuning algorithm, FedRA. The implementation of FedRA is straightforward and can be seamlessly integrated into any transformer-based model without the need for further modification to the original model. Specifically, in each communication round, FedRA randomly generates an allocation matrix. For resource-constrained clients, it reorganizes a small number of layers from the original model based on the allocation matrix and fine-tunes using adapters. Subsequently, the server aggregates the updated adapter parameters from the clients according to the current allocation matrix into the corresponding layers of the original model. It is worth noting that FedRA also supports scenarios where none of the clients can support the entire global model, which is an impressive advantage. We conduct experiments on two large-scale image datasets, DomainNet and NICO++, under various non-iid settings. The results demonstrate that FedRA outperforms the compared methods significantly. The source code is available at https://github.com/leondada/FedRA.
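The round described in the abstract (draw an allocation matrix, let each client tune only its allocated layers, then aggregate per layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_allocation` and `aggregate`, the representation of an adapter as a short list of floats, the unweighted per-layer mean, and the choice to leave untrained layers unchanged are all assumptions made for illustration.

```python
import random

def random_allocation(num_layers, client_depths, rng):
    """Build a 0/1 matrix: row k marks the layers client k trains this round.

    client_depths[k] is how many layers client k can afford (hypothetical input).
    """
    matrix = []
    for depth in client_depths:
        chosen = set(rng.sample(range(num_layers), depth))
        matrix.append([1 if layer in chosen else 0 for layer in range(num_layers)])
    return matrix

def aggregate(global_adapters, client_updates, allocation):
    """Average each layer's adapter over the clients allocated that layer.

    Assumption: unweighted mean; layers no client trained keep their old value.
    """
    new_global = []
    for layer, old in enumerate(global_adapters):
        owners = [k for k, row in enumerate(allocation) if row[layer] == 1]
        if not owners:
            new_global.append(list(old))
            continue
        avg = [
            sum(client_updates[k][layer][i] for k in owners) / len(owners)
            for i in range(len(old))
        ]
        new_global.append(avg)
    return new_global

# One simulated round with 4 layers and three clients of different capacity.
rng = random.Random(0)
num_layers = 4
client_depths = [4, 2, 1]
allocation = random_allocation(num_layers, client_depths, rng)
global_adapters = [[0.0, 0.0] for _ in range(num_layers)]

# Stand-in for local fine-tuning: client k shifts its allocated adapters by k + 1.
client_updates = [
    {layer: [v + (k + 1) for v in global_adapters[layer]]
     for layer in range(num_layers) if row[layer]}
    for k, row in enumerate(allocation)
]
new_global = aggregate(global_adapters, client_updates, allocation)
```

Because the allocation is redrawn every round, a client that can only hold one or two layers still touches every depth of the model over time, which is the property the abstract contrasts with fixed depth-based splits.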

Paper Structure (12 sections, 1 theorem, 5 equations, 6 figures, 6 tables)


Key Result

Theorem 1

Under the residual connection assumption and the above three assumptions, with the client learning rate $\eta$ satisfying $\frac{3 N}{16 J^2 h \Gamma^*}+\frac{N}{6 J h \Gamma^*} \leq \eta \leq \frac{1}{4 J h}$, we have: where $J$ is the number of client update steps, $S^t$ is the set of global model layers trained in this round, $T$ is the number of rounds, $\Delta_1=\frac{J \eta}{2}-\frac{3 N}{32 J h \Gamma^

Figures (6)

  • Figure 1: Federated tuning for heterogeneous clients.
  • Figure 2: (a) The width-based methods disrupt the structure of each layer and therefore cannot be directly applied to pre-trained parameters. (b) Although depth-based methods can preserve the integrity of the entire model layers, they face a significant issue of feature imbalance, where only a small fraction of resource-rich clients can train the higher layers of the model. (c) Our approach involves the random allocation of adapters, enabling more efficient federated tuning of pre-trained foundation models.
  • Figure 3: The framework of FedRA. In each communication round, the server first randomly generates an allocation matrix to assign subsets of the global model to clients. After client-side fine-tuning, the server collects the fine-tuned LoRA parameters and aggregates them into the global model based on the allocation matrix.
  • Figure 4: Convergence under Random Allocation.
  • Figure 5: The t-SNE visualization of the global model.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Theorem 1