Aggregating Low Rank Adapters in Federated Fine-tuning

Evelyn Trautmann, Ian Hales, Martin F. Volk

TL;DR

This work addresses the high cost of fine-tuning large language models in federated settings by using parameter-efficient LoRA adapters. It shows that naive FedAvg aggregation of LoRA parameters introduces aggregation errors and privacy incompatibilities, and it introduces Full Rank Aggregation (FRA-LoRA), which aggregates the full weight increments and then reduces them to a low-rank form via SVD while preserving compatibility with differential privacy (DP). In experiments on GLUE benchmarks (SST2, MNLI) under balanced and imbalanced data distributions, FRA-LoRA converges faster in early rounds and reaches accuracy competitive with centralized training, while retaining its privacy-friendly properties. The work also analyzes the components of the aggregation error and suggests a hybrid strategy (FRA in early rounds, then FFA) to balance convergence against overfitting. The approach offers a practical, privacy-aware path to efficient federated fine-tuning of LLMs in privacy-sensitive, distributed learning contexts.
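
The difference between the two aggregation schemes summarized above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the helper names `naive_fedavg` and `fra_aggregate`, the uniform client weights, the matrix sizes, and the choice to fold the singular values into B are all assumptions made for illustration. It only shows that averaging the factors B and A separately does not reproduce the average of the products BA, whereas truncating the SVD of the averaged full increment gives the best approximation of the same rank.

```python
import numpy as np


def naive_fedavg(Bs, As, weights):
    """Average the LoRA factors separately (hypothetical helper)."""
    B_avg = sum(w * B for w, B in zip(weights, Bs))
    A_avg = sum(w * A for w, A in zip(weights, As))
    return B_avg, A_avg


def fra_aggregate(Bs, As, weights, rank):
    """Average the full increments B_i @ A_i, then truncate via SVD.

    Assumption: singular values are folded into B; the paper may split
    them differently between the two factors.
    """
    delta = sum(w * (B @ A) for w, B, A in zip(weights, Bs, As))
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    B_new = U[:, :rank] * S[:rank]  # scale the leading left singular vectors
    A_new = Vt[:rank, :]
    return B_new, A_new, delta


# Toy setup (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
d, k, r, n_clients = 16, 12, 2, 3
Bs = [rng.normal(size=(d, r)) for _ in range(n_clients)]
As = [rng.normal(size=(r, k)) for _ in range(n_clients)]
weights = [1.0 / n_clients] * n_clients

B_avg, A_avg = naive_fedavg(Bs, As, weights)
B_fra, A_fra, delta = fra_aggregate(Bs, As, weights, rank=r)

# Frobenius-norm distance to the exact averaged increment.
err_naive = np.linalg.norm(delta - B_avg @ A_avg)
err_fra = np.linalg.norm(delta - B_fra @ A_fra)
print(f"naive FedAvg error: {err_naive:.4f}")
print(f"SVD-truncated error: {err_fra:.4f}")
```

Because the product of the averaged factors has rank at most r, and the truncated SVD is the best rank-r approximation in the Frobenius norm, the second error can never exceed the first in this setup. The averaged increment itself can have rank up to r times the number of clients, which is why a reduction step is needed before the adapters are sent back.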

Abstract

Fine-tuning large language models requires substantial computational and memory resources and is therefore costly. Training on federated datasets adds communication overhead on top of this. For this reason, parameter-efficient fine-tuning (PEFT) methods are becoming increasingly important. Among them, low-rank adaptation (LoRA) has already achieved very good results. Applying LoRA in Federated Learning, and in particular aggregating the adaptation matrices, is an active research field. In this article, we propose a novel aggregation method, compare it with existing methods for aggregating low-rank adapters in federated fine-tuning of large machine learning models, and evaluate performance on selected GLUE benchmark datasets.

Paper Structure (19 sections, 10 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Low-rank adapters approximating weight increments; illustration from the original LoRA paper (Hu et al., 2021).
  • Figure 2: Federated Learning architecture with a central orchestrator and 3 clients with private datasets.
  • Figure 3: SST2 Dataset (i.i.d. split with balanced classes): Evaluation set accuracy on each client per iteration, smoothed with a rolling average (window of 7).
  • Figure 4: MNLI Dataset (i.i.d. split with balanced classes): Evaluation set accuracy on each client per iteration.
  • Figure 5: SST2 Dataset (split with imbalanced classes): Evaluation set accuracy on each client per iteration, smoothed with a rolling average (window of 7).
  • ...and 2 more figures