Improving LoRA in Privacy-preserving Federated Learning

Youbang Sun, Zitao Li, Yaliang Li, Bolin Ding

TL;DR

This paper identifies key challenges when applying LoRA to privacy-preserving federated learning, notably data heterogeneity, DP-induced noise amplification, and α sensitivity, which can destabilize training and degrade performance. To address these issues, the authors propose Federated Freeze A LoRA (FFA-LoRA), which fixes the randomly initialized A and trains only the zero-initialized B, thereby reducing communication and computation and aligning better with FL/DP dynamics. The approach is theoretically motivated and empirically validated across language and vision tasks, showing consistent improvements over vanilla LoRA under DP and data heterogeneity, while eliminating the need for α-tuning and halving the parameter count. The results on RoBERTa and LLaMA demonstrate broad applicability, with extensions to GSM-8K and Vision-101 indicating potential for wider use in PEFT for federated, privacy-preserving fine-tuning of large models.

Abstract

Low-rank adaptation (LoRA) is one of the most popular task-specific parameter-efficient fine-tuning (PEFT) methods on pre-trained language models for its good performance and computational efficiency. LoRA injects a product of two trainable rank decomposition matrices over the top of each frozen pre-trained model module. However, when applied in the setting of privacy-preserving federated learning (FL), LoRA may become unstable due to the following facts: 1) the effects of data heterogeneity and multi-step local updates are non-negligible, 2) additive noise enforced on updating gradients to guarantee differential privacy (DP) can be amplified and 3) the final performance is susceptible to hyper-parameters. A key factor leading to these phenomena is the discordance between jointly optimizing the two low-rank matrices by local clients and separately aggregating them by the central server. Thus, this paper proposes an efficient and effective version of LoRA, Federated Freeze A LoRA (FFA-LoRA), to alleviate these challenges and further halve the communication cost of federated fine-tuning LLMs. The core idea of FFA-LoRA is to fix the randomly initialized non-zero matrices and only fine-tune the zero-initialized matrices. Compared to LoRA, FFA-LoRA is motivated by practical and theoretical benefits in privacy-preserved FL. Our experiments demonstrate that FFA-LoRA provides more consistent performance with better computational efficiency over vanilla LoRA in various FL tasks.
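The discordance between joint local optimization and separate server aggregation mentioned in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation: the toy sizes, the random matrices standing in for locally trained factors, and the helper names (`matmul`, `mean`, `max_abs_diff`, `rand_matrix`) are all assumptions made for illustration. The sketch compares the average of the clients' true updates B_i A_i with the product of the separately averaged factors, first when every client has its own A_i (vanilla LoRA) and then when all clients share one frozen A (FFA-LoRA).

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mean(mats):
    """Element-wise average of equally shaped matrices."""
    n = len(mats)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*mats)]

def max_abs_diff(X, Y):
    """Largest absolute entry-wise difference between two matrices."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def rand_matrix(rng, rows, cols):
    """Matrix of independent standard Gaussian entries (illustrative stand-in)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d, r, k, n_clients = 6, 2, 5, 3  # hypothetical toy sizes: update is d x k, rank r

# Vanilla LoRA: after local training each client holds its own B_i and A_i.
Bs = [rand_matrix(rng, d, r) for _ in range(n_clients)]
As = [rand_matrix(rng, r, k) for _ in range(n_clients)]

ideal = mean([matmul(B, A) for B, A in zip(Bs, As)])  # average of the true updates
server = matmul(mean(Bs), mean(As))                   # product of separately averaged factors
lora_gap = max_abs_diff(ideal, server)

# FFA-LoRA: A is one shared, frozen random matrix; only B_i differs per client.
A_frozen = rand_matrix(rng, r, k)
ideal_ffa = mean([matmul(B, A_frozen) for B in Bs])
server_ffa = matmul(mean(Bs), A_frozen)
ffa_gap = max_abs_diff(ideal_ffa, server_ffa)

print(f"vanilla LoRA aggregation gap: {lora_gap:.3e}")
print(f"FFA-LoRA aggregation gap:     {ffa_gap:.3e}")
```

Because matrix multiplication is linear in B once A is fixed, averaging the B_i and multiplying by the shared A matches the average of the individual updates up to floating-point rounding, whereas the vanilla LoRA gap is generally nonzero.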

Paper Structure (18 sections, 3 theorems, 14 equations, 1 figure, 9 tables)

This paper contains 18 sections, 3 theorems, 14 equations, 1 figure, 9 tables.

Key Result

Theorem 1

For local updates with the same initial condition on $\mathbf{W}$, the vanilla LoRA update with scaling factor $\alpha_{\text{LoRA}}$ produces the trajectory $\{W_{\alpha_{\text{LoRA}}}^k\}_{k\in [K]}$, and $\textsf{FFA-LoRA}$ with scaling factor $\alpha_{\text{FFA}}$ produces the trajectory $\{W_{\alpha_{\text{FFA}}}^k\}_{

Figures (1)

  • Figure 1: Frobenius norm of noise terms within a single update.
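
As a rough illustration of where such noise terms can come from, suppose (an assumption made for this sketch, not a statement of the paper's exact mechanism) that enforcing DP perturbs the two LoRA factors with independent noise matrices $\xi_B$ and $\xi_A$. Expanding the product shows a second-order noise term that appears when both factors are trained and that is absent when $A$ is frozen and therefore never perturbed:

```latex
% Vanilla LoRA: both factors carry noise, so the product contains a quadratic noise term.
(B + \xi_B)(A + \xi_A) = BA + \xi_B A + B\,\xi_A + \xi_B\,\xi_A
% FFA-LoRA: A is frozen, so only a noise term linear in \xi_B remains.
(B + \xi_B)\,A = BA + \xi_B A
```

Under this assumption, the term $\xi_B\,\xi_A$ is one way to read the abstract's remark that additive DP noise "can be amplified" in vanilla LoRA.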

Theorems & Definitions (7)

  • Definition 1: $(\epsilon, \delta)$-DP
  • Theorem 1
  • Theorem 2: Smoothness conditions
  • proof
  • proof
  • Corollary 2.1: Privacy Guarantee
  • proof