Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models

Jin Liu, Yinbin Miao, Ning Xi, Junkang Liu

TL;DR

The paper proposes LA-LoRA, a novel approach that decouples gradient interactions and aligns update directions across clients, enhancing robustness under stringent privacy constraints and strengthening convergence guarantees in noisy federated environments.

Abstract

Fine-tuning large vision models (LVMs) and large language models (LLMs) under differentially private federated learning (DPFL) is hindered by a fundamental privacy-utility trade-off. Low-Rank Adaptation (LoRA), a promising parameter-efficient fine-tuning (PEFT) method, reduces computational and communication costs by introducing two trainable low-rank matrices while freezing pre-trained weights. However, directly applying LoRA in DPFL settings leads to performance degradation, especially in LVMs. Our analysis reveals three previously underexplored challenges: (1) gradient coupling caused by the simultaneous update of two asymmetric low-rank matrices, (2) compounded noise amplification under differential privacy, and (3) sharpness of the global aggregated model in the parameter space. To address these issues, we propose LA-LoRA (\textbf{L}ocal \textbf{A}lternating \textbf{LoRA}), a novel approach that decouples gradient interactions and aligns update directions across clients to enhance robustness under stringent privacy constraints. Theoretically, LA-LoRA strengthens convergence guarantees in noisy federated environments. Extensive experiments demonstrate that LA-LoRA achieves state-of-the-art (SOTA) performance on Swin Transformer and RoBERTa models, showcasing robustness to DP noise and broad applicability across both LVMs and LLMs. For example, when fine-tuning the Swin-B model on the Tiny-ImageNet dataset under a strict privacy budget ($\epsilon = 1$), LA-LoRA outperforms the best baseline, RoLoRA, by 16.83\% in test accuracy. Code is provided in \repolink.
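
The first two challenges in the abstract can be made concrete with the standard LoRA parameterization. The block below is a sketch, not the paper's own derivation: the shapes $d$, $k$, $r$ and the noise symbols $\xi_A$, $\xi_B$ are notation assumed here, and the usual LoRA scaling factor is omitted for brevity.

```latex
% LoRA writes the adapted weight as a frozen part plus a low-rank product.
\begin{align}
  W &= W_0 + BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k).
\end{align}
% By the chain rule, each factor's gradient depends on the other factor,
% so updating A and B in the same step couples the two updates.
\begin{align}
  \nabla_B \mathcal{L} &= (\nabla_W \mathcal{L})\, A^{\top}, &
  \nabla_A \mathcal{L} &= B^{\top} (\nabla_W \mathcal{L}).
\end{align}
% If both factors are perturbed by DP noise, the product picks up
% two first-order terms and one second-order (noise-times-noise) term.
\begin{align}
  (B + \xi_B)(A + \xi_A) - BA &= B\,\xi_A + \xi_B\,A + \xi_B\,\xi_A.
\end{align}
```

When only one factor is perturbed in a given step, the quadratic term $\xi_B\,\xi_A$ does not appear in that step, which is one intuition for why alternating schemes are attractive under DP noise.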

Paper Structure (58 sections, 9 theorems, 66 equations, 11 figures, 20 tables)

This paper contains 58 sections, 9 theorems, 66 equations, 11 figures, 20 tables.

Key Result

Theorem 1

Following the privacy analysis of noble2022differentially, LA-LoRA ensures that after $T$ communication rounds with $K$ local steps per client, the weight matrix $W^T$ satisfies $(\epsilon, \delta)$-DP towards any third party. With respect to the server, after $T$ rounds, the accumulated privacy budget satisfies $(\epsilon_s, \delta_s)$-DP,
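
Guarantees of the $(\epsilon, \delta)$ form are commonly obtained by clipping each local gradient and adding Gaussian noise before it leaves the client. The sketch below illustrates that generic step with NumPy. It is an assumption about the mechanism, not the paper's implementation: the function name `clip_and_noise` and the parameters `clip_norm` and `sigma` are hypothetical, and no privacy accounting is performed here.

```python
import numpy as np


def clip_and_noise(grad: np.ndarray, clip_norm: float, sigma: float,
                   rng: np.random.Generator) -> np.ndarray:
    """Clip a gradient to Frobenius norm `clip_norm`, then add Gaussian noise.

    Hypothetical helper: the noise standard deviation is sigma * clip_norm,
    the usual calibration for the Gaussian mechanism with bounded sensitivity.
    """
    norm = np.linalg.norm(grad)  # Frobenius norm for a 2-D array
    # Scale down only if the gradient exceeds the clipping threshold.
    scale = min(1.0, clip_norm / (norm + 1e-12))
    clipped = grad * scale
    noise = rng.normal(0.0, sigma * clip_norm, size=grad.shape)
    return clipped + noise


# Usage: privatize the gradient of one LoRA factor (shapes are illustrative).
rng = np.random.default_rng(0)
grad_B = rng.standard_normal((16, 4))
noisy_grad_B = clip_and_noise(grad_B, clip_norm=1.0, sigma=0.5, rng=rng)
```

Turning `sigma`, $T$, and $K$ into concrete $(\epsilon, \delta)$ values requires a privacy accountant, which this sketch leaves out.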

Figures (11)

  • Figure 1: The illustration of DP-LoRA, FFA-LoRA, RoLoRA, and LA-LoRA. DP-LoRA updates both noisy $A$ and $B$ simultaneously and sends them to the server for aggregation. FFA-LoRA freezes $A$, updates only the noisy $B$, and sends it to the server. RoLoRA alternately updates noisy $A$ and $B$ across rounds. Our LA-LoRA alternately updates noisy $A$ and $B$ within each local round.
  • Figure 2: Comparison of cosine similarity between $\nabla_A\mathcal{L}$ and $\nabla_B\mathcal{L}$, test loss and test accuracy for Swin-T on CIFAR-100 ($\epsilon=3$). LA-LoRA(-filter) uses local alternating updates without smoothing.
  • Figure 3: Scaling of perturbation Frobenius norms with $\sigma$ on QNLI.
  • Figure 4: Comparison of global loss landscapes for fine-tuning Swin-T model on CIFAR-100.
  • Figure 5: Our LA-LoRA framework.
  • ...and 6 more figures
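
The four update schedules described in the Figure 1 caption can be summarized as a small lookup. This is a sketch under stated assumptions: the function `trainable_factors` is hypothetical, and the choice of which factor goes first in the alternating schemes (here $B$ on even indices) is not specified by the caption and is assumed for illustration only.

```python
from typing import FrozenSet


def trainable_factors(method: str, round_idx: int, local_step: int) -> FrozenSet[str]:
    """Return which LoRA factors receive a noisy update at this point.

    round_idx  : index of the communication round (0-based)
    local_step : index of the local step inside the round (0-based)
    """
    if method == "DP-LoRA":
        # Both factors are updated simultaneously.
        return frozenset({"A", "B"})
    if method == "FFA-LoRA":
        # A stays frozen; only B is trained.
        return frozenset({"B"})
    if method == "RoLoRA":
        # Alternates across rounds (assumed order: B on even rounds).
        return frozenset({"B"} if round_idx % 2 == 0 else {"A"})
    if method == "LA-LoRA":
        # Alternates within each local round (assumed order: B on even steps).
        return frozenset({"B"} if local_step % 2 == 0 else {"A"})
    raise ValueError(f"unknown method: {method}")


# Usage: print the schedule for the first four local steps of round 0.
for name in ("DP-LoRA", "FFA-LoRA", "RoLoRA", "LA-LoRA"):
    steps = [sorted(trainable_factors(name, 0, k)) for k in range(4)]
    print(name, steps)
```

The difference between RoLoRA and LA-LoRA in this view is only the index that drives the alternation: the round counter in one case, the local step counter in the other.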

Theorems & Definitions (18)

  • Definition 1: $(\epsilon, \delta)$-DP, dwork2014algorithmic
  • Theorem 1: Privacy guarantee
  • Theorem 2: Closed-form projected gradients
  • Theorem 3: Stable feature learning
  • Theorem 4: Convergence rate
  • Definition 2: Rényi DP
  • proof
  • proof
  • Definition 3
  • Lemma 1
  • ...and 8 more
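
For reference, the two privacy notions named in Definitions 1 and 2 have standard textbook statements, reproduced below. This is the common formulation rather than a transcription of the paper's wording; the symbols $\mathcal{M}$, $D$, $D'$, $S$, and $\alpha$ are notation assumed here.

```latex
% (epsilon, delta)-DP: for all adjacent datasets D, D' and all measurable sets S,
\begin{align}
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\epsilon}\, \Pr[\mathcal{M}(D') \in S] + \delta .
\end{align}
% Renyi DP of order alpha > 1: the Renyi divergence between outputs is bounded,
\begin{align}
  D_{\alpha}\!\left(\mathcal{M}(D) \,\|\, \mathcal{M}(D')\right)
  &= \frac{1}{\alpha - 1} \log \mathbb{E}_{x \sim \mathcal{M}(D')}
     \left[ \left( \frac{\Pr[\mathcal{M}(D) = x]}{\Pr[\mathcal{M}(D') = x]} \right)^{\alpha} \right]
  \;\le\; \epsilon .
\end{align}
% Standard conversion: (alpha, epsilon)-RDP implies, for any 0 < delta < 1,
\begin{align}
  \left( \epsilon + \frac{\log(1/\delta)}{\alpha - 1},\; \delta \right)\text{-DP} .
\end{align}
```

Rényi DP is often used as an intermediate accounting tool because its bounds add up across composed steps, after which the conversion above yields a final $(\epsilon, \delta)$ statement.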