DP-DyLoRA: Fine-Tuning Transformer-Based Models On-Device under Differentially Private Federated Learning using Dynamic Low-Rank Adaptation

Jie Xu, Karthikeyan Saravanan, Rogier van Dalen, Haaris Mehmood, David Tuckey, Mete Ozay

TL;DR

Differentially private federated learning (DP-FL) suffers severe accuracy losses when fine-tuning large transformer models, because of the noise that DP adds. The paper introduces DP-DyLoRA, a DP-compatible adaptation of DyLoRA that trains LoRA modules at a variable rank while enforcing a single server-wide rank per round to preserve differential privacy. In a multi-domain benchmark (NLU, CV and ASR) with up to 1 million clients and a privacy budget of $\epsilon=2$, DP-DyLoRA outperforms existing DP-PEFT methods, keeping the accuracy drop below 2% and the WER increase below 7% on average. The approach reduces communication and noise sensitivity by sharing only rank-constrained updates, and it provides formal DP guarantees via the Gaussian mechanism and the moments accountant, enabling practical on-device learning at scale.

Abstract

Federated learning (FL) allows clients to collaboratively train a global model without sharing their local data with a server. However, clients' contributions to the server can still leak sensitive information. Differential privacy (DP) addresses such leakage by providing formal privacy guarantees, with mechanisms that add randomness to the clients' contributions. The randomness makes it infeasible to train large transformer-based models, common in modern federated learning systems. In this work, we empirically evaluate the practicality of fine-tuning large-scale on-device transformer-based models with differential privacy in a federated learning system. We conduct comprehensive experiments on various system properties for tasks spanning a multitude of domains: speech recognition, computer vision (CV) and natural language understanding (NLU). Our results show that full fine-tuning under differentially private federated learning (DP-FL) generally leads to huge performance degradation, which can be alleviated by reducing the dimensionality of contributions through parameter-efficient fine-tuning (PEFT). Our benchmarks of existing DP-PEFT methods show that DP-Low-Rank Adaptation (DP-LoRA) consistently outperforms other methods. An even more promising approach, DyLoRA, which makes the low rank variable, would break differential privacy if naively combined with FL. We therefore propose an adaptation method that can be combined with differential privacy and call it DP-DyLoRA. Finally, we are able to reduce the accuracy degradation and word error rate (WER) increase due to DP to less than 2% and 7% respectively with 1 million clients and a stringent privacy budget of $\epsilon=2$.
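
The mechanism summarised above (a rank shared by all clients in a round, bounded contributions, and Gaussian noise) can be illustrated with a small sketch. This is not the paper's algorithm: it is one plausible reading of the summary, written under several assumptions. The function names, the representation of a LoRA update as a list of rank slices, the uniform rank sampling, and the parameters `clip_norm` and `noise_multiplier` are all hypothetical and chosen only for illustration.

```python
import math
import random


def clip_l2(vec, clip_norm):
    """Scale vec so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in vec]


def truncate_to_rank(update, rank):
    """Keep the first `rank` rank slices of a LoRA update and flatten them."""
    return [x for rank_slice in update[:rank] for x in rank_slice]


def dp_round(client_updates, max_rank, clip_norm, noise_multiplier, rng):
    """One illustrative round: shared rank, clipping, Gaussian noise, averaging.

    Assumption: the server draws the rank uniformly from 1..max_rank.
    """
    # The rank is drawn once per round, so every client sends an update
    # of the same shape.
    rank = rng.randint(1, max_rank)
    clipped = [
        clip_l2(truncate_to_rank(update, rank), clip_norm)
        for update in client_updates
    ]
    dim = len(clipped[0])
    summed = [sum(c[i] for c in clipped) for i in range(dim)]
    # Gaussian mechanism: noise scale is proportional to the clipping bound.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return rank, [x / len(client_updates) for x in noisy]


rng = random.Random(0)
# 100 hypothetical clients, each with a rank-8 update of 4 values per slice.
updates = [
    [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
    for _ in range(100)
]
rank, averaged = dp_round(
    updates, max_rank=8, clip_norm=1.0, noise_multiplier=1.0, rng=rng
)
print(rank, len(averaged))
```

Because the rank is fixed by the server for the whole round, each client's contribution has the same dimensionality and an L2 norm of at most `clip_norm`, which is the property the Gaussian mechanism relies on. Turning `noise_multiplier` into a concrete $\epsilon$ requires a privacy accountant, which this sketch does not include.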

Paper Structure

This paper contains 28 sections, 14 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Privacy-utility trade-offs of DP-LoRA and DP-DyLoRA on six datasets across three different domains under DP-FL. The utility is computed as the average of accuracy.
  • Figure 2: The optimal rank values of DP-DyLoRA for the last communication round, compared with those of DyLoRA under non-private federated learning.
  • Figure 3: Model performance with different numbers of clients in production and different privacy budgets. All datasets are produced using non-IID partitioning, with $\alpha=0.1$ for the Dirichlet distribution where applicable. CL, FL and DP-FL denote central learning, federated learning and differentially private federated learning, respectively.
  • Figure 4: Model performance with IID and non-IID data partitioning, with the level of data heterogeneity controlled by sampling from a Dirichlet distribution.
  • Figure 5: Model performance with IID and non-IID data partitioning, with the level of data heterogeneity controlled by natural factors.
  • ...and 2 more figures
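
The captions of Figures 3 and 4 refer to non-IID partitioning controlled by a Dirichlet distribution with $\alpha=0.1$. The captions do not say how the partitioning is carried out, so the following is only a sketch of a commonly used label-based scheme, not necessarily the one used in the paper. The function name `dirichlet_partition` and its parameters are hypothetical.

```python
import random


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, one Dirichlet draw per class.

    Smaller alpha concentrates each class on fewer clients (more non-IID).
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet sample is a vector of independent Gamma(alpha, 1)
        # draws divided by their sum.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(weights)
        if total == 0:  # guard against numerical underflow for tiny alpha
            weights = [1.0] * num_clients
            total = float(num_clients)
        # Turn the proportions into cut points over this class's samples.
        start = 0
        cumulative = 0.0
        for client in range(num_clients):
            cumulative += weights[client] / total
            if client == num_clients - 1:
                end = len(indices)
            else:
                end = min(len(indices), round(cumulative * len(indices)))
            end = max(end, start)
            clients[client].extend(indices[start:end])
            start = end
    return clients


# 1000 hypothetical samples over 10 classes, split across 20 clients.
labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(labels, num_clients=20, alpha=0.1)
print([len(p) for p in parts])
```

With a small `alpha` such as 0.1, most of each class tends to land on a handful of clients, while a large `alpha` approaches an even, IID-like split.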