DP-MemArc: Differential Privacy Transfer Learning for Memory Efficient Language Models

Yanming Liu, Xinyue Peng, Yuwei Zhang, Xiaolan Ke, Songhang Deng, Jiannan Cao, Chen Ma, Mengchen Fu, Tianyu Du, Sheng Cheng, Xun Wang, Jianwei Yin, Xuhong Zhang

TL;DR

The paper addresses the challenge of privacy-preserving fine-tuning of large language models under tight memory constraints. It introduces DP-MemArc, a framework with two complementary designs—DP-MemArc_side and DP-MemArc_rev—that reduce training memory while maintaining differential privacy, aided by BK-MixOpt, GhostNorm, and parameter-efficient fine-tuning techniques such as LoRA and Adapters. Across RoBERTa-large and GPT-2-large tasks and privacy budgets, DP-MemArc achieves substantial memory savings (2–8x depending on configuration) with competitive accuracy and generation quality, validated through comprehensive analyses and ablations. The work offers a practical pathway to private, memory-efficient adaptation of large language models, enabling real-world deployment under strict privacy constraints.

Abstract

Large language models have repeatedly shown outstanding performance across diverse applications. However, deploying these models can inadvertently put user privacy at risk, and the substantial memory they require during training raises considerable practical concerns about resource consumption. In this paper, we introduce DP-MemArc, a novel training framework aimed at reducing the memory costs of large language models while protecting the privacy of user data. DP-MemArc incorporates side network or reversible network designs to support a variety of memory-efficient fine-tuning schemes under differential privacy. Our approach reduces memory usage by a factor of about 2.5 while maintaining robust privacy protection, keeping user data secure and confidential. Extensive experiments demonstrate that DP-MemArc provides effective memory-efficient fine-tuning with differential privacy across different task scenarios.
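
For readers unfamiliar with differentially private fine-tuning, the following is a minimal sketch of the generic DP-SGD update (per-sample gradient clipping followed by Gaussian noise) that this family of methods builds on. It is not the paper's BK-MixOpt or GhostNorm procedure. The function names, the toy gradients, and the parameter names `max_norm` and `noise_multiplier` are illustrative assumptions.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def dp_sgd_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Return a privatized average gradient (generic DP-SGD, not BK-MixOpt)."""
    # Clip every sample's gradient so no single example dominates the sum.
    clipped = [clip_gradient(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Add Gaussian noise whose scale is tied to the clipping threshold.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [n / len(per_sample_grads) for n in noisy]


# Hypothetical toy batch: three per-sample gradients in two dimensions.
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
private_grad = dp_sgd_gradient(
    grads, max_norm=1.0, noise_multiplier=1.0, rng=random.Random(0)
)
```

The memory cost in this naive form comes from materializing one gradient per sample. That per-sample cost is the overhead that memory-efficient private training schemes aim to avoid.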

Paper Structure (18 sections, 11 equations, 3 figures, 5 tables)

Figures (3)

  • Figure 1: The two DP-MemArc designs: the reversible network design (left) and the side network design (right). The trainable parameters are fine-tuned using the differentially private BK-MixOpt method.
  • Figure 2: Performance of different designs for the reversible network sub-function $\mathcal{F}$. The privacy constraint is $\epsilon = 8.0$.
  • Figure 3: Experiment on the number of training steps, conducted on the E2E dataset.

Theorems & Definitions (1)

  • Definition 1