DP-MemArc: Differential Privacy Transfer Learning for Memory Efficient Language Models
Yanming Liu, Xinyue Peng, Yuwei Zhang, Xiaolan Ke, Songhang Deng, Jiannan Cao, Chen Ma, Mengchen Fu, Tianyu Du, Sheng Cheng, Xun Wang, Jianwei Yin, Xuhong Zhang
TL;DR
The paper addresses the challenge of privacy-preserving fine-tuning of large language models under tight memory constraints. It introduces DP-MemArc, a framework with two complementary designs, DP-MemArc_side and DP-MemArc_rev, that reduce training memory while maintaining differential privacy, aided by BK-MixOpt, GhostNorm, and parameter-efficient fine-tuning techniques such as LoRA and Adapters. Across RoBERTa-large and GPT-2-large tasks and a range of privacy budgets, DP-MemArc achieves substantial memory savings (2–8x depending on configuration) with competitive accuracy and generation quality, validated through comprehensive analyses and ablations. The work offers a practical pathway to private, memory-efficient adaptation of large language models, enabling real-world deployment under strict privacy constraints.
Abstract
Large language models have repeatedly shown outstanding performance across diverse applications. However, deploying these models can inadvertently put user privacy at risk, and their substantial size imposes significant memory demands during training, which raises considerable practical concerns. In this paper, we introduce DP-MemArc, a novel training framework aimed at reducing the memory costs of large language models while emphasizing the protection of user data privacy. DP-MemArc incorporates side network or reversible network designs to support a variety of differentially private, memory-efficient fine-tuning schemes. Our approach reduces memory usage by a factor of about 2.5 while still ensuring robust privacy protection, keeping user data secure and confidential. Extensive experiments demonstrate that DP-MemArc effectively provides differentially private, memory-efficient fine-tuning across different task scenarios.
