Privacy in Fine-tuning Large Language Models: Attacks, Defenses, and Future Directions

Hao Du; Shang Liu; Lele Zheng; Yang Cao; Atsuyoshi Nakamura; Lei Chen

TL;DR

This paper surveys privacy risks specific to the fine-tuning of large language models, focusing on data-centered attacks (membership inference and data extraction) and output manipulation (backdoor attacks). It analyzes how different fine-tuning paradigms, especially full fine-tuning versus parameter-efficient fine-tuning (PEFT), interact with privacy threats and with defenses such as differential privacy, federated learning, and knowledge unlearning. The authors identify gaps in the understanding of PEFT-specific vulnerabilities and the limitations of current defenses, and they propose directions for more robust, utility-preserving privacy techniques in fine-tuning. Overall, the work provides a focused framework for assessing privacy during fine-tuning and guides future research toward responsible deployment of LLMs in sensitive domains.

Abstract

Fine-tuning has emerged as a critical process in leveraging Large Language Models (LLMs) for specific downstream tasks, enabling these models to achieve state-of-the-art performance across various domains. However, the fine-tuning process often involves sensitive datasets, introducing privacy risks from attacks that exploit the unique characteristics of this stage. In this paper, we provide a comprehensive survey of privacy challenges associated with fine-tuning LLMs, highlighting vulnerabilities to various privacy attacks, including membership inference, data extraction, and backdoor attacks. We further review defense mechanisms designed to mitigate privacy risks in the fine-tuning phase, such as differential privacy, federated learning, and knowledge unlearning, discussing their effectiveness and limitations in protecting privacy while maintaining model utility. By identifying key gaps in existing research, we highlight challenges and propose directions to advance the development of privacy-preserving methods for fine-tuning LLMs, promoting their responsible use in diverse applications.
