On Active Privacy Auditing in Supervised Fine-tuning for White-Box Language Models

Qian Sun, Hanpeng Wu, Xi Sheryl Zhang

TL;DR

This research aims to provide the SFT community of LMs with a reliable, ready-to-use privacy auditing tool, and to offer valuable insights into safeguarding privacy during the fine-tuning process.

Abstract

The pretraining and fine-tuning approach has become the leading technique for various NLP applications. However, recent studies reveal that fine-tuning data, because of their sensitive nature, domain-specific characteristics, and identifiability, pose significant privacy concerns. To help develop more privacy-resilient fine-tuning models, we introduce a novel active privacy auditing framework, dubbed Parsing, designed to identify and quantify privacy leakage risks during the supervised fine-tuning (SFT) of language models (LMs). The framework uses improved white-box membership inference attacks (MIAs) as its core technology, with novel learning objectives and a two-stage pipeline that monitors the privacy of the LMs' fine-tuning process and maximizes the exposure of privacy risks. Additionally, we have improved the effectiveness of MIAs on large LMs, including GPT-2, Llama2, and some of their variants. Our research aims to provide the SFT community of LMs with a reliable, ready-to-use privacy auditing tool, and to offer valuable insights into safeguarding privacy during the fine-tuning process. Experimental results confirm the framework's efficiency across various models and tasks, and highlight notable privacy concerns in the fine-tuning process. Project code is available at https://anonymous.4open.science/r/PARSING-4817/.

Paper Structure

This paper contains 33 sections, 13 equations, 13 figures, 20 tables.

Figures (13)

  • Figure 1: An example of privacy auditing results for fine-tuning a model on different tasks. The level of privacy leakage is quantified using two carefully designed metrics.
  • Figure 2: A comprehensive breakdown of the auditing framework Parsing embedded in the model fine-tuning process based on white-box MIAs, including data partitioning, property extraction, property embedding, and membership inference.
  • Figure 3: Balanced accuracy curves of the audit across three model sizes on the PubMed-RCT dataset over 40 fine-tuning epochs (from top to bottom: GPT-2-medium, GPT-2-large, GPT-2-xl).
  • Figure 4: Log-scale ROC curves for the audit across three model sizes on the PubMed-RCT dataset at various fine-tuning epochs (from top to bottom: GPT-2-medium, GPT-2-large, GPT-2-xl).
  • Figure 5: Balanced audit accuracy with different batch sizes for GPT-2-medium trained on PubMed-RCT and Sentiment140.
  • ...and 8 more figures
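
The captions above report audit quality as balanced accuracy and as log-scale ROC curves. As a rough illustration of how such metrics are commonly computed for a membership inference audit, the sketch below scores a handful of samples and evaluates both. It is not the authors' code: the function names, the toy scores, and the threshold are all hypothetical, and the attack that would produce the scores is not modelled here.

```python
# Illustrative only: toy membership scores, not results from the paper.
# labels: 1 = sample was in the fine-tuning set (member), 0 = non-member.
# scores: higher means the (hypothetical) attack believes "member" more strongly.

def balanced_accuracy(scores, labels, threshold):
    """Mean of true-positive rate and true-negative rate at one threshold."""
    pairs = list(zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    return 0.5 * (tp / pos + tn / neg)


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping the threshold over all scores."""
    pairs = list(zip(scores, labels))
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in pairs if y == 1 and s >= t)
        fp = sum(1 for s, y in pairs if y == 0 and s >= t)
        points.append((fp / neg, tp / pos))
    return points


def tpr_at_fpr(scores, labels, max_fpr):
    """Best TPR reachable without exceeding max_fpr.

    This is the region a log-scale ROC plot emphasizes.
    """
    return max(tpr for fpr, tpr in roc_points(scores, labels) if fpr <= max_fpr)


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

if __name__ == "__main__":
    print(balanced_accuracy(scores, labels, threshold=0.5))  # 0.75
    print(tpr_at_fpr(scores, labels, max_fpr=0.0))           # 0.5
```

Balanced accuracy summarizes the audit at a single threshold, while the low-FPR end of the ROC curve shows how many members can be identified with very few false accusations, which is why the two are usually reported together.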