Randomized Masked Finetuning: An Efficient Way to Mitigate Memorization of PIIs in LLMs

Kunj Joshi, David A. Smith

TL;DR

The paper tackles privacy risks from PII memorization in LLMs by proposing Randomized Masked Fine-Tuning (RMFT), which replaces recurring PIIs with randomized yet plausible substitutes to reduce memorization while preserving language utility. It demonstrates RMFT on the Enron dataset with GPT2-XL, reporting reductions of around 80% in both Total Extraction Rate (TER) and Seen Extraction Rate (SER) at the cost of a modest perplexity increase (~5.7%), outperforming a deduplication-based baseline. To evaluate privacy-utility tradeoffs, the authors introduce MaxTER, a Pareto-frontier framework, and use Area Under the Response Curve (AURC) to compare methods across multiple datasets and scenarios. The work concludes that RMFT offers a scalable, privacy-robust fine-tuning strategy when deployment distributions align with training data, with future work needed to generalize to pretraining and less-structured PIIs.

Abstract

The current literature shows that memorization in Natural Language Models, especially Large Language Models (LLMs), poses severe security and privacy risks, as models tend to memorize personally identifying information (PIIs) from training data. We introduce Randomized Masked Fine-Tuning (RMFT), a novel privacy-preserving fine-tuning technique that reduces PII memorization while minimizing performance impact. Using the Enron Email Dataset, we demonstrate that RMFT achieves an 80.81% reduction in Total Extraction Rate and an 80.17% reduction in Seen Extraction Rate compared to baseline fine-tuning, outperforming deduplication methods while incurring only a 5.73% increase in perplexity. We present MaxTER, a Pareto-optimal evaluation framework for assessing privacy-utility tradeoffs, and compare the performance of RMFT and deduplication using the Area Under the Response Curve (AURC) metric.
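To make the idea of a Pareto-optimal privacy-utility evaluation concrete, the sketch below filters a set of checkpoints to those that no other checkpoint beats on both TER reduction and perplexity increase, then picks the one with the largest TER reduction under a perplexity budget. This is only a generic illustration, not the authors' MaxTER procedure: the `Checkpoint` type, the function names, and all numbers are hypothetical, and the budget `tau` is set to 20 only because the Figure 5 caption mentions a 20% threshold.

```python
from typing import NamedTuple, Optional


class Checkpoint(NamedTuple):
    name: str
    ter_reduction: float  # % reduction in TER vs. baseline (higher is better)
    ppl_increase: float   # % increase in perplexity vs. baseline (lower is better)


def dominates(a: Checkpoint, b: Checkpoint) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.ter_reduction >= b.ter_reduction and a.ppl_increase <= b.ppl_increase
    better = a.ter_reduction > b.ter_reduction or a.ppl_increase < b.ppl_increase
    return no_worse and better


def pareto_front(points: list[Checkpoint]) -> list[Checkpoint]:
    """Keep only the checkpoints that no other checkpoint dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def select_checkpoint(points: list[Checkpoint], tau: float) -> Optional[Checkpoint]:
    """Assumed selection rule: maximize TER reduction subject to ppl_increase < tau."""
    feasible = [p for p in points if p.ppl_increase < tau]
    front = pareto_front(feasible)
    return max(front, key=lambda p: p.ter_reduction) if front else None


# Made-up checkpoints, for illustration only (not results from the paper).
checkpoints = [
    Checkpoint("ckpt-1", 40.0, 3.0),
    Checkpoint("ckpt-2", 65.0, 8.0),
    Checkpoint("ckpt-3", 60.0, 12.0),  # dominated by ckpt-2
    Checkpoint("ckpt-4", 85.0, 25.0),  # exceeds the budget tau = 20
]

best = select_checkpoint(checkpoints, tau=20.0)
none_feasible = select_checkpoint(checkpoints, tau=2.0)
print(best, none_feasible)
```

When no checkpoint fits the budget, the function returns `None`, which corresponds to a method having no feasible operating point under the chosen threshold.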

Paper Structure

This paper contains 30 sections, 15 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Distribution of the top ten most frequent email addresses in training and testing sets.
  • Figure 2: Methodology used for Randomized Masked Finetuning vs Deduplication. This diagram depicts how the index tables are constructed for each technique and how the updated index tables differ. If we have $N$ datapoints, an average of $k$ emails per datapoint, and $u$ unique emails, then the original index table has $Nk$ rows and each email appears $Nk/u$ times on average. In RMFT, the updated index table keeps all $Nk$ rows, but each email now appears only once, so the number of unique emails increases. In deduplication, the updated index table has just $u$ rows, with each email appearing once.
  • Figure 3: Total Extraction Rate (TER) and Seen Extraction Rate (SER) across training checkpoints for all three fine-tuning techniques. Lower values indicate fewer PIIs memorized by the model.
  • Figure 4: (a) Mean Delta Perplexity (MDP) of Deduplication vs Randomized Masked Finetuning (RMFT). Lower MDP indicates less deviation from baseline performance. (b) Average Perplexity per Checkpoint for all training techniques. This curve confirms that the Mean Delta Perplexity of RMFT is lower than that of Deduplication.
  • Figure 5: MaxTER analysis identifying checkpoints that optimize the tradeoff between TER reduction and perplexity increase. RMFT outperforms deduplication by achieving greater TER reductions while incurring lower perplexity penalties. No deduplication checkpoint satisfies the constraint that the perplexity increase stay below 20% (threshold $\tau$); the grey area marks the infeasible region.
  • ...and 2 more figures
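
The row counts in the Figure 2 caption can be checked on a toy corpus. The sketch below is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, RMFT is approximated by keeping the first occurrence of each email and swapping every repeat for a fresh placeholder address (the paper describes randomized yet plausible replacements, which a simple counter does not reproduce), and deduplication is approximated by keeping only the first row for each email.

```python
from collections import Counter
from itertools import count


def build_index_table(datapoints):
    """One (datapoint_id, email) row per email occurrence: N*k rows in total."""
    return [(doc_id, email) for doc_id, emails in enumerate(datapoints) for email in emails]


def rmft_update(index_table):
    """Keep every row, but replace repeated emails so each address appears once."""
    seen = set()
    fresh = count()
    updated = []
    for doc_id, email in index_table:
        if email in seen:
            # Hypothetical placeholder generator; a real one would sample plausible addresses.
            email = f"masked{next(fresh)}@example.com"
        seen.add(email)
        updated.append((doc_id, email))
    return updated


def dedup_update(index_table):
    """Keep only the first row for each email: u rows in total."""
    seen = set()
    updated = []
    for doc_id, email in index_table:
        if email not in seen:
            seen.add(email)
            updated.append((doc_id, email))
    return updated


def summarize(table):
    """Return (rows, unique emails, max occurrences of any single email)."""
    counts = Counter(email for _, email in table)
    return len(table), len(counts), max(counts.values())


# Toy corpus: N = 6 datapoints, k = 2 emails each, u = 3 unique emails.
datapoints = [
    ["a@example.com", "b@example.com"],
    ["b@example.com", "c@example.com"],
    ["c@example.com", "a@example.com"],
] * 2

original = build_index_table(datapoints)
print("original:", summarize(original))               # Nk rows, each email Nk/u times
print("rmft:    ", summarize(rmft_update(original)))   # Nk rows, each email once
print("dedup:   ", summarize(dedup_update(original)))  # u rows, each email once
```

With $N = 6$, $k = 2$, and $u = 3$, the original table has $Nk = 12$ rows with each email appearing $Nk/u = 4$ times; the RMFT-style table keeps 12 rows over 12 distinct addresses, while the deduplicated table shrinks to 3 rows.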