Skewed Memorization in Large Language Models: Quantification and Decomposition

Hao Li, Di Huang, Ziyu Wang, Amir M. Rahmani

TL;DR

This work addresses the privacy and security risks of memorization in large language models trained via supervised fine-tuning, showing that memorization is highly skewed and concentrated in a small subset of the data. It introduces a prefix continuation framework and non-parametric sampling to quantify memorization, linking it to training dynamics and embedding-space structure. Through experiments on domain-specific and open-domain datasets using LoRA-fine-tuned Llama-3.1-8B-Instruct, the authors demonstrate that memorization intensifies with training and is strongly influenced by dataset composition and embedding density, while standard average-case metrics fail to capture worst-case behavior. The findings inform practical detection and mitigation strategies for privacy-preserving LLM training and data curation.
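The prefix continuation idea can be illustrated with a minimal sketch. It assumes that a training sample's memorization is scored as the number of ground-truth tokens that greedy decoding reproduces verbatim after the model is prompted with the sample's first `prefix_len` tokens. `memorized_length`, `next_token`, and `NextToken` are hypothetical names, not the authors' code; in a real setup `next_token` would wrap one greedy decoding step of the fine-tuned model.

```python
from typing import Callable, Sequence

# Hypothetical interface: one greedy decoding step, mapping a token-id context to the next token id.
NextToken = Callable[[Sequence[int]], int]


def memorized_length(tokens: Sequence[int], prefix_len: int, next_token: NextToken) -> int:
    """Hypothetical helper: number of ground-truth tokens after the first `prefix_len`
    tokens that the model reproduces verbatim, stopping at the first mismatch."""
    if not 0 <= prefix_len <= len(tokens):
        raise ValueError("prefix_len must lie in [0, len(tokens)]")
    context = list(tokens[:prefix_len])
    matched = 0
    for target in tokens[prefix_len:]:
        if next_token(context) != target:
            break
        matched += 1
        context.append(target)  # model output equals the target here, so extend with it
    return matched


if __name__ == "__main__":
    sample = [5, 9, 2, 7, 7, 3, 1]

    def toy_next(ctx: Sequence[int]) -> int:
        # Toy "model" that has memorized the sample only up to position 5.
        return sample[len(ctx)] if len(ctx) < 5 else 0

    print(memorized_length(sample, prefix_len=2, next_token=toy_next))  # → 3
```

Scoring each sample by match length, rather than by a single pass/fail flag, yields a per-sample distribution whose upper tail can then be inspected directly.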

Abstract

Memorization in Large Language Models (LLMs) poses privacy and security risks, as models may unintentionally reproduce sensitive or copyrighted data. Existing analyses focus on average-case scenarios, often neglecting the highly skewed distribution of memorization. This paper examines memorization in LLM supervised fine-tuning (SFT), exploring its relationships with training duration, dataset size, and inter-sample similarity. By analyzing memorization probabilities over sequence lengths, we link this skewness to the token generation process, offering insights for estimating memorization and comparing it to established metrics. Through theoretical analysis and empirical evaluation, we provide a comprehensive understanding of memorization behaviors and propose strategies to detect and mitigate risks, contributing to more privacy-preserving LLMs.
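Inter-sample similarity and embedding density can be made concrete with a simple proxy. The sketch below assumes, rather than takes from the paper, that a sample's local density is the mean cosine similarity of its embedding to its `k` most similar other embeddings; the embedding model that produces the vectors is left unspecified, and `knn_density` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn_density(embeddings: Sequence[Sequence[float]], k: int) -> List[float]:
    """Hypothetical density proxy: mean cosine similarity of each embedding to its
    k most similar other embeddings; higher values mean a denser neighborhood."""
    if not 1 <= k < len(embeddings):
        raise ValueError("need 1 <= k < number of embeddings")
    scores = []
    for i, e in enumerate(embeddings):
        sims = sorted(
            (cosine(e, other) for j, other in enumerate(embeddings) if j != i),
            reverse=True,
        )
        scores.append(sum(sims[:k]) / k)
    return scores


if __name__ == "__main__":
    # Three near-duplicate samples and one outlier (toy 2-D "embeddings").
    emb = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.15], [0.0, 1.0]]
    print(knn_density(emb, k=2))  # the outlier (last entry) gets the lowest score
```

The brute-force loop is quadratic in the number of samples; for a full fine-tuning set, an approximate nearest-neighbor index would be the usual substitute.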

Paper Structure

This paper contains 31 sections, 4 theorems, 37 equations, 8 figures, and 4 tables.

Key Result

Theorem 2.2

The maximum fraction of memorization no shorter than $n$ is achieved by the Bayes optimal classifier, with expected Bayesian risk:
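To build intuition for Theorem 2.2, consider a toy setting that is an assumption, not the paper's construction: the true continuation is sampled position by position from fixed next-token distributions, and the predictor always emits the most likely token. Under 0-1 loss its per-token Bayes risk is $r_i = 1 - \max_y p_i(y)$, so the probability of reproducing at least $n$ tokens is $\prod_{i=1}^{n} (1 - r_i)$, which reduces to $(1 - r)^n$ for a constant risk $r$. The function names are hypothetical, and a Monte Carlo check is included to confirm the product formula.

```python
import random
from typing import List, Sequence


def bayes_token_risk(next_token_probs: Sequence[float]) -> float:
    """0-1 Bayes risk of predicting the most likely token: 1 - max_y p(y | context)."""
    return 1.0 - max(next_token_probs)


def match_survival(per_position_probs: Sequence[Sequence[float]]) -> List[float]:
    """Entry n is P(argmax predictor reproduces at least n tokens), n = 0..N.

    Toy assumption: the continuation is sampled from these fixed per-position
    distributions, so position i matches with probability 1 - risk_i.
    """
    survival = [1.0]
    for probs in per_position_probs:
        survival.append(survival[-1] * (1.0 - bayes_token_risk(probs)))
    return survival


def simulate_match_length(per_position_probs: Sequence[Sequence[float]],
                          rng: random.Random) -> int:
    """Monte Carlo check: sample a continuation, count leading argmax matches."""
    length = 0
    for probs in per_position_probs:
        tokens = range(len(probs))
        true_token = rng.choices(tokens, weights=probs)[0]
        if true_token != max(tokens, key=lambda t: probs[t]):
            break
        length += 1
    return length


if __name__ == "__main__":
    positions = [[0.9, 0.1]] * 5  # per-token Bayes risk r = 0.1 at every position
    print(match_survival(positions))  # decays as (1 - r)^n
    rng = random.Random(0)
    runs = [simulate_match_length(positions, rng) for _ in range(20000)]
    print(sum(n >= 3 for n in runs) / len(runs))  # should be close to 0.9**3
```

The geometric decay of the match probability with length is one way a per-token view connects memorization length to the token generation process.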

Figures (8)

  • Figure 1: Memorization trends across training epochs in Lavita. Maximum and high-percentile memorization increase as loss decreases, but extreme cases appear early in training.
  • Figure 2: Memorization trends in the mixed dataset. While memorization patterns remain skewed, the overall distribution differs from Lavita, highlighting dataset-dependent memorization effects.
  • Figure 3: Memorization comparison for the same subset of samples trained in Lavita vs. the mixed dataset. Certain data points exhibit large memorization differences across contexts, emphasizing dataset-dependent effects.
  • Figure 4: Memorization trends across smaller Lavita subsets. Despite fewer training steps per epoch, models rapidly reach high memorization levels, increasing risks for small-scale fine-tuning.
  • Figure 5: Probability of missing high-memorization instances as a function of sample size. Smaller samples fail to capture the distribution’s upper tail, leading to systematic underestimation of extreme cases.
  • ...and 3 more figures
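The trend in Figure 5 follows from elementary sampling arithmetic. As a sketch, assume an audit draws training samples uniformly at random and that exactly `num_extreme` of `num_total` samples are high-memorization (both hypothetical parameters). The chance of seeing none of them is then hypergeometric, and under the with-replacement approximation, keeping that miss probability below $\delta$ for an extreme fraction $q$ requires $k \ge \ln \delta / \ln(1 - q)$ draws.

```python
import math


def miss_probability(num_total: int, num_extreme: int, sample_size: int) -> float:
    """P(a uniform sample drawn without replacement contains none of the
    `num_extreme` high-memorization items among `num_total` items)."""
    if not 0 <= num_extreme <= num_total or not 0 <= sample_size <= num_total:
        raise ValueError("need 0 <= num_extreme, sample_size <= num_total")
    # Hypergeometric zero count; math.comb returns 0 when sample_size > num_total - num_extreme.
    return math.comb(num_total - num_extreme, sample_size) / math.comb(num_total, sample_size)


def min_sample_size(extreme_fraction: float, max_miss: float) -> int:
    """Smallest k with (1 - q)^k <= max_miss, i.e. k >= ln(max_miss) / ln(1 - q)
    (sampling-with-replacement approximation)."""
    if not (0.0 < extreme_fraction < 1.0 and 0.0 < max_miss < 1.0):
        raise ValueError("extreme_fraction and max_miss must lie in (0, 1)")
    return math.ceil(math.log(max_miss) / math.log(1.0 - extreme_fraction))


if __name__ == "__main__":
    # 10 extreme samples among 10,000; audit only 1,000 of them.
    print(miss_probability(10_000, 10, 1_000))  # roughly 0.35: about a 1-in-3 chance of seeing none
    print(min_sample_size(0.001, 0.05))  # → 2995
```

When extreme cases make up only 0.1% of the data, even a 1,000-sample audit misses all of them about a third of the time, consistent with the systematic underestimation of the upper tail described for Figure 5.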

Theorems & Definitions (13)

  • Remark 2.1
  • Theorem 2.2
  • Definition 2.3
  • Corollary 2.4
  • Theorem 2.5
  • Theorem 2.6
  • Remark 2.7
  • Proof
  • Proof
  • Proof
  • ...and 3 more