Skewed Memorization in Large Language Models: Quantification and Decomposition
Hao Li, Di Huang, Ziyu Wang, Amir M. Rahmani
TL;DR
This work addresses the privacy and security risks of memorization in large language models trained via supervised fine-tuning. It shows that memorization is highly skewed and concentrated in a small subset of the training data. It introduces a prefix continuation framework and non-parametric sampling to quantify memorization, and it links memorization to training dynamics and embedding-space structure. Experiments on domain-specific and open-domain datasets with LoRA-fine-tuned Llama-3.1-8B-Instruct show that memorization intensifies with training and is strongly influenced by dataset composition and embedding density, while standard average-case metrics fail to capture worst-case behavior. The findings inform practical detection and mitigation strategies for privacy-preserving LLM training and data curation.
Abstract
Memorization in Large Language Models (LLMs) poses privacy and security risks, as models may unintentionally reproduce sensitive or copyrighted data. Existing analyses focus on average-case scenarios and often neglect the highly skewed distribution of memorization. This paper examines memorization in LLM supervised fine-tuning (SFT) and its relationships with training duration, dataset size, and inter-sample similarity. By analyzing memorization probabilities across sequence lengths, we link this skewness to the token generation process, which informs how memorization can be estimated and how such estimates compare with established metrics. Through theoretical analysis and empirical evaluation, we characterize memorization behavior and propose strategies to detect and mitigate its risks, contributing to more privacy-preserving LLMs.
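As a rough illustration of how sequence length and the token generation process can produce skewed memorization, consider a minimal sketch that assumes a sample counts as memorized when, given its prefix, the model reproduces its suffix exactly. Under that assumption, the exact-reproduction probability is, by the chain rule, the product of the per-token probabilities of the true suffix tokens, and it can also be estimated by sampling continuations and counting exact matches. The model interface (`NextTokenFn`), the `toy_model` stand-in, and the functions `exact_match_probability` and `sampled_memorization_rate` are hypothetical; the sampling estimator is a plain Monte Carlo frequency and is not necessarily the authors' non-parametric procedure.

```python
import math
import random
from typing import Callable, Dict, Sequence

# Hypothetical model interface: maps a token context to a next-token distribution.
NextTokenFn = Callable[[Sequence[str]], Dict[str, float]]


def toy_model(target: Sequence[str], p_true: float) -> NextTokenFn:
    """Hypothetical stand-in for an LLM: puts p_true on the 'memorized' next token."""
    def model(context: Sequence[str]) -> Dict[str, float]:
        i = len(context)
        true_token = target[i] if i < len(target) else "<eos>"
        return {true_token: p_true, "<other>": 1.0 - p_true}
    return model


def exact_match_probability(model: NextTokenFn, prefix: Sequence[str],
                            suffix: Sequence[str]) -> float:
    """Chain rule: product of p(true token | prefix + earlier true tokens).
    The product shrinks geometrically as the suffix gets longer."""
    context = list(prefix)
    log_p = 0.0
    for token in suffix:
        p = model(context).get(token, 0.0)
        if p == 0.0:
            return 0.0
        log_p += math.log(p)  # sum logs to avoid underflow on long suffixes
        context.append(token)
    return math.exp(log_p)


def sampled_memorization_rate(model: NextTokenFn, prefix: Sequence[str],
                              suffix: Sequence[str], n_samples: int = 1000,
                              seed: int = 0) -> float:
    """Monte Carlo estimate: fraction of sampled continuations equal to the suffix."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        context = list(prefix)
        matched = True
        for token in suffix:
            dist = model(context)
            tokens = list(dist)
            weights = [dist[t] for t in tokens]
            sampled = rng.choices(tokens, weights=weights, k=1)[0]
            if sampled != token:
                matched = False
                break  # this continuation has already diverged from the suffix
            context.append(sampled)
        hits += matched
    return hits / n_samples


if __name__ == "__main__":
    text = "the patient id is 1234".split()
    prefix, suffix = text[:2], text[2:]
    model = toy_model(text, p_true=0.9)
    print(exact_match_probability(model, prefix, suffix))  # about 0.729 (0.9 ** 3)
    print(sampled_memorization_rate(model, prefix, suffix, n_samples=5000))
```

Because the probability is a product over tokens, modest differences in per-token confidence compound: over 50 suffix tokens, a per-token probability of 0.95 gives roughly 0.077, while 0.8 gives roughly 1.4e-5, a gap of more than three orders of magnitude. This is one simple mechanism consistent with a heavy-tailed distribution of memorization across samples, offered as an illustration rather than as the paper's analysis.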
