Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty, Suzan Verberne, Fatih Turkmen
TL;DR
This survey addresses the privacy and security risks of memorization in large language models, proposing a three-dimensional taxonomy across granularity, retrievability, and desirability. It surveys measurement approaches (string matching, exposure, inference attacks, counterfactuality, and prompt compression), identifies drivers (model size, data characteristics, prompts, and decoding), and reviews mitigation strategies (data deduplication, differential privacy, unlearning, and heuristics). The authors outline a research agenda, highlighting future work on balancing privacy with performance, distinguishing memorization from understanding, and examining memorization in conversational agents, RAG systems, multilingual models, and diffusion language models. The work aims to guide researchers and practitioners in auditing, defending, and responsibly deploying LLMs with respect to memorization risks.
Abstract
While recent research increasingly showcases the remarkable capabilities of Large Language Models (LLMs), it is equally crucial to examine their associated risks. Among these, privacy and security vulnerabilities are particularly concerning, posing significant ethical and legal challenges. At the heart of these vulnerabilities stands memorization, which refers to a model's tendency to store and reproduce phrases from its training data. This phenomenon has been shown to be a fundamental source of various privacy and security attacks against LLMs. In this paper, we provide a taxonomy of the literature on LLM memorization, exploring it across three dimensions: granularity, retrievability, and desirability. Next, we discuss the metrics and methods used to quantify memorization, followed by an analysis of the causes and factors that contribute to the memorization phenomenon. We then explore strategies that have been used so far to mitigate the undesirable aspects of this phenomenon. We conclude our survey by identifying potential research topics for the near future, including methods to balance privacy and performance, and the analysis of memorization in specific LLM contexts such as conversational agents, retrieval-augmented generation, and diffusion language models. Given the rapid research pace in this field, we also maintain a dedicated repository of the references discussed in this survey, which will be regularly updated to reflect the latest developments.
