How much do language models memorize?

John X. Morris; Chawin Sitawarin; Chuan Guo; Narine Kokhlikyan; G. Edward Suh; Alexander M. Rush; Kamalika Chaudhuri; Saeed Mahloujifar

TL;DR

This paper introduces a framework for quantifying how much language models memorize by separating unintended memorization from generalization. It uses definitions based on Kolmogorov complexity, approximated through compression with arithmetic coding, to estimate model capacity, and finds that GPT-style transformers store about 3.6 bits per parameter. Models memorize until this capacity fills; once the dataset exceeds capacity, grokking begins and unintended memorization falls as the model generalizes. Experiments on synthetic and real text show double descent setting in when dataset size surpasses model capacity, and the resulting scaling laws predict membership-inference performance from model capacity and dataset size. The results bear on memorization, generalization, and privacy in large transformers.
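To make the compression view concrete, here is a minimal sketch of how a code length in bits follows from a model's per-token probabilities: an ideal arithmetic coder spends about -log2(p) bits on a token that the model assigns probability p. The probabilities, the function name `code_length_bits`, and the use of a "reference" versus "trained" model to read off bits saved are illustrative assumptions, not the paper's exact estimator.

```python
import math


def code_length_bits(token_probs):
    """Ideal arithmetic-coding length in bits: sum of -log2(p) over tokens."""
    return sum(-math.log2(p) for p in token_probs)


# Hypothetical per-token probabilities for one 4-token sequence.
reference_probs = [0.25, 0.25, 0.25, 0.25]  # assumed model that never saw the sequence
trained_probs = [0.5, 0.5, 0.5, 0.5]        # assumed model trained on the sequence

reference_bits = code_length_bits(reference_probs)
trained_bits = code_length_bits(trained_probs)

# Bits saved when coding with the trained model instead of the reference model;
# used here only as an illustrative proxy for information stored about the sequence.
bits_saved = reference_bits - trained_bits
print(reference_bits, trained_bits, bits_saved)
```

A larger gap means the trained model compresses the sequence much better than a model that only knows the general distribution, which is the intuition behind treating the gap as memorized information.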

Abstract

We propose a new method for estimating how much a model knows about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have struggled to disentangle memorization from generalization. We formally separate memorization into two components: unintended memorization, the information a model contains about a specific dataset, and generalization, the information a model contains about the true data-generation process. When we completely eliminate generalization, we can compute the total memorization, which provides an estimate of model capacity: our measurements estimate that GPT-style models have a capacity of approximately 3.6 bits per parameter. We train language models on datasets of increasing size and observe that models memorize until their capacity fills, at which point "grokking" begins, and unintended memorization decreases as models begin to generalize. We train hundreds of transformer language models ranging from 500K to 1.5B parameters and produce a series of scaling laws relating model capacity and data size to membership inference.
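As a rough illustration of what the 3.6 bits-per-parameter estimate implies, the sketch below converts a parameter count into a total capacity in bits and megabytes. It assumes capacity scales linearly with parameter count at the quoted rate; the helper names are hypothetical, and the two model sizes are simply the endpoints of the range mentioned above.

```python
BITS_PER_PARAM = 3.6  # capacity estimate quoted in the abstract


def capacity_bits(num_params, bits_per_param=BITS_PER_PARAM):
    """Estimated total capacity in bits, assuming linear scaling with parameters."""
    return num_params * bits_per_param


def capacity_megabytes(num_params, bits_per_param=BITS_PER_PARAM):
    """Same estimate expressed in megabytes (1 MB = 10**6 bytes)."""
    return capacity_bits(num_params, bits_per_param) / 8 / 1e6


small_bits = capacity_bits(500_000)            # smallest model size in the range
large_mb = capacity_megabytes(1_500_000_000)   # largest model size in the range
print(f"{small_bits:.3g} bits, {large_mb:.0f} MB")
```

Under this assumption, a dataset whose information content exceeds these totals cannot be fully memorized, which is the regime where the abstract reports that unintended memorization starts to fall.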
