Causal Estimation of Memorisation Profiles

Pietro Lesci; Clara Meister; Thomas Hofmann; Andreas Vlachos; Tiago Pimentel

TL;DR

This paper proposes a new, principled, and efficient method to estimate memorisation, based on the difference-in-differences design from econometrics. The method characterises a model's memorisation profile--its memorisation trends across training--by observing the model's behaviour on only a small set of instances throughout training.

Abstract

Understanding memorisation in language models has practical and societal implications, e.g., studying models' training dynamics or preventing copyright infringements. Prior work defines memorisation as the causal effect of training with an instance on the model's ability to predict that instance. This definition relies on a counterfactual: the ability to observe what would have happened had the model not seen that instance. Existing methods struggle to provide computationally efficient and accurate estimates of this counterfactual. Further, they often estimate memorisation for a model architecture rather than for a specific model instance. This paper fills an important gap in the literature, proposing a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics. Using this method, we characterise a model's memorisation profile--its memorisation trends across training--by only observing its behaviour on a small set of instances throughout training. In experiments with the Pythia model suite, we find that memorisation (i) is stronger and more persistent in larger models, (ii) is determined by data order and learning rate, and (iii) has stable trends across model sizes, thus making memorisation in larger models predictable from smaller ones.
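
To make the difference-in-differences idea concrete, the following is a minimal sketch of the generic estimator from econometrics, not the paper's implementation. It assumes the outcome is a per-instance performance score (here, hypothetical log-likelihoods) measured at two checkpoints, that the "treated" group holds instances the model trained on between those checkpoints, and that the "control" group holds instances it did not see. The function name `did_estimate` and all numbers are illustrative assumptions.

```python
import numpy as np


def did_estimate(treated_before, treated_after, control_before, control_after):
    """Generic difference-in-differences estimate.

    Returns the change in the treated group's mean outcome minus the
    change in the control group's mean outcome.
    """
    treated_change = np.mean(treated_after) - np.mean(treated_before)
    control_change = np.mean(control_after) - np.mean(control_before)
    return treated_change - control_change


# Hypothetical toy data: per-instance log-likelihoods at two checkpoints.
treated_before = np.array([-3.0, -2.8, -3.2])  # before training on these instances
treated_after = np.array([-1.0, -1.2, -0.8])   # after training on them
control_before = np.array([-3.1, -2.9, -3.0])  # unseen instances, earlier checkpoint
control_after = np.array([-2.6, -2.4, -2.5])   # unseen instances, later checkpoint

estimate = did_estimate(treated_before, treated_after, control_before, control_after)
print(f"DiD estimate: {estimate:.2f}")
```

In this toy setup the treated group improves by about 2.0 and the control group by about 0.5, so the estimate attributes roughly 1.5 of the improvement to having trained on the instances. The subtraction of the control group's change is what removes general improvement from training, and it is only valid under the usual parallel-trends assumption of the difference-in-differences design.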

Paper Structure (14 sections, 15 equations, 1 figure)

Introduction
Background
Language Modelling
Causal Analysis
Counterfactual Memorisation
Estimating Memorisation
The Difference Estimator
The Difference-in-Differences Estimator
Prior Notions of Memorisation
Previous Operationalisations of Counterfactual Memorisation
Influence Functions
Extractable Memorisation
Experiments
The Pythia Suite

Figures (1)

Figure 1: Memorisation profile (top) and path (bottom) of Pythia 6.9B. Each entry represents the expected counterfactual memorisation of instances trained on at a specific timestep ("Treatment Step") across model checkpoints ("Checkpoint Step"). The dashed vertical line indicates the end of the first epoch.

Theorems & Definitions (6)

Definition 1
Definition 2
Definition 3
proof
proof
Definition 4
