
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

Abhimanyu Hans, Yuxin Wen, Neel Jain, John Kirchenbauer, Hamid Kazemi, Prajwal Singhania, Siddharth Singh, Gowthami Somepalli, Jonas Geiping, Abhinav Bhatele, Tom Goldstein

TL;DR

This work runs extensive experiments training billion-scale Llama-2 models, both pre-trained and trained from scratch, and demonstrates significant reductions in extractable memorization with little to no impact on downstream benchmarks.

Abstract

Large language models can memorize and repeat their training data, causing privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, randomly sampled subsets of tokens are excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verbatim reproduction of a complete chain of tokens from the training set. We run extensive experiments training billion-scale Llama-2 models, both pre-trained and trained from scratch, and demonstrate significant reductions in extractable memorization with little to no impact on downstream benchmarks.
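The core idea in the abstract is to exclude a subset of tokens from the next-token loss. It can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume that roughly one in k tokens is dropped, with k being the parameter used in the figure captions. We also assume the drop decision is a deterministic function of a short window of preceding tokens, so that a repeated passage is masked the same way every time it is seen. The names `goldfish_mask`, `goldfish_loss`, and `context_width` are hypothetical.

```python
import hashlib


def goldfish_mask(token_ids, k, context_width=4):
    """Return one boolean per token: True means the token contributes to the loss.

    Hypothetical sketch: a token is dropped when a hash of the `context_width`
    tokens preceding it falls into a 1-in-k bucket. Because the decision depends
    only on local context, the same passage gets the same mask on every epoch.
    """
    if k < 2:
        raise ValueError("k must be at least 2 (k=1 would drop every token)")
    mask = []
    for i in range(len(token_ids)):
        window = tuple(token_ids[max(0, i - context_width):i])
        digest = hashlib.sha256(repr(window).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % k
        mask.append(bucket != 0)  # drop roughly 1 in k tokens
    return mask


def goldfish_loss(token_log_probs, mask):
    """Mean negative log-likelihood over the kept (unmasked) tokens only."""
    kept = [lp for lp, keep in zip(token_log_probs, mask) if keep]
    if not kept:
        return 0.0
    return -sum(kept) / len(kept)


if __name__ == "__main__":
    tokens = list(range(1000))
    mask = goldfish_mask(tokens, k=4)
    # Expect roughly three quarters of the tokens to be kept for k=4.
    print(f"kept {sum(mask)} of {len(mask)} tokens")
```

With this design, each token's loss term is simply skipped when its mask entry is False. The model therefore never receives a gradient for predicting those positions from their exact preceding context. A randomly resampled mask on every epoch would eventually supervise every token of a repeated document, which is why this sketch ties the mask to the text itself.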

Paper Structure (31 sections, 2 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: A pretrained 7B model (the control) is further trained for 100 epochs on (left) the first chapter of Harry Potter or (right) 100 Wikipedia documents. Training with the goldfish loss lowers both exact-match memorization and RougeL scores (the metrics are described in the section on extractable memorization). When prompted with the opening of Harry Potter (gray), the standard-loss model regenerates the original text (red), while the goldfish model does not.
  • Figure 2: Memorization as a Function of k in the Goldfish Loss: We train 1B-parameter models as described in the section on the memorization experiment setup and plot histograms of RougeL scores to measure extractable memorization. Control refers to a model not trained on the 2,000 repeated Wikipedia documents. For lower values of k, extractable memorization is close to that of the control, and the exact repetitions observed under the standard loss are effectively mitigated.
  • Figure 3: Benchmark Performance: We pretrain 1B-parameter models on 20 billion tokens as described in the section on the memorization experiment setup and evaluate downstream performance on various benchmarks. Models trained with the goldfish loss ($k=3$ and $k=4$) show only a marginal change in performance compared with the model trained with the standard loss. Control refers to the model trained only on RedPajama, without the Wikipedia canaries.
  • Figure 4: Number of dropped tokens and number of divergent tokens at each sequence position for a goldfish model with $k=4$.
  • Figure 5: Validation Loss Curves During Pretraining: We measure validation loss on the RedPajamaV2 dataset as training progresses. Left: Validation loss as a function of input tokens seen during training. The 4-GL model (goldfish loss with $k=4$) trails behind the standard-loss model for the same number of input tokens. Right: However, when the goldfish model is matched to the standard-loss model by the count of supervised tokens (i.e., the number of unmasked tokens), either by training for more steps or by expanding the batch size, the two reach a similar final validation loss.
  • ...and 5 more figures
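Figures 1 and 2 measure extractable memorization with RougeL. To help read those plots, here is a minimal sketch of the standard ROUGE-L F-measure. It scores a generated continuation against the true text by the length of their longest common subsequence (LCS). Assumptions: whitespace tokenization and the balanced F1 form. This summary does not give the paper's exact tokenization or scoring settings. `lcs_length` and `rouge_l` are hypothetical helper names.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend the match along the diagonal
            else:
                curr.append(max(prev[j], curr[j - 1]))  # best of skipping x or y
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F1 between two whitespace-tokenized strings (assumed tokenization)."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A verbatim regeneration of the reference scores 1.0. A continuation that shares no tokens with it in order scores 0.0. Because LCS tolerates gaps, partial paraphrases land in between. This is why histograms of RougeL scores can separate exact repetition from looser overlap.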

Theorems & Definitions (1)

  • Remark