Sequence-Level Leakage Risk of Training Data in Large Language Models

Trishita Tiwari, G. Edward Suh

TL;DR

The paper introduces a probabilistic, sequence-level framework for quantifying training-data leakage in large language models, addressing the limitations of prior work that relied on the Extraction Rate metric and deterministic decoding. It defines per-sequence metrics such as Exact Sample Probability (ESP) and inexact variants (n-ISP) to capture leakage risk under randomized decoding and multi-query adversaries. Using Llama and OPT, trained on Common Crawl and The Pile respectively, the study shows that Extraction Rate underestimates leakage by up to 2.14× and that a substantial fraction of sequences (30.4-41.5%) leak more easily with smaller models or shorter prefixes. It also finds that, under common decoding schemes such as top-k and top-p, partial leakage is not easier than verbatim leakage. The work argues for widespread adoption of sequence-level probabilistic metrics to better quantify memorization risks and guide privacy-conscious model development and deployment.

Abstract

This work quantifies the risk of training data leakage from LLMs (Large Language Models) using sequence-level probabilities. Computing extraction probabilities for individual sequences provides finer-grained information than has been studied in prior benchmarking work. We re-analyze the effects of decoding schemes, model sizes, prefix lengths, partial sequence leakages, and token positions to uncover new insights that were not possible in previous works due to their choice of metrics. We perform this study on two pre-trained models, Llama and OPT, trained on Common Crawl and The Pile respectively. We discover that (1) Extraction Rate, the predominant metric used in prior quantification work, underestimates the threat of leakage of training data in randomized LLMs by as much as 2.14×; (2) although larger models and longer prefixes can extract more data on average, this is not true for a substantial portion of individual sequences: 30.4-41.5% of our sequences are easier to extract with either shorter prefixes or smaller models; and (3) contrary to previous beliefs, partial leakage in commonly used decoding schemes like top-k and top-p is not easier than leaking verbatim training data. The aim of this work is to encourage the adoption of sequence-level probability as a metric in future work on quantifying training data extraction.
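The idea of a per-sequence extraction probability under randomized decoding can be illustrated with a minimal sketch. It assumes that the exact sample probability of a target suffix is the product of its per-token probabilities under the decoding scheme (top-k here), and that repeated queries are independent draws. The function names, the toy logits, and the input layout (`step_logits` holding one logit vector per suffix position, conditioned on the prefix and the preceding target tokens) are hypothetical and are not taken from the paper.

```python
import math

def top_k_probs(logits, k):
    """Softmax renormalized over the k highest logits; all other tokens get probability 0."""
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in keep)  # subtract the max for numerical stability
    weights = {i: math.exp(logits[i] - m) for i in keep}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

def exact_sample_probability(step_logits, target_tokens, k):
    """Probability that top-k sampling emits target_tokens exactly (assumed definition)."""
    prob = 1.0
    for logits, token in zip(step_logits, target_tokens):
        # A target token outside the top k can never be sampled, so it contributes 0.
        prob *= top_k_probs(logits, k).get(token, 0.0)
    return prob

def leak_probability(esp, n_queries):
    """Probability of at least one exact match in n independent queries (assumption)."""
    return 1.0 - (1.0 - esp) ** n_queries

# Toy example: a 4-token vocabulary and a 3-token target suffix.
step_logits = [
    [2.0, 1.0, 0.0, -1.0],
    [0.5, 2.5, 0.0, -2.0],
    [1.0, 1.0, 3.0, 0.0],
]
target = [0, 1, 2]
esp = exact_sample_probability(step_logits, target, k=2)
for n in (1, 10, 100):
    print(n, leak_probability(esp, n))
```

Under greedy decoding (k = 1) this probability collapses to either 0 or 1, which is one way to see why a yes/no extraction check on a single deterministic generation can miss sequences that a sampling adversary would recover after several queries.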

Paper Structure

This paper contains 34 sections, 5 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Approximating Inexact Sample Probabilities (Equation \ref{eq:inex-approx}) by only iterating over tokens in the head of the distribution. An upper bound on the error term, $\varepsilon$, can be computed by summing the probabilities in the ignored tails (see Appendix \ref{app:isp}).
  • Figure 2: The percentage of sequences that can be leaked for different datasets as the number of times the model is prompted per sequence increases.
  • Figure 3: The six types of trends observed in the ESP of different sequences as the prefix length (or model size) increases: a) Straight Dec: end point is the minimum point, no intermediary max/min points; b) Inverted U-Shaped Dec: end point $<$ start point, but an intermediary point is the maximum; c) U-Shaped Dec: end point $<$ start point, but an intermediary point is the minimum; d) Straight Inc: end point $>$ start point, no intermediary max/min points; e) U-Shaped Inc: end point $>$ start point, but an intermediary point is the minimum; f) Inverted U-Shaped Inc: end point $>$ start point, but an intermediary point is the maximum.
  • Figure 4: Average Token Probability (TP) of a given token in the target suffix as a function of the position of the token.
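The six trend types described in the Figure 3 caption can be expressed as a small classifier over a series of ESP values measured at increasing prefix lengths (or model sizes). This is a sketch based only on the caption: the function name and labels are hypothetical, and the handling of two cases the caption does not cover (equal start and end points, and series with both an interior maximum and an interior minimum) is an assumption.

```python
def classify_trend(esp_values):
    """Classify a series of ESP values into one of the six Figure 3 trend types.

    Returns None when the end point equals the start point, a case the
    caption does not describe (assumption).
    """
    if len(esp_values) < 2:
        raise ValueError("need at least a start point and an end point")
    start, end = esp_values[0], esp_values[-1]
    interior = esp_values[1:-1]
    if end == start:
        return None

    # An interior point counts as the max/min only if it lies outside
    # the range spanned by the two end points.
    interior_is_max = bool(interior) and max(interior) > max(start, end)
    interior_is_min = bool(interior) and min(interior) < min(start, end)

    if end < start:
        # Assumption: if both hold, the interior maximum takes precedence.
        if interior_is_max:
            return "Inverted U-Shaped Dec"
        if interior_is_min:
            return "U-Shaped Dec"
        return "Straight Dec"
    # Here end > start. Assumption: if both hold, the interior minimum takes precedence.
    if interior_is_min:
        return "U-Shaped Inc"
    if interior_is_max:
        return "Inverted U-Shaped Inc"
    return "Straight Inc"

# Hypothetical ESP values for one sequence at three prefix lengths.
print(classify_trend([0.30, 0.10, 0.50]))
```

Note that "Straight" in this sketch means only that no interior point falls outside the range of the end points, mirroring the caption's "no intermediary max/min points"; it does not require the series to be strictly monotone.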

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5