Sequence-Level Leakage Risk of Training Data in Large Language Models

Trishita Tiwari; G. Edward Suh

TL;DR

The paper introduces a probabilistic, sequence-level framework for quantifying training-data leakage in large language models, addressing the limitations of prior work that relied on the Extraction Rate metric and deterministic decoding. It defines per-sequence metrics such as Exact Sample Probability (ESP) and inexact variants (n-ISP) to capture leakage risk under randomized decoding and multi-query adversaries. Using Llama and OPT, trained on Common Crawl and The Pile respectively, the study shows that Extraction Rate underestimates leakage under randomized decoding by as much as 2.14×, that 30.4-41.5% of the studied sequences are easier to extract with either shorter prefixes or smaller models, and that partial leakage under top-k and top-p decoding is not easier than verbatim leakage. The work argues for adopting sequence-level probabilistic metrics in future work on quantifying training-data extraction.

Abstract

This work quantifies the risk of training data leakage from Large Language Models (LLMs) using sequence-level probabilities. Computing extraction probabilities for individual sequences provides finer-grained information than has been studied in prior benchmarking work. We re-analyze the effects of decoding schemes, model sizes, prefix lengths, partial sequence leakages, and token positions to uncover new insights that were not possible in previous works due to their choice of metrics. We perform this study on two pre-trained models, Llama and OPT, trained on Common Crawl and The Pile, respectively. We discover that (1) Extraction Rate, the predominant metric used in prior quantification work, underestimates the threat of leakage of training data in randomized LLMs by as much as 2.14×. (2) Although, on average, larger models and longer prefixes can extract more data, this is not true for a substantial portion of individual sequences: 30.4-41.5% of our sequences are easier to extract with either shorter prefixes or smaller models. (3) Contrary to previous beliefs, partial leakage in commonly used decoding schemes like top-k and top-p is not easier than leaking verbatim training data. The aim of this work is to encourage the adoption of sequence-level extraction probability as a metric in future work on quantifying training data extraction.
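To make the idea of a sequence-level extraction probability concrete, the following is a minimal sketch, not the paper's implementation. It assumes that the probability of emitting a target sequence verbatim under top-k sampling is the product of per-step probabilities after renormalizing over the k highest-scoring tokens, and that an adversary's queries are independent draws. The function names (`top_k_step_prob`, `sequence_prob`, `prob_within_n_queries`) and the toy logits are hypothetical and chosen only for illustration; the paper's exact metric definitions are not reproduced here.

```python
import math


def top_k_step_prob(logits, target, k):
    """Probability that top-k sampling emits token `target` at one step.

    Assumption: sampling renormalizes a softmax over the k largest logits.
    """
    # Indices of the k largest logits; tokens outside this set cannot be sampled.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    if target not in top:
        return 0.0
    # Numerically stable softmax restricted to the top-k tokens.
    m = max(logits[i] for i in top)
    z = sum(math.exp(logits[i] - m) for i in top)
    return math.exp(logits[target] - m) / z


def sequence_prob(step_logits, target_tokens, k):
    """Probability of generating `target_tokens` verbatim in a single query."""
    p = 1.0
    for logits, tok in zip(step_logits, target_tokens):
        p *= top_k_step_prob(logits, tok, k)
    return p


def prob_within_n_queries(p, n):
    """Probability of at least one verbatim hit in n independent queries."""
    return 1.0 - (1.0 - p) ** n


# Toy example: a 3-token vocabulary and a 2-token target suffix.
toy_logits = [[0.0, 0.0, -5.0], [0.0, 0.0, -5.0]]
p_single = sequence_prob(toy_logits, [0, 0], k=2)
p_two_queries = prob_within_n_queries(p_single, n=2)
```

Under these assumptions, a sequence whose single-query probability is small can still be extracted with appreciable probability when the adversary is allowed several queries, which is the kind of per-sequence information that a single aggregate rate does not expose.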
