Characterizing Memorization in Diffusion Language Models: Generalized Extraction and Sampling Effects

Xiaoyu Luo; Wenrui Yu; Qiongxiu Li; Johannes Bjerva

TL;DR

The paper proposes a generalized probabilistic extraction framework that unifies prefix-conditioned decoding and diffusion-based generation under arbitrary masking patterns and stochastic sampling trajectories. Under aligned prefix-conditioned evaluations, diffusion language models (DLMs) exhibit substantially lower memorization-based leakage of personally identifiable information (PII) than autoregressive language models (ARMs).

Abstract

Autoregressive language models (ARMs) have been shown to memorize and occasionally reproduce training data verbatim, raising concerns about privacy and copyright liability. Diffusion language models (DLMs) have recently emerged as a competitive alternative, yet their memorization behavior remains largely unexplored due to fundamental differences in generation dynamics. To address this gap, we present a systematic theoretical and empirical characterization of memorization in DLMs. We propose a generalized probabilistic extraction framework that unifies prefix-conditioned decoding and diffusion-based generation under arbitrary masking patterns and stochastic sampling trajectories. Theorem 4.3 establishes a monotonic relationship between sampling resolution and memorization: increasing resolution strictly increases the probability of exact training data extraction, implying that autoregressive decoding corresponds to the limiting case of diffusion-based generation in which the sampling resolution is maximal. Extensive experiments across model scales and sampling strategies validate our theoretical predictions. Under aligned prefix-conditioned evaluations, we further demonstrate that DLMs exhibit substantially lower memorization-based leakage of personally identifiable information (PII) than ARMs.

Paper Structure (39 sections, 2 theorems, 21 equations, 9 figures, 2 tables)

Introduction
Related Work
Language Model Architecture
The Autoregressive Paradigm.
Masked Diffusion Paradigm.
Large Diffusion Language Models.
Language Model Memorization
Preliminaries
Masked Diffusion Model
Autoregressive Model
Sampling Paradigm
Temperature Sampling.
Top-$k$ sampling.
Gumbel noise perturbation.
$(n,p)$-discoverable Extraction Framework
...and 24 more sections

Key Result

Theorem 4.3

Under Assumption ass.mono, for diffusion-based LLMs, the probability of generating a sequence that exactly matches a training instance generally increases with the number of sampling steps $N$. In certain cases, this relationship is theoretically monotonic under a fixed recovery sequence. Let $\sigm
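The effect described in Theorem 4.3 can be illustrated with a small exact calculation. The sketch below is not the paper's proof or code: it assumes a hypothetical two-token toy distribution (`JOINT`) and an idealized denoiser whose conditionals match that distribution exactly. Positions unmasked in the same step are sampled independently given what is already revealed, and the revealed tokens are fixed to the target (a fixed recovery sequence). The function names `conditional` and `exact_match_prob` are illustrative only.

```python
# Hypothetical toy "training distribution" over two binary tokens.
# The two positions are perfectly correlated: only (0, 0) and (1, 1) occur.
JOINT = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}


def conditional(position, value, revealed):
    """P(token at `position` == value | already revealed tokens) under JOINT."""
    numerator = 0.0
    denominator = 0.0
    for seq, prob in JOINT.items():
        if all(seq[i] == v for i, v in revealed.items()):
            denominator += prob
            if seq[position] == value:
                numerator += prob
    return numerator / denominator if denominator > 0 else 0.0


def exact_match_prob(target, groups):
    """Probability of reproducing `target` when positions are unmasked group by group.

    Tokens inside one group are sampled independently, conditioned only on
    tokens revealed in earlier groups (assumed to equal the target).
    """
    revealed = {}
    prob = 1.0
    for group in groups:
        for pos in group:
            prob *= conditional(pos, target[pos], revealed)
        for pos in group:
            revealed[pos] = target[pos]
    return prob


target = (0, 0)
one_step = exact_match_prob(target, [[0, 1]])     # both tokens in a single step
per_token = exact_match_prob(target, [[0], [1]])  # one token per step
print(one_step, per_token)
```

In this toy case, single-step decoding ignores the correlation between the two positions and succeeds with probability 0.25, while per-token decoding conditions the second token on the first and succeeds with probability 0.5. Whether the same ordering holds for a real model depends on the assumption stated in the theorem.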

Figures (9)

Figure 1: A randomly sampled verbatim PII memorization example from diffusion language model (LLaDA-8B). This case shows that finer-grained generation resolution (i.e., more steps) is more prone to verbatim recovery of training data. $\langle$M$\rangle$ denotes a mask token; blue highlights matched strings, and red highlights mismatches.
Figure 2: Empirical vs. theoretical memorization probability measured by DLM-1.1B on 200 SlimPajama examples. Empirical $p_{\boldsymbol{z}}$ is estimated from 100,000 random-decoding generations; theoretical $\hat{p}_{\boldsymbol{z}}$ is estimated from 1,024 mask-patterns at a fixed mask ratio ($[0.2,0.3]$).
Figure 3: Empirical log-probability ratio across decoding steps (DLM-1.1B). We sample 200 mask-patterns and, for each, run $10{,}000$ random decoding generations to estimate success probabilities at steps $\{1,2,5,10,\text{per-token}\}$. We plot the median log-probability difference to one-step (baseline), with $95\%$ bootstrap Confidence Intervals (CIs) across generation-trajectories.
Figure 4: Per-token-step email hit rates under different model scales (170M/629M/1.1B). Each bar reports the Extractable rate over $3{,}000$ samples, measured as whether an email can be recovered within a querying budget $n$, at fixed $p=0.5$.
Figure 5: Validating memorization of training data. We compare the reconstruction likelihood of Enron samples (train) against that of unseen samples from TREC 2007 Spam (test), which comes from the same domain.
...and 4 more figures

Theorems & Definitions (8)

Definition 3.1: $(n,p)$-discoverable Extraction
Definition 3.2: $(\epsilon, n, p)$-discoverable Extraction
Definition 4.1: Generalized $(n,p)$-discoverable Extraction
Theorem 4.3: Impact of Sampling Resolution on Memorization
proof
Proposition 4.4
Definition 4.5: Generalized $(\epsilon,n,p)$-discoverable Extraction
proof
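The $(n,p)$-discoverable extraction notion named in Definition 3.1 can be sketched numerically. The code below assumes the common formulation in which the $n$ queries are independent and a target with per-query generation probability $p_{\boldsymbol{z}}$ counts as extractable when at least one of the $n$ generations matches it with probability at least $p$, that is, $1-(1-p_{\boldsymbol{z}})^n \ge p$. The exact definition is not reproduced on this page, so this formulation and the helper names `extraction_prob` and `min_queries` are assumptions for illustration.

```python
import math


def extraction_prob(p_z, n):
    """Probability that at least one of n independent queries emits the target."""
    return 1.0 - (1.0 - p_z) ** n


def min_queries(p_z, p):
    """Smallest n with extraction_prob(p_z, n) >= p (assumed formulation)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p_z <= 0.0:
        return math.inf  # the target is never generated
    if p_z >= 1.0:
        return 1         # the target is always generated
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - p_z))


# Example: a target generated with probability 1% per query, at p = 0.5.
n_needed = min_queries(0.01, 0.5)
print(n_needed, extraction_prob(0.01, n_needed))
```

Under this reading, a fixed $p$ (such as the $p=0.5$ used for Figure 4) turns each per-query probability into a query budget, so a lower $p_{\boldsymbol{z}}$ translates directly into a larger $n$ needed for extraction.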
