Many-Shot Regurgitation (MSR) Prompting

Shashank Sonkar, Richard G. Baraniuk

TL;DR

This work introduces Many-Shot Regurgitation (MSR) prompting, a black-box membership inference framework to quantify verbatim content reproduction in large language models. By segmenting source text and shaping a simulated multi-round prompt, MSR isolates a genuine final output $T_n'$ and evaluates verbatim matches to the original segment using the Longest Common Substring, followed by distributional comparisons between $D_{\rm pre}$ (likely seen during training) and $D_{\rm post}$ (post-cutoff). Across Wikipedia and OER textbooks and three state-of-the-art models, the study finds significantly higher verbatim regurgitation when prompts use data from training sources, with strong effect sizes and KS distances; ablations show that increasing shot count and lower temperatures amplify reproduction, while shorter input lengths constrain it due to the $L/s$ bound. The findings advance understanding of training data provenance, copyright risk, and the feasibility of black-box data-detection techniques for deployed LLMs.

Abstract

We introduce Many-Shot Regurgitation (MSR) prompting, a new black-box membership inference attack framework for examining verbatim content reproduction in large language models (LLMs). MSR prompting involves dividing the input text into multiple segments and creating a single prompt that includes a series of faux conversation rounds between a user and a language model to elicit verbatim regurgitation. We apply MSR prompting to diverse text sources, including Wikipedia articles and open educational resources (OER) textbooks, which provide high-quality, factual content and are continuously updated over time. For each source, we curate two dataset types: one that LLMs were likely exposed to during training ($D_{\rm pre}$) and another consisting of documents published after the models' training cutoff dates ($D_{\rm post}$). To quantify the occurrence of verbatim matches, we employ the Longest Common Substring algorithm and count the frequency of matches at different length thresholds. We then use statistical measures such as Cliff's delta, the Kolmogorov-Smirnov (KS) distance, and the Kruskal-Wallis H test to determine whether the distribution of verbatim matches differs significantly between $D_{\rm pre}$ and $D_{\rm post}$. Our findings reveal a striking difference in the distribution of verbatim matches between $D_{\rm pre}$ and $D_{\rm post}$, with the frequency of verbatim reproduction being significantly higher when LLMs (e.g., GPT models and LLaMAs) are prompted with text from datasets they were likely trained on. For instance, when using GPT-3.5 on Wikipedia articles, we observe a substantial effect size (Cliff's delta $= -0.984$) and a large KS distance ($0.875$) between the distributions of $D_{\rm pre}$ and $D_{\rm post}$. Our results provide compelling evidence that LLMs are more prone to reproducing verbatim content when the input text is likely sourced from their training data.
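
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Human:`/`Assistant:` labels, the near-equal word-count split, the whitespace tokenization used for the Longest Common Substring, and the "at least this long" reading of the length thresholds are all assumptions, and every function name is hypothetical.

```python
def split_into_segments(text, n_segments):
    """Split text into n_segments contiguous chunks of near-equal word count."""
    words = text.split()
    if not 1 <= n_segments <= len(words):
        raise ValueError("n_segments must be between 1 and the number of words")
    base, extra = divmod(len(words), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        # The first `extra` segments each take one additional word.
        end = start + base + (1 if i < extra else 0)
        segments.append(" ".join(words[start:end]))
        start = end
    return segments


def build_msr_prompt(segments):
    """Lay out all but the last segment as alternating faux turns.

    The turn labels and layout are assumptions; the last segment is held
    out as the reference that the model's genuine output is compared to.
    """
    if len(segments) < 2:
        raise ValueError("need at least one shot plus a held-out segment")
    *shots, target = segments
    labels = ("Human", "Assistant")
    turns = [f"{labels[i % 2]}: {seg}" for i, seg in enumerate(shots)]
    return "\n".join(turns), target


def lcs_length_in_words(generated, reference):
    """Length, in words, of the longest contiguous run shared by both texts."""
    x, y = generated.split(), reference.split()
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the common run that ended at the previous word pair.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def match_frequencies(lcs_lengths, thresholds):
    """For each threshold, count the outputs whose LCS is at least that long."""
    return {t: sum(1 for n in lcs_lengths if n >= t) for t in thresholds}
```

In this sketch, the prompt returned by `build_msr_prompt` would be sent to the model, and the model's reply would be scored against the held-out segment with `lcs_length_in_words`. Collecting those scores over $D_{\rm pre}$ and $D_{\rm post}$ gives the two distributions that the statistical tests compare.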

Paper Structure

This paper contains 15 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Distribution of verbatim match frequencies for different language models on the Wikipedia dataset. The results show a consistent pattern across all three models (GPT-3.5, GPT-4, and LLAMA), where the frequency of verbatim matches is notably higher for articles published before the training cutoff dates (Wikimedia) compared to those published after (April 2024 Wiki). This difference is particularly evident for longer verbatim match lengths, with the Wikimedia frequencies maintaining higher values even at substring lengths of 10 words or more. In contrast, the April 2024 Wiki frequencies exhibit a sharper decline as the verbatim match length increases. Statistical tests (Cliff's Delta, KS Distance, and Kruskal-Wallis H Test) confirm the significance of these differences, suggesting that the language models are more prone to reproducing verbatim content when the input text is sourced from their training data.
  • Figure 2: Distribution of verbatim match frequencies for GPT-3.5 and LLAMA on the OER textbook dataset. The frequency of verbatim matches is consistently higher for the textbooks published before the training cutoff dates (Old Bio & Econ) compared to the recently released textbooks (New Pharma & Nursing). Statistical tests (Cliff's Delta, KS Distance, and Kruskal-Wallis H Test) confirm the significance of these differences.
  • Figure 3: Effect of the number of shots on the occurrence of verbatim regurgitation for GPT-3.5 and LLAMA models on the Wikipedia dataset. The results show that increasing the number of shots generally leads to higher frequencies of verbatim matches, with 6 shots (3 human and 3 assistant turns) yielding the best results.
  • Figure 4: Distribution of verbatim match frequencies for GPT-3.5 at different input text lengths on the Wikipedia dataset. The results demonstrate that MSR prompting consistently yields higher verbatim match frequencies for articles published before the training cutoff dates (Wikimedia) compared to those published after (April 2024 Wiki), even at shorter input text lengths. However, as the input text length decreases, the maximum verbatim match length that can be meaningfully analyzed is limited by the ratio of the input text length ($L$) to the number of splits ($s$), approximated as $L/s$, resulting in a more sparse distribution of verbatim match frequencies.
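
The captions above cite Cliff's delta and the KS distance as measures of how far apart the two match-length distributions are. A minimal pure-Python sketch of both statistics follows, assuming each input is a list of per-document longest-match lengths (an assumption about the data layout; the function names are hypothetical). The sign of Cliff's delta depends on the argument order, and the source does not state which order produced its reported values.

```python
from bisect import bisect_right


def cliffs_delta(x, y):
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (|x| * |y|)."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def ks_distance(x, y):
    """Two-sample KS distance: the largest gap between the empirical CDFs."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    # The empirical CDFs only change at observed values, so checking those suffices.
    points = sorted(set(xs) | set(ys))
    return max(
        abs(bisect_right(xs, p) / len(xs) - bisect_right(ys, p) / len(ys))
        for p in points
    )
```

Cliff's delta ranges from $-1$ to $1$ and the KS distance from $0$ to $1$; in both cases a magnitude near $1$ indicates that the two samples barely overlap.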