Quantifying Memorization and Detecting Training Data of Pre-trained Language Models using Japanese Newspaper

Shotaro Ishihara, Hiromu Takahashi

TL;DR

This study quantifies memorization and evaluates training-data detection in Japanese domain-specific PLMs by pre-training GPT-2 models on a limited Japanese newspaper corpus from Nikkei and by constructing paywall-based evaluation splits. It confirms that memorization correlates with training data duplication, model size, and prompt length, mirroring English-language findings, and demonstrates that membership inference can detect training data in Japanese with moderate effectiveness (AUC around 0.6). The authors provide a framework and metrics for quantification (eidetic and approximate memorization) and detection (Min-$k\%$ Prob), applying them to both domain-specific and general GPT-2 baselines and performing qualitative analyses of memorized samples. The work highlights potential privacy and copyright risks in domain-specific PLMs and discusses limitations, including dataset accessibility, the need for larger-scale studies, decoding strategies, and mitigation measures. Overall, the paper advances cross-linguistic understanding of memorization in PLMs and offers a practical methodology for evaluating and mitigating training-data leakage in non-English domains.

Abstract

Dominant pre-trained language models (PLMs) have demonstrated the potential risk of memorizing and outputting their training data. While this concern has been discussed mainly for English, it is also practically important for domain-specific PLMs. In this study, we pre-trained domain-specific GPT-2 models on a limited corpus of Japanese newspaper articles and evaluated their behavior. Experiments replicated, in Japanese, the empirical finding from previous English studies that memorization in PLMs is related to duplication in the training data, model size, and prompt length. Furthermore, we attempted membership inference attacks and showed that training data can be detected even in Japanese, the same trend as in English. The study warns that domain-specific PLMs, sometimes trained with valuable private data, can "copy and paste" on a large scale.
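
The detection step named above (Min-$k\%$ Prob scored by AUC) can be illustrated with a minimal sketch. It assumes per-token log-probabilities have already been obtained from a model. The function names `min_k_prob` and `auc`, the rounding rule for the number of selected tokens, and the example values are illustrative assumptions, not the authors' implementation.

```python
import math


def min_k_prob(token_logprobs, k=0.2):
    """Average log-probability of the k fraction of least likely tokens.

    A higher (less negative) score suggests the text is more likely to have
    been seen during pre-training. Rounding up to at least one token is an
    assumption of this sketch.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    n = max(1, math.ceil(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]  # the n smallest log-probabilities
    return sum(lowest) / n


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This is the pairwise definition of AUC,
    written out directly so the sketch needs no third-party library.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

In use, `min_k_prob` would be applied to every article in the evaluation set, and `auc` would compare the scores of articles included in pre-training (positives) against those that were not (negatives). An AUC of 0.5 corresponds to chance.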

Paper Structure (45 sections, 5 figures, 6 tables)

This paper contains 45 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: The existing method for constructing an evaluation set for quantifying memorization and detecting training data. This procedure requires sampling data from the training set used for pre-training and splitting the text into prompts and references. Positive examples are created from training data and negative examples from new texts that are guaranteed not to be training data.
  • Figure 2: The procedure for quantifying memorization and detecting training data of PLMs in this study. First, we pre-trained GPT-2 models using newspaper articles as a training set. We then generated strings using the public part as a prompt and quantified memorization using the private part. We also tackled the training data detection task, using articles included in pre-training as positive examples and articles not included as negative examples.
  • Figure 3: Histogram of the number of characters in the public part of the evaluation set. Most articles are around 200 characters, but some are shorter.
  • Figure 4: Histogram of the number of characters up to the end of the first sentence in the private part of the evaluation set. Nine articles exceeded 200 characters and were therefore skipped in the visualization.
  • Figure 5: Visualization of the average value of approximate memorization. Similar results were confirmed for other metrics.
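
The quantification step described for Figure 2 (generate from the public part, compare against the private part) can be sketched as follows. This is a minimal illustration under stated assumptions: verbatim memorization is measured here as the length of the matching prefix, and approximate memorization as a length-normalized Levenshtein similarity. The paper's exact metric definitions may differ, and all function names are hypothetical.

```python
def common_prefix_length(generated, reference):
    """Number of leading characters on which the two strings agree."""
    n = 0
    for g, r in zip(generated, reference):
        if g != r:
            break
        n += 1
    return n


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def approximate_similarity(generated, reference):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(generated), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(generated, reference) / longest
```

Here `generated` stands for the model's continuation of the public part and `reference` for the private part of the same article. Both functions operate on characters, which suits Japanese text, where whitespace does not delimit words.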