Quantifying Memorization and Detecting Training Data of Pre-trained Language Models using Japanese Newspaper
Shotaro Ishihara, Hiromu Takahashi
TL;DR
This study quantifies memorization and training-data detection in Japanese domain-specific PLMs by pre-training GPT-2 models on a limited corpus of Japanese newspaper articles from Nikkei and by constructing paywall-based evaluation splits. It confirms that memorization correlates with training data duplication, model size, and prompt length, mirroring English-language findings, and shows that membership inference can detect training data in Japanese with moderate effectiveness (AUC around 0.6). The authors provide a framework and metrics for quantification (eidetic and approximate memorization) and detection (Min-$k\%$ Prob), apply them to both domain-specific and general GPT-2 baselines, and qualitatively analyze memorized samples. The work highlights potential privacy and copyright risks in domain-specific PLMs and discusses limitations and open issues, including dataset accessibility, the need for larger-scale studies, the effect of decoding strategies, and mitigation measures. Overall, the paper advances cross-linguistic understanding of memorization in PLMs and offers a practical methodology for evaluating training-data leakage in non-English domains.
Abstract
Dominant pre-trained language models (PLMs) have been shown to carry the risk of memorizing and outputting their training data. While this concern has been discussed mainly for English, it is also practically important to examine domain-specific PLMs. In this study, we pre-trained domain-specific GPT-2 models using a limited corpus of Japanese newspaper articles and evaluated their behavior. Experiments replicated the empirical finding that memorization in PLMs is related to duplication in the training data, model size, and prompt length, showing that these relationships hold in Japanese as they do in previous English studies. Furthermore, we attempted membership inference attacks and demonstrated that training data can be detected even in Japanese, which is the same trend as in English. The study warns that domain-specific PLMs, sometimes trained with valuable private data, can "copy and paste" on a large scale.
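As a rough illustration of the membership inference step, the sketch below computes a Min-$k\%$ Prob style score (the average of the lowest $k\%$ of token log-probabilities in a text) and a rank-based AUC over member and non-member scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `min_k_prob` and `auc`, the default `k=0.2`, and the idea that per-token log-probabilities have already been obtained from a language model are all assumptions made here for illustration.

```python
import math


def min_k_prob(token_logprobs, k=0.2):
    """Average the lowest k fraction of token log-probabilities.

    token_logprobs: per-token log-probabilities of one text under the model
    (assumed to be computed elsewhere). k is a hypothetical default.
    A higher (less negative) score suggests the text is more likely
    to have been part of the training data.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    # Keep at least one token even for very short texts.
    n = max(1, math.ceil(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n


def auc(member_scores, nonmember_scores):
    """Rank-based AUC: the probability that a randomly chosen member
    scores higher than a randomly chosen non-member (ties count 0.5)."""
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Toy usage with made-up log-probabilities (not data from the paper).
member_texts = [[-0.5, -1.0, -0.8, -1.2], [-0.3, -0.9, -1.1, -0.7]]
nonmember_texts = [[-2.0, -3.5, -1.0, -4.0], [-1.5, -2.5, -3.0, -0.9]]
member_scores = [min_k_prob(t, k=0.5) for t in member_texts]
nonmember_scores = [min_k_prob(t, k=0.5) for t in nonmember_texts]
print(auc(member_scores, nonmember_scores))
```

An AUC of 0.5 corresponds to chance-level detection and 1.0 to perfect separation, so a value around 0.6 as reported in the TL;DR indicates detection that is better than chance but far from perfect.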
