Did the Neurons Read your Book? Document-level Membership Inference for Large Language Models

Matthieu Meeus; Shubham Jain; Marek Rei; Yves-Alexandre de Montjoye

TL;DR

The paper formalizes document-level membership inference for large language models, a privacy and copyright concern, and proposes a black-box auditing framework for it. It builds a dataset of member and non-member documents from sources such as Project Gutenberg and ArXiv, and trains a meta-classifier on token-level probabilities that have been normalized by token rarity. On OpenLLaMA-7B, document-level membership can be inferred with an AUC of up to 0.856 for books and 0.678 for papers, and the smaller OpenLLaMA-3B is approximately as sensitive. Of the two mitigations considered, the AUC decreases slowly when only partial documents are available and remains fairly high when model precision is reduced. These results underscore substantial memorization risks and the need for data provenance and transparency. The work also shows that sequence-level MIAs are outperformed by this approach on the document-level task.
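The feature pipeline summarized above (token-level probabilities, normalized by token rarity, then aggregated per document) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the ratio-style normalization, the fallback value for unseen tokens, and the histogram binning are all assumptions made for the example, and the token probabilities are made-up numbers rather than real model outputs.

```python
from collections import defaultdict


def build_normalization_dict(token_prob_pairs):
    """Mean predicted probability per token over a reference corpus.

    `token_prob_pairs` is an iterable of (token, probability) pairs that
    would come from querying the model; here they are supplied by hand.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for token, prob in token_prob_pairs:
        sums[token] += prob
        counts[token] += 1
    return {token: sums[token] / counts[token] for token in sums}


def normalize_document(doc_pairs, norm_dict, fallback=1.0):
    """Divide each token probability by its reference mean.

    Ratio normalization is an assumption for this sketch. Tokens missing
    from the dictionary are divided by `fallback` (hypothetical choice).
    """
    return [prob / norm_dict.get(token, fallback) for token, prob in doc_pairs]


def histogram_features(values, n_bins=4, upper=2.0):
    """Fraction of values per equal-width bin on [0, upper].

    Values at or above `upper` are counted in the last bin.
    """
    counts = [0] * n_bins
    width = upper / n_bins
    for value in values:
        index = min(int(value / width), n_bins - 1)
        counts[index] += 1
    total = len(values) or 1
    return [count / total for count in counts]


# Illustrative (made-up) probabilities, not real model outputs.
reference = [("the", 0.5), ("the", 0.25), ("karamazov", 0.125)]
norm_dict = build_normalization_dict(reference)
document = [("the", 0.375), ("karamazov", 0.25)]
features = histogram_features(normalize_document(document, norm_dict))
```

One fixed-length feature vector per document, paired with its member or non-member label, is what a binary meta-classifier would then be trained on. Dividing by a per-token reference value is meant to keep common tokens, which the model predicts well regardless of membership, from dominating the signal.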

Abstract

With large language models (LLMs) poised to become embedded in our daily lives, questions are starting to be raised about the data they learned from. These questions range from potential bias or misinformation LLMs could retain from their training data to questions of copyright and fair use of human-generated text. However, while these questions emerge, developers of the recent state-of-the-art LLMs become increasingly reluctant to disclose details on their training corpus. We here introduce the task of document-level membership inference for real-world LLMs, i.e. inferring whether the LLM has seen a given document during training or not. First, we propose a procedure for the development and evaluation of document-level membership inference for LLMs by leveraging commonly used data sources for training and the model release date. We then propose a practical, black-box method to predict document-level membership and instantiate it on OpenLLaMA-7B with both books and academic papers. We show our methodology to perform very well, reaching an AUC of 0.856 for books and 0.678 for papers. We then show our approach to outperform the sentence-level membership inference attacks used in the privacy literature for the document-level membership task. We further evaluate whether smaller models might be less sensitive to document-level inference and show OpenLLaMA-3B to be approximately as sensitive as OpenLLaMA-7B to our approach. Finally, we consider two mitigation strategies and find the AUC to slowly decrease when only partial documents are considered but to remain fairly high when the model precision is reduced. Taken together, our results show that accurate document-level membership can be inferred for LLMs, increasing the transparency of technology poised to change our lives.
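For readers less familiar with the metric, the AUC values quoted above can be read as the probability that a randomly chosen member document receives a higher membership score than a randomly chosen non-member. A minimal sketch of that pairwise definition follows; the function name `roc_auc` and the example labels and scores are illustrative assumptions, not taken from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for members and 0 for non-members; `scores` holds the
    classifier's membership scores. Ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one member and one non-member")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores for two members and two non-members.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 3 of 4 pairs ranked correctly: 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of members from non-members.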

Paper Structure (30 sections, 10 equations, 6 figures, 5 tables)

Introduction
Background
Language modeling
Datasets used for training
Copyright and generative AI
Membership inference attacks
Auditing setup
Methodology
Querying the model
Normalization (Lg)
Computing a normalization dictionary
Normalization strategies
Feature extraction (Lg)
Meta-classifier
Experimental setup
...and 15 more sections

Figures (6)

Figure 1: ROC curve for the best-performing membership classifier (see Tables \ref{tab:books_primary} and \ref{tab:arxiv_primary} for details). Results for books from Project Gutenberg (left) and ArXiv papers (right).
Figure 2: Querying OpenLLaMA-7B on an example from the book The Brothers Karamazov by Dostoyevsky (member, Sec. \ref{sec:dataset_membership}).
Figure 3: Density distribution of the original year of publication for books included in Project Gutenberg, for members and non-members. Both the raw distribution (left) and the filtered distribution for years 1850-1910 (right) are displayed.
Figure 4: Mean AUC for $C=2048$ across model sizes, for books from Project Gutenberg (left) and ArXiv papers (right).
Figure 5: Mean and standard deviation AUC for books from Project Gutenberg for an increasing number of tokens in a random excerpt sampled from the document.
...and 1 more figures
