(Token-Level) InfoRMIA: Stronger Membership Inference and Memorization Assessment for LLMs

Jiashu Tao, Reza Shokri

TL;DR

The paper introduces InfoRMIA, an information-theoretic advancement over RMIA for membership inference attacks, delivering higher power with fewer population samples by using a continuous, bit-based score derived from a Bayes-factor-like formulation. It then extends the approach to token-level MIAs in LLMs, enabling fine-grained localization of memorization to individual tokens and providing a token-level privacy assessment interface that visualizes leakage. Empirical results show InfoRMIA outperforms RMIA on tabular, image, and text datasets and remains effective for pretrained LLMs in the MIMIR benchmark, often with only a single simple reference model. The token-level perspective reveals that privacy risk is often concentrated in private tokens and can be misrepresented by sequence-level metrics like AUC, motivating targeted unlearning and token-guided privacy safeguards.

Abstract

Machine learning models are known to leak sensitive information, as they inevitably memorize (parts of) their training data. More alarmingly, large language models (LLMs) are now trained on nearly all available data, which amplifies the magnitude of information leakage and raises serious privacy risks. Hence, it is more crucial than ever to quantify privacy risk before the release of LLMs. The standard method to quantify privacy is via membership inference attacks, where the state-of-the-art approach is the Robust Membership Inference Attack (RMIA). In this paper, we present InfoRMIA, a principled information-theoretic formulation of membership inference. Our method consistently outperforms RMIA across benchmarks while also offering improved computational efficiency. In the second part of the paper, we identify the limitations of treating sequence-level membership inference as the gold standard for measuring leakage. We propose a new perspective for studying membership and memorization in LLMs: token-level signals and analyses. We show that a simple token-based InfoRMIA can pinpoint which tokens are memorized within generated outputs, thereby localizing leakage from the sequence level down to individual tokens, while achieving stronger sequence-level inference power on LLMs. This new scope rethinks privacy in LLMs and can lead to more targeted mitigation, such as exact unlearning.
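To make the token-level perspective concrete, the following is a minimal sketch of a reference-calibrated per-token score measured in bits. It is not the paper's InfoRMIA score, whose exact definition is not given in this section; it is a plain log-likelihood ratio between a target model and a single reference model, used here only for illustration. The function names, the example tokens, the probabilities, and the 1-bit threshold are all hypothetical.

```python
import math


def token_scores_bits(target_probs, reference_probs):
    """Per-token log2 ratio of target-model to reference-model probability.

    Illustrative stand-in for a token-level membership signal; NOT the
    InfoRMIA formula from the paper.
    """
    if len(target_probs) != len(reference_probs):
        raise ValueError("probability lists must have the same length")
    return [math.log2(p_t / p_r) for p_t, p_r in zip(target_probs, reference_probs)]


def sequence_score_bits(scores):
    """Average the token scores into a single sequence-level score."""
    return sum(scores) / len(scores)


# Hypothetical sequence: only the last two tokens are "private".
tokens = ["Call", "me", "at", "555", "0199"]
# Made-up probabilities each model assigns to the observed token.
target_probs = [0.25, 0.25, 0.25, 0.5, 0.5]
reference_probs = [0.25, 0.25, 0.25, 0.0625, 0.0625]

scores = token_scores_bits(target_probs, reference_probs)
seq_score = sequence_score_bits(scores)

# Flag tokens whose score exceeds an arbitrary illustrative threshold of 1 bit.
flagged = [tok for tok, s in zip(tokens, scores) if s > 1.0]
```

In this toy example the two private tokens each score 3 bits while the remaining tokens score 0, so the sequence-level average (1.2 bits) understates how sharply the signal is concentrated in the private tokens.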

Paper Structure

This paper contains 37 sections, 9 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Sequence-level membership inference may not accurately identify private information leakage, which is conveyed by private tokens only.
  • Figure 2: Two of the ten most memorized sequences contain no private tokens. These pose little privacy risk, yet sequence-based frameworks overestimate their risk. See also Figure \ref{fig:top_10_seq}.
  • Figure 3: Histogram of the average token scores across the top entity groups on AG News. The "None" type represents words that are not nouns.
  • Figure 4: Boxplots comparing token-level and sequence-level membership scores. More details are provided in Table \ref{table:ai4privacy_stats} and Figure \ref{fig:high_token_ai4privacy_bar}.
  • Figure 5: Distribution of token InfoRMIA scores on the AG News dataset.
  • ...and 3 more figures