Detecting Training Data of Large Language Models via Expectation Maximization

Gyuwan Kim, Yang Li, Evangelia Spiliopoulou, Jie Ma, Miguel Ballesteros, William Yang Wang

TL;DR

This work tackles the challenging problem of detecting training data used by large language models through membership inference under realistic, distribution-shifted conditions. It introduces EM-MIA, an Expectation-Maximization framework that iteratively refines membership and prefix scores, leveraging their mutual reinforcement to surpass prior methods on WikiMIA. To evaluate robustness across scenarios, the authors propose OLMoMIA, a benchmark built from OLMo resources with controllable overlap between member and non-member distributions. Empirical results show EM-MIA achieves state-of-the-art performance on WikiMIA and robust performance on OLMoMIA, though all methods struggle when member and non-member distributions are nearly identical, highlighting fundamental limitations in current MIAs for LLMs.

Abstract

The advancement of large language models has grown parallel to the opacity of their training data. Membership inference attacks (MIAs) aim to determine whether specific data was used to train a model. They offer valuable insights into detecting data contamination and ensuring compliance with privacy and copyright standards. However, MIA for LLMs is challenging due to the massive scale of training data and the inherent ambiguity of membership in texts. Moreover, creating realistic MIA evaluation benchmarks is difficult as training and test data distributions are often unknown. We introduce EM-MIA, a novel membership inference method that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm. Our approach leverages the observation that these scores can improve each other: membership scores help identify effective prefixes for detecting training data, while prefix scores help determine membership. As a result, EM-MIA achieves state-of-the-art results on WikiMIA. To enable comprehensive evaluation, we introduce OLMoMIA, a benchmark built from OLMo resources, which allows controlling task difficulty through varying degrees of overlap between training and test data distributions. Our experiments demonstrate EM-MIA is robust across different scenarios while also revealing fundamental limitations of current MIA approaches when member and non-member distributions are nearly identical.
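The mutual refinement described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes a hypothetical precomputed matrix `cond_scores[p][x]` (a membership signal for target `x` when example `p` is used as a prefix), pseudo-labels obtained by thresholding the current membership scores at their median, AUC-ROC as the prefix-scoring function, and the negative prefix score as the updated membership score (the choice named in the Figure 3 caption). All function and variable names are illustrative.

```python
from typing import List, Tuple


def auc_roc(scores: List[float], labels: List[bool]) -> float:
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return 0.5  # undefined without both classes; treat as uninformative
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def em_refine(
    cond_scores: List[List[float]],
    init_membership: List[float],
    iterations: int = 5,
) -> Tuple[List[float], List[float]]:
    """Alternate between prefix-score and membership-score updates.

    cond_scores[p][x] is a hypothetical signal (higher = more member-like)
    for target x when example p is prepended as a prefix.
    """
    n = len(init_membership)
    membership = list(init_membership)
    prefix = [0.0] * n
    for _ in range(iterations):
        # Assumption: pseudo-labels come from a median threshold.
        threshold = sorted(membership)[n // 2]
        labels = [m >= threshold for m in membership]
        # Prefix score: how well conditioning on p separates pseudo-members.
        prefix = [auc_roc(cond_scores[p], labels) for p in range(n)]
        # Membership score: negative prefix score (effective prefixes are
        # assumed here to be non-members).
        membership = [-r for r in prefix]
    return membership, prefix


# Toy data: examples 0-2 are members, 3-5 are non-members.
informative = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # row for a non-member prefix
flat = [0.5] * 6                               # row for a member prefix
toy_cond = [flat, flat, flat, informative, informative, informative]
toy_init = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]      # examples 2 and 3 are mis-ranked

membership, prefix = em_refine(toy_cond, toy_init, iterations=3)
print(membership, prefix)
```

In this toy setup the initial ranking places a non-member above a member, and the alternating updates correct it after the first pass. Real prefix-conditioned scores would be far noisier than the hand-built rows used here.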

Paper Structure

This paper contains 27 sections, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Histogram of prefix scores for members and non-members measured by AUC-ROC in the Oracle setting on the WikiMIA dataset (Shi et al., 2023) with a length of 128 and Pythia-6.9B (Biderman et al., 2023).
  • Figure 2: The basic setup of OLMoMIA benchmark. The horizontal line indicates a training step. For any intermediate checkpoint at a specific step, we can consider training data before and after that step as members and non-members, respectively.
  • Figure 3: ROC curves of MIA when using the negative prefix score with varying metrics as a membership score in the Oracle setting on the WikiMIA dataset (Shi et al., 2023) with a length of 128 and Pythia-6.9B (Biderman et al., 2023).
  • Figure 4: Performance of EM-MIA for each iteration with varying baselines for initialization and scoring functions $S$ on the WikiMIA dataset with a length of 128 and Pythia-6.9B model.
  • Figure : EM-MIA