MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector

Wenjie Fu; Huandong Wang; Chen Gao; Guanghua Liu; Yong Li; Tao Jiang

TL;DR

The paper tackles privacy and copyright risks from LLM pre-training data by reframing detection as a membership inference attack and introducing MIA-Tuner, which instructs LLMs to detect their own training data via a dedicated soft prompt. It provides an up-to-date benchmark, WIKIMIA-24, and two defense pipelines to counter both existing detectors and the proposed method. Empirical results show MIA-Tuner achieves state-of-the-art detection with an average AUC around 0.97 across aligned and unaligned models and demonstrates few-shot viability. The work also proposes defense mechanisms that substantially reduce detection success with minimal impact on model utility, highlighting practical implications for publishing and fine-tuning LLMs in privacy-sensitive settings.

Abstract

The growing parameter counts and expansive training datasets of large language models (LLMs) create an urgent demand for a technical solution to audit the privacy risks and copyright issues associated with them. Existing studies have partially addressed this need by exploring the pre-training data detection problem, an instance of a membership inference attack (MIA): determining whether a given piece of text was used during the pre-training phase of the target LLM. Although existing methods have designed various sophisticated MIA score functions that achieve considerable detection performance on pre-trained LLMs, achieving high-confidence detection and performing MIA on aligned LLMs remain challenging. In this paper, we propose MIA-Tuner, a novel instruction-based MIA method that instructs LLMs themselves to serve as a more precise internal pre-training data detector, rather than designing an external MIA score function. Furthermore, we design two instruction-based safeguards to mitigate the privacy risks posed by existing methods and by MIA-Tuner, respectively. To comprehensively evaluate the most recent state-of-the-art LLMs, we collect a more up-to-date MIA benchmark dataset, named WIKIMIA-24, to replace the widely adopted benchmark WIKIMIA. We conduct extensive experiments across various aligned and unaligned LLMs on the two benchmark datasets. The results demonstrate that MIA-Tuner increases the AUC of MIAs from 0.7 to a significantly high level of 0.9.
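To make the AUC figures quoted above concrete, the following is a minimal sketch of how the AUC of any score-based detector can be computed from its outputs. It is not the authors' evaluation code: the function name `mia_auc` and the toy scores are hypothetical, and the only assumption is that a higher score means "more likely a pre-training member".

```python
from typing import Sequence


def mia_auc(member_scores: Sequence[float],
            nonmember_scores: Sequence[float]) -> float:
    """Rank-based AUC for a membership inference score (hypothetical helper).

    Returns the probability that a randomly chosen member text receives a
    higher score than a randomly chosen non-member text; ties count as 1/2.
    """
    if len(member_scores) == 0 or len(nonmember_scores) == 0:
        raise ValueError("need at least one member and one non-member score")

    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0   # member ranked above non-member
            elif m == n:
                wins += 0.5   # tie: half credit
    return wins / (len(member_scores) * len(nonmember_scores))


# Toy scores (made up for illustration, not taken from the paper).
members = [0.9, 0.4]
nonmembers = [0.5, 0.1]
print(mia_auc(members, nonmembers))  # 3 of 4 pairs ranked correctly -> 0.75
```

Under this definition, an AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of members from non-members, which is the scale on which the reported improvement from 0.7 to 0.9 should be read.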

Paper Structure (29 sections, 12 equations, 7 figures, 5 tables)

Introduction
Related Works
Preliminary
Large Language Models (LLMs)
Problem Statement and Threat Model
Methodology
Motivation & Intuition
Tuning LLMs to Conduct Detection
Hybrid Loss for Aligned LLMs
Contrastive Loss for Unaligned LLMs
Tuning LLMs to Defend Detection
Experiments
Experimental Setup
Benchmark Datasets Construction
Target Models and Baselines
...and 14 more sections

Figures (7)

Figure 1: The overall framework of MIA-Tuner and the two pipelines designed for aligned and unaligned LLMs, respectively.
Figure 2: The performance of MIA-Tuner on LLaMA-2 while utilizing different numbers of fine-tuning samples.
Figure 3: The detection performance of all baselines on LLaMA-2 w/ and w/o the proposed safeguard.
Figure 4: The accuracy of LLaMA-2 on the MMLU benchmark w/ and w/o the proposed safeguard across four different types of tasks.
Figure 5: The fine-tuning (FT) PPL of (a) the benign user and the detection AUC of (b) the malicious user across aligned and unaligned LLMs in three stages: Before FT, After FT (w/o Defender), After FT (w/ Defender).
...and 2 more figures
