Membership Inference Attacks Against In-Context Learning

Rui Wen, Zheng Li, Michael Backes, Yang Zhang

TL;DR

This work presents the first membership inference attack tailored for ICL, relying solely on generated texts without their associated probabilities. It proposes four attack strategies for different constrained scenarios and evaluates them extensively on four popular large language models.

Abstract

Adapting Large Language Models (LLMs) to specific tasks introduces concerns about computational efficiency, prompting an exploration of efficient methods such as In-Context Learning (ICL). However, the vulnerability of ICL to privacy attacks under realistic assumptions remains largely unexplored. In this work, we present the first membership inference attack tailored for ICL, relying solely on generated texts without their associated probabilities. We propose four attack strategies tailored to various constrained scenarios and conduct extensive experiments on four popular large language models. Empirical results show that our attacks can accurately determine membership status in most cases, e.g., a 95% accuracy advantage against LLaMA, indicating that the associated risks are much higher than those shown by existing probability-based attacks. Additionally, we propose a hybrid attack that synthesizes the strengths of the aforementioned strategies, achieving an accuracy advantage of over 95% in most cases. Furthermore, we investigate three potential defenses targeting data, instruction, and output. Results demonstrate that combining defenses from orthogonal dimensions significantly reduces privacy leakage and offers enhanced privacy assurances.

Paper Structure (35 sections, 1 equation, 30 figures, 1 table)

Figures (30)

  • Figure 1: An illustrative example of In-Context Learning. The language model is initialized by a prompt combined with instruction (pink) and demonstrations (green).
  • Figure 2: The GAP attack involves querying the model with a target sample. The adversary determines the membership status by evaluating the accuracy of the model's prediction: if the prediction is correct, the target sample is classified as a member; otherwise, it is classified as a non-member.
  • Figure 3: Performance of the baseline membership inference attack (GAP), revealing challenges and suboptimal results, particularly evident in larger language models such as GPT-3.5. In this figure, language models are prompted with one example with the instruction presented in Figure 2. The performance metric, which indicates the advantage over random guessing, is detailed in the experimental setup section.
  • Figure 4: The Inquiry attack determines membership status by directly querying the model. In our work, we use the prompt "Have you seen this sentence before."
  • Figure 5: The Repeat attack initiates a conversation with a few words and asks the model to complete the sentence. The adversary predicts membership status by assessing the semantic similarity between the generated sample and the target sample.
  • ...and 25 more figures
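
The decision rules described in the captions for Figures 2, 4, and 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `query_model` is a hypothetical callable that sends text to the ICL-prompted model and returns its generated text, the "starts with yes" rule for the Inquiry attack is assumed, and `token_jaccard` is a crude word-overlap stand-in for the semantic similarity measure, which the captions do not specify. The prefix length and threshold are likewise illustrative.

```python
from typing import Callable

# Hypothetical interface: takes a text query, returns the model's generated text.
QueryFn = Callable[[str], str]


def gap_attack(query_model: QueryFn, target_text: str, target_label: str) -> bool:
    """GAP: predict 'member' iff the model classifies the target correctly."""
    prediction = query_model(target_text)
    return prediction.strip().lower() == target_label.strip().lower()


def inquiry_attack(query_model: QueryFn, target_text: str) -> bool:
    """Inquiry: ask the model directly whether it has seen the sample.

    Assumption: an answer beginning with "yes" is read as 'member'.
    """
    reply = query_model(f'Have you seen this sentence before: "{target_text}"')
    return reply.strip().lower().startswith("yes")


def token_jaccard(a: str, b: str) -> float:
    """Word-overlap score in [0, 1]; a stand-in for semantic similarity."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def repeat_attack(
    query_model: QueryFn,
    target_text: str,
    n_prefix_words: int = 3,      # assumed prefix length
    threshold: float = 0.8,       # assumed decision threshold
    similarity: Callable[[str, str], float] = token_jaccard,
) -> bool:
    """Repeat: give the first few words, then compare the completion to the target.

    Assumption: the model returns the full sentence, prefix included.
    """
    prefix = " ".join(target_text.split()[:n_prefix_words])
    completion = query_model(f"Complete the sentence: {prefix}")
    return similarity(completion, target_text) >= threshold
```

Each function returns `True` for a predicted member and `False` for a predicted non-member. Passing the similarity function as a parameter lets a real sentence-embedding measure replace the word-overlap stand-in without changing the attack logic.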