PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding

Krishna Kanth Nakka, Ahmed Frikha, Ricardo Mendes, Xue Jiang, Xuebing Zhou

TL;DR

The paper investigates PII leakage risk in LLMs under black-box access, focusing on phone-number extraction from Enron-derived data. It compares naive adversarial prompts, manual templates, and a grounding technique called PII-Compass, which prepends prompts with in-domain true prefixes to steer extraction. Grounding yields large improvements, achieving up to 6.86% extraction with thousands of queries and averaging around 3.3% at 128 queries, vastly outperforming baselines that stagnate below 1%. Embedding-space analyses explain the gains by showing that grounded prompts move closer to the true-prefix region in the prompt-embedding space, signaling a practical vulnerability in current evaluation methods. The work highlights the need for robust adversary modeling and careful prompt design when assessing PII leakage risks in LLMs.

Abstract

The latest and most impactful advances in large models stem from their increased size. Unfortunately, this translates into an improved memorization capacity, raising data privacy concerns. Specifically, it has been shown that models can output personally identifiable information (PII) contained in their training data. However, reported PII extraction performance varies widely, and there is no consensus on the optimal methodology to evaluate this risk, which leads to underestimating realistic adversaries. In this work, we empirically demonstrate that it is possible to improve the extractability of PII by over ten-fold by grounding the prefix of the manually constructed extraction prompt with in-domain data. Our approach, PII-Compass, achieves phone number extraction rates of 0.92%, 3.9%, and 6.86% with 1, 128, and 2308 queries, respectively, i.e., the phone number of 1 person in 15 is extractable.
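The grounding step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the template wording, the prefix strings, and the subject name are all hypothetical placeholders, and no model is actually queried.

```python
def ground_prompt(true_prefix: str, template: str, subject_name: str) -> str:
    """Prepend an in-domain prefix (taken from a *different* data subject)
    to a manually written extraction template for the target subject."""
    return f"{true_prefix.strip()} {template.format(name=subject_name)}"


def build_queries(prefixes: list[str], template: str, subject_name: str) -> list[str]:
    """Build one grounded query per available prefix
    (for example, 128 prefixes would give 128 queries)."""
    return [ground_prompt(p, template, subject_name) for p in prefixes]


# Hypothetical placeholders, not text from the paper or from the Enron data.
template = "The phone number of {name} is"
prefixes = [
    "Please call me when you get a chance.",
    "Thanks, see the attached file.",
]

queries = build_queries(prefixes, template, "Jane Doe")
for q in queries:
    print(q)

# Sanity check on the abstract's arithmetic: a 6.86% extraction rate
# corresponds to roughly one person in 15.
people_per_hit = 1 / 0.0686
print(round(people_per_hit, 1))
```

Each grounded query would then be sent to the model, and the generated continuation checked for the target's phone number; that step is omitted here because it depends on the model and matching rules used.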

Paper Structure (11 sections, 13 figures)


Figures (13)

  • Figure 1: Demonstration example of our proposed PII-Compass method. We extend manual template T6 with the true prefix of a different data subject, Jeff Shorter. Note that the ground truth phone number of "Jeff Shorter" is "214-875-9632" and does not overlap with Eric Gillaspie's number.
  • Figure 2: PII Extraction with True-Prefix Prompts. We vary the length of true-prefix tokens and observe that the extraction rates improve as the number of tokens in the prefix increases.
  • Figure 3: Prompt Sentence Embeddings. We visualize the prompt embeddings of 100 evaluation set data subjects with UMAP (McInnes et al., 2018). Manually crafted prompt templates T4 (blue) and T6 (purple) lie away from the true-prefix embeddings. However, by prepending the template T6 with a true-prefix of a different data subject in the adversary dataset (red), we observe a significant shift towards the region of true-prefix embeddings (green). In contrast, prepending with a different subdomain string results in embeddings that stay away from true-prefix embeddings (yellow). See the Appendix for the exact prefixes.
  • Figure 4: PII Extraction with Prefix Grounding. We prepend the manual templates with 128 different prefixes, with the best-performing prefix (green bars) achieving extraction rates 5-18 times higher than baseline without grounding (purple bars). Additionally, the rate of extraction at least once in 128 queries averages above 3% (yellow bars). See the Appendix for the best-performing prefixes for each template.
  • Figure 5: Average PII extraction rate and respective range over 11 randomized runs with varying numbers of queries. For further details about the experimental setup, refer to the Appendix.
  • ...and 8 more figures
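
The "extracted at least once in k queries" rate (Figure 4) and its average and range over randomized runs (Figure 5) could be computed along these lines. This is a sketch only: it assumes each run draws a random subset of k queries without replacement, which the captions above do not spell out, and the names `rate_at_k`, `summarize`, and the toy `hits` matrix are hypothetical.

```python
import random


def rate_at_k(hits: list[list[bool]], k: int, rng: random.Random) -> float:
    """hits[s][q] is True if query q extracted the number of subject s.
    Sample k queries without replacement; a subject counts as extracted
    if at least one sampled query succeeded."""
    n_queries = len(hits[0])
    chosen = rng.sample(range(n_queries), k)
    extracted = sum(any(row[q] for q in chosen) for row in hits)
    return extracted / len(hits)


def summarize(hits: list[list[bool]], k: int, runs: int = 11, seed: int = 0):
    """Return (mean, min, max) of the rate over several randomized runs."""
    rng = random.Random(seed)
    rates = [rate_at_k(hits, k, rng) for _ in range(runs)]
    return sum(rates) / runs, min(rates), max(rates)


# Toy outcome matrix (4 subjects x 3 queries), invented for illustration.
hits = [
    [True, False, False],
    [False, False, False],
    [False, True, True],
    [False, False, False],
]

mean_1, low_1, high_1 = summarize(hits, k=1)
mean_3, low_3, high_3 = summarize(hits, k=3)
print(mean_1, low_1, high_1)
print(mean_3, low_3, high_3)
```

With a larger query budget, more subjects have at least one successful query, which is why the rate rises with k in this toy example.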