
What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data

Dimitri Staufer, Kirsten Morehouse

TL;DR

We show empirically that models confidently generate multiple PD categories for well-known individuals, and we introduce LMP2 (Language Model Privacy Probe), a human-centered, privacy-preserving audit tool refined through two formative studies.

Abstract

Large language models (LLMs), and conversational agents based on them, are exposed to personal data (PD) during pre-training and during user interactions. Prior work shows that PD can resurface, yet users lack insight into how strongly models associate specific information with their identity. We audit PD across eight LLMs (3 open-source; 5 API-based, including GPT-4o), introduce LMP2 (Language Model Privacy Probe), a human-centered, privacy-preserving audit tool refined through two formative studies (N=20), and run two studies with EU residents to capture (i) intuitions about LLM-generated PD (N1=155) and (ii) reactions to tool output (N2=303). We show empirically that models confidently generate multiple PD categories for well-known individuals. For everyday users, GPT-4o generates 11 features with 60% or more accuracy (e.g., gender, hair color, languages). Finally, 72% of participants sought control over model-generated associations with their name, raising questions about what counts as PD and whether data privacy rights should extend to LLMs.

Paper Structure (87 sections, 9 equations, 22 figures, 8 tables)

Figures (22)

  • Figure 1: Overview of our feature selection process. Starting from all 5824 human-related Wikidata properties, we only use those for which Wikidata has entries for at least 100 individuals (the original WikiMem dataset). We first taxonomize properties into high-level buckets, then filter for properties that are broadly relevant, heterogeneous, and expressible in one to three words, and finally re-categorize the selected properties into eight user-facing feature groups, resulting in the 50 features used in our study.
  • Figure 2: Example of a WikiMem canary template for the property residence (P551). It shows the property label, description, and several paraphrased sentence variants used to test whether an LLM memorizes human-residence associations.
  • Figure 3: Overview of the original WikiMem probing framework. Paraphrased canary sentences are paired with Wikidata ground truths and type-consistent counterfactuals (e.g., “Hogwarts” vs. “Spain” for Harry Potter’s residence). The model’s negative log-likelihood (NLL) scores are calibrated against generic-subject and similar-name baselines, then ranked to decide whether a fact is memorized (boolean) and to compute association strength (numeric).
  • Figure 4: Overview of our adapted WikiMem probing framework for black-box and user-facing applications. Canary paraphrases are now paired with user-provided ground truths, which undergo prefix-based truncation. We generate random two-character counterfactuals and use a standardized system prompt to query black-box APIs. The resulting top completions feed into user-facing outputs (top predictions, association strength, and confidence). Depending on the API, the backend instantiates ranking and strength either from calibrated token log-probabilities (when exposed) or from "votes" of the top completion per probe when log-probabilities are unavailable (see \ref{sssec:prompting-formatting-metrics}).
  • Figure 5: Distribution of confidence across models and subject sets. In most models, confidence cleanly separates Famous from Synthetic, indicating the stable retrieval or inference of personal data. The y-axis is truncated for cross-model comparability, omitting part of the Mistral 8B Instruct synthetic distribution.
  • ...and 17 more figures
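
The captions for Figures 3 and 4 describe two ways of turning probe results into a ranking: calibrating NLL scores against a baseline, and counting "votes" from the top completion of each probe when log-probabilities are unavailable. The sketch below is only an illustration of those two ideas under simple assumptions. The function names `vote_ranking` and `calibrated_score`, the lowercase normalization, and the use of a plain difference for calibration are all hypothetical and are not taken from the paper, whose actual metric definitions may differ.

```python
from collections import Counter


def vote_ranking(completions):
    """Rank candidate values by the share of probes that returned them.

    `completions` holds one top completion per paraphrased probe.
    Normalization (strip + lowercase) is an assumption of this sketch.
    """
    normalized = [c.strip().lower() for c in completions if c.strip()]
    if not normalized:
        return []
    counts = Counter(normalized)
    total = len(normalized)
    # most_common() orders by vote count, highest first.
    return [(value, count / total) for value, count in counts.most_common()]


def calibrated_score(nll_subject, nll_generic):
    """Hypothetical calibration: NLL improvement over a generic-subject baseline.

    A positive value means the candidate is more likely for the named
    subject than for a generic subject. This simple difference is an
    assumed stand-in, not the paper's formula.
    """
    return nll_generic - nll_subject


if __name__ == "__main__":
    # Four paraphrased probes about a residence, echoing the Figure 3 example values.
    print(vote_ranking(["Hogwarts", "hogwarts ", "Spain", "Hogwarts"]))
    print(calibrated_score(nll_subject=2.0, nll_generic=5.0))
```

In this sketch the vote share plays the role of an association-strength proxy when only text completions are available, while the calibrated difference applies when token log-probabilities are exposed.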