Beyond Memorization: Violating Privacy Via Inference with Large Language Models

Robin Staab; Mark Vero; Mislav Balunović; Martin Vechev

TL;DR

The paper demonstrates that large language models can infer a range of private attributes from user text at inference time, well beyond memorization concerns. By building the PersonalReddit dataset and evaluating nine state-of-the-art LLMs, it shows high inference accuracy at a fraction of human cost and time, and even introduces the concept of privacy-invasive chatbots. The study finds anonymization and model alignment currently inadequate as defenses and argues for a broader discussion and stronger privacy-preserving approaches. It contributes formal threat models, a substantial real-data and synthetic-data evaluation, and releases code and synthetic samples to advance research in LLM privacy.

Abstract

Current privacy research on large language models (LLMs) primarily focuses on the issue of extracting memorized training data. At the same time, models' inference capabilities have increased drastically. This raises the key question of whether current LLMs could violate individuals' privacy by inferring personal attributes from text given at inference time. In this work, we present the first comprehensive study on the capabilities of pretrained LLMs to infer personal attributes from text. We construct a dataset consisting of real Reddit profiles, and show that current LLMs can infer a wide range of personal attributes (e.g., location, income, sex), achieving up to $85\%$ top-1 and $95\%$ top-3 accuracy at a fraction of the cost ($100\times$) and time ($240\times$) required by humans. As people increasingly interact with LLM-powered chatbots across all aspects of life, we also explore the emerging threat of privacy-invasive chatbots trying to extract personal information through seemingly benign questions. Finally, we show that common mitigations, i.e., text anonymization and model alignment, are currently ineffective at protecting user privacy against LLM inference. Our findings highlight that current LLMs can infer personal data at a previously unattainable scale. In the absence of working defenses, we advocate for a broader discussion around LLM privacy implications beyond memorization, striving for a wider privacy protection.
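The top-1 and top-3 figures quoted above can be read as top-k accuracy: an inference counts as correct if the true attribute value appears among the model's first k ranked guesses. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `top_k_accuracy`, the exact-string matching rule, and the toy data are all assumptions made for illustration; the paper's own scoring may treat near-matches differently.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_guesses: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int,
) -> float:
    """Fraction of examples whose true label is among the first k guesses.

    Assumption: a guess is correct only on an exact string match.
    """
    if len(ranked_guesses) != len(truths):
        raise ValueError("ranked_guesses and truths must have the same length")
    if not truths:
        raise ValueError("at least one example is required")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(
        1
        for guesses, truth in zip(ranked_guesses, truths)
        if truth in guesses[:k]
    )
    return hits / len(truths)


# Hypothetical toy data: three ranked guesses per example, best guess first.
guesses = [
    ["A", "B", "C"],  # correct at rank 1
    ["A", "B", "C"],  # correct at rank 2
    ["A", "B", "C"],  # correct at rank 3
    ["A", "B", "C"],  # true value not among the guesses
]
truths = ["A", "B", "C", "D"]

top1 = top_k_accuracy(guesses, truths, k=1)
top3 = top_k_accuracy(guesses, truths, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

By construction top-3 accuracy can never be lower than top-1 accuracy on the same data, which is consistent with the gap between the two figures reported in the abstract.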

Paper Structure (40 sections, 1 equation, 21 figures, 4 tables)

Introduction
This Work: Privacy Violations through LLM Inference
Emerging Frontiers
Potential Mitigations
Main contributions
Responsible Disclosure
Related Work
Threat Models
Free Text Inference
Adversarial Interaction
A Dataset for LLM-Based Author Profiling
Evaluation of Privacy Violating LLM Inferences
Evaluation of Current Mitigations
Conclusion
Dataset Statistics
...and 25 more sections

Figures (21)

Figure 1: Adversarial inference of personal attributes from text. We assume the adversary has access to a dataset of user-written texts (e.g., by scraping an online forum). Given a text, the adversary creates a model prompt using a fixed adversarial template (1). They then leverage a pre-trained LLM (2) to automatically infer personal user attributes (3), a task that previously required humans. Current models are able to pick up on subtle clues in text and language (see the evaluation section), providing accurate inferences on real data. Finally, in (4), the model uses its inference to output a formatted user profile.
Figure 2: Free text inference: The adversary creates a prompt from user texts, using an LLM to infer personal attributes.
Figure 3: Illustration of the adversarial interaction. The user is unaware of $T_h$ given by the adversary. The model steers the conversation in each round to refine prior information.
Figure 4: Accuracies of 9 state-of-the-art LLMs on the PersonalReddit dataset. GPT-4 achieves the highest total top-1 accuracy of $85.5\%$. Note that Human had additional information.
Figure 5: Accuracies [%] for each hardness level for one representative model of each family. We observe a clear decrease in accuracy with increasing hardness scores.
...and 16 more figures
