Beyond Memorization: Violating Privacy Via Inference with Large Language Models
Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev
TL;DR
The paper demonstrates that large language models can infer a wide range of private attributes from user-written text at inference time, a privacy threat that goes beyond memorization of training data. Building the PersonalReddit dataset and evaluating nine state-of-the-art LLMs, it shows high inference accuracy at a fraction of the cost and time required by humans, and it introduces the threat of privacy-invasive chatbots. The study finds that text anonymization and model alignment are currently inadequate defenses, and it argues for a broader discussion of LLM privacy and for stronger privacy-preserving approaches. It contributes formal threat models and a substantial evaluation on real and synthetic data, and it releases code and synthetic samples to support further research on LLM privacy.
Abstract
Current privacy research on large language models (LLMs) primarily focuses on the issue of extracting memorized training data. At the same time, models' inference capabilities have increased drastically. This raises the key question of whether current LLMs could violate individuals' privacy by inferring personal attributes from text given at inference time. In this work, we present the first comprehensive study on the capabilities of pretrained LLMs to infer personal attributes from text. We construct a dataset consisting of real Reddit profiles and show that current LLMs can infer a wide range of personal attributes (e.g., location, income, sex), achieving up to $85\%$ top-1 and $95\%$ top-3 accuracy at a fraction of the cost ($100\times$ lower) and time ($240\times$ less) required by humans. As people increasingly interact with LLM-powered chatbots across all aspects of life, we also explore the emerging threat of privacy-invasive chatbots that try to extract personal information through seemingly benign questions. Finally, we show that common mitigations, i.e., text anonymization and model alignment, are currently ineffective at protecting user privacy against LLM inference. Our findings highlight that current LLMs can infer personal data at a previously unattainable scale. In the absence of working defenses, we advocate for a broader discussion of LLM privacy implications beyond memorization, striving for broader privacy protection.
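To make the top-1 and top-3 figures concrete, here is a minimal sketch of how top-k accuracy can be computed. It assumes that a model returns a ranked list of guesses for each attribute and that a guess counts as correct only on an exact string match. Both are simplifying assumptions, and the paper's own grading of free-text answers may be more lenient. The function name `top_k_accuracy` and the example cities are hypothetical and purely illustrative.

```python
from typing import Sequence


def top_k_accuracy(predictions: Sequence[Sequence[str]], labels: Sequence[str], k: int) -> float:
    """Fraction of examples whose true label appears among the first k ranked guesses.

    Assumption (illustrative only): correctness is exact string equality.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    # Count examples where the true label is within the top-k guesses.
    hits = sum(1 for guesses, label in zip(predictions, labels) if label in guesses[:k])
    return hits / len(labels)


# Hypothetical ranked location guesses for three profiles (not real data).
guesses = [
    ["Zurich", "Geneva", "Bern"],
    ["Berlin", "Munich", "Hamburg"],
    ["Oslo", "Bergen", "Stockholm"],
]
truth = ["Zurich", "Munich", "Copenhagen"]

print(top_k_accuracy(guesses, truth, k=1))  # only the first profile is a top-1 hit
print(top_k_accuracy(guesses, truth, k=3))  # the first two profiles are top-3 hits
```

Top-k accuracy can only grow as k increases, since every top-1 hit is also a top-3 hit, which is why top-3 figures are at least as high as top-1 figures.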
