Privacy Leakage Overshadowed by Views of AI: A Study on Human Oversight of Privacy in Language Model Agent
Zhiping Zhang, Bingcan Guo, Tianshi Li
TL;DR
This study investigates how people oversee privacy implications when LM agents act on their behalf in asynchronous interpersonal communication tasks. Using a task-based online survey ($N=300$) across six PrivacyLens scenarios, the authors compare each participant’s own draft with an agent-generated response and measure the resulting privacy leakage. Many participants favor the agent response even though it leaks more, or consider both responses good, which raises harmful disclosure from $15.7\%$ to $55.0\%$. The authors identify six privacy-oversight patterns that reflect varying concerns, trust levels, and privacy preferences, revealing a substantial gap between subjective harms and objective privacy norms. The work highlights the need for privacy-preserving agent design and bidirectional alignment between user preferences and AI behavior, with implications for how to scaffold oversight and calibrate trust in agentic systems.
Abstract
Language model (LM) agents that act on users' behalf for personal tasks (e.g., replying to emails) can boost productivity, but are also susceptible to unintended privacy leakage. We present the first study on people's capacity to oversee the privacy implications of LM agents. Through a task-based survey ($N=300$), we investigate how people react to and assess the responses generated by LM agents for asynchronous interpersonal communication tasks, compared with responses they wrote themselves. We found that people may favor the agent response with more privacy leakage over the response they drafted, or consider both good, increasing harmful disclosure from 15.7% to 55.0%. We further identified six privacy behavior patterns reflecting the varying concerns, trust levels, and privacy preferences underlying people's oversight of LM agents' actions. Our findings shed light on designing agentic systems that enable privacy-preserving interactions and achieve bidirectional alignment on privacy preferences to help users calibrate trust.
