PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action

Yijia Shao, Tianshi Li, Weiyan Shi, Yanchen Liu, Diyi Yang

TL;DR

PrivacyLens provides an extensible framework to quantify language models' privacy norm awareness in action by grounding norms as contextual seeds, expanding them into vignettes, and simulating agent trajectories within a sandbox. It reveals a notable disconnect between QA probing performance and real-world action, showing nontrivial privacy leakage even with privacy-focused prompts. The dataset (493 seeds with corresponding vignettes and trajectories) and the multi-level evaluation pipeline enable robust red-teaming and worst-case assessment, and the framework is designed to extend to additional datasets and cultural contexts. Overall, PrivacyLens highlights the pressing need to move beyond probing tasks and develop action-based privacy evaluations for LM agents.

Abstract

As language models (LMs) are widely utilized in personalized communication scenarios (e.g., sending emails, writing social media posts) and endowed with a certain level of agency, ensuring they act in accordance with the contextual privacy norms becomes increasingly critical. However, quantifying the privacy norm awareness of LMs and the emerging privacy risk in LM-mediated communication is challenging due to (1) the contextual and long-tailed nature of privacy-sensitive cases, and (2) the lack of evaluation approaches that capture realistic application scenarios. To address these challenges, we propose PrivacyLens, a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories, enabling multi-level evaluation of privacy leakage in LM agents' actions. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. Using this dataset, we reveal a discrepancy between LM performance in answering probing questions and their actual behavior when executing user instructions in an agent setup. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, respectively, even when prompted with privacy-enhancing instructions. We also demonstrate the dynamic nature of PrivacyLens by extending each seed into multiple trajectories to red-team LM privacy leakage risk. Dataset and code are available at https://github.com/SALT-NLP/PrivacyLens.
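
The leakage percentages quoted above are rates over agent trajectories: the share of cases in which the agent's final action exposes sensitive information. The sketch below illustrates that arithmetic only. The `Seed` and `Trajectory` classes, their field names, and the substring-based `leaks` check are hypothetical stand-ins introduced for illustration; they are not the PrivacyLens data schema or its leakage judge, and the example data is made up apart from the calendar phrase borrowed from Figure 1.

```python
from dataclasses import dataclass


@dataclass
class Seed:
    # Hypothetical fields describing one privacy-sensitive information flow.
    data_type: str
    data_sender: str
    data_recipient: str


@dataclass
class Trajectory:
    # Hypothetical record of one simulated agent run.
    seed: Seed
    sensitive_items: list[str]  # strings that should not reach the recipient
    final_action: str           # text of the agent's last action (e.g. an email body)


def leaks(traj: Trajectory) -> bool:
    """Naive stand-in for a leakage judge: case-insensitive substring match."""
    text = traj.final_action.lower()
    return any(item.lower() in text for item in traj.sensitive_items)


def leakage_rate(trajs: list[Trajectory]) -> float:
    """Fraction of trajectories whose final action leaks a sensitive item."""
    if not trajs:
        return 0.0
    return sum(leaks(t) for t in trajs) / len(trajs)


seed = Seed("calendar event", "user", "manager")
trajectories = [
    Trajectory(seed, ["lunch with TechAdvance Recruiter"],
               "I'm busy at noon: lunch with TechAdvance Recruiter."),
    Trajectory(seed, ["lunch with TechAdvance Recruiter"],
               "I'm unavailable at noon, but free after 2pm."),
]

rate = leakage_rate(trajectories)
print(f"{rate:.2%}")  # prints "50.00%"
```

A substring match misses paraphrased leaks, so a realistic evaluation would need a stronger judge; the point here is only how a per-model leakage rate aggregates per-trajectory decisions.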

Paper Structure (97 sections, 1 equation, 8 figures, 5 tables, 1 algorithm)


Figures (8)

  • Figure 1: Risk Model of PrivacyLens. PrivacyLens quantifies an emerging LM privacy risk where LMs unintentionally leak private information when assisting human communication. The risk model involves three primary actors: (1) a sender, who is a daily user instructing an LM to assist in communication; (2) a recipient, who is specified in the user instruction; (3) an LM agent, who gets access to sensitive information through tool use (e.g., reading events from the user's personal calendar). The privacy leakage arises when the LM agent shares a piece of information (e.g., "lunch with TechAdvance Recruiter") in its final action, and the information flow violates a privacy norm.
  • Figure 2: Data construction pipeline in PrivacyLens. PrivacyLens starts with contextual privacy-sensitive seeds (A). It extends each seed into a vignette (B) with more details through template-based generation. The seed and vignette will be used to guide the emulator in sandbox simulation to get an LM agent trajectory (C). We employ the Surgery Kit module to improve the vignette and trajectory quality based on unit tests and LM refinement.
  • Figure 3: An example of the multi-level evaluation of PrivacyLens.
  • Figure 4: Probing accuracy with 95% confidence intervals.
  • Figure 5: Final actions of the GPT-4 agent with "Privacy-Enhancing Prompt". Case 1 shows no information leakage and a helpfulness score of 3; case 2 shows information leakage despite a helpfulness score of 3; case 3 shows no information leakage but a low helpfulness score of 1.
  • ...and 3 more figures