Privacy Bias in Language Models: A Contextual Integrity-based Auditing Metric
Yan Shvartzshnaider, Vasisht Duddu
TL;DR
This work introduces privacy bias as a Contextual Integrity-based auditing metric for LLMs and defines privacy bias delta as the deviation of that bias from normative expectations. It addresses prompt sensitivity with a multi-prompt assessment framework that stabilizes measurements, and it demonstrates, using the IoT and ConfAIde datasets, how LLM capacity and optimizations (e.g., DPO, AWQ) influence privacy biases. The paper provides empirical methods and results for identifying biases when no explicit expected values are available and for evaluating the delta when norms exist, offering normative guidance for model training, deployment, and policy evaluation. It concludes with a discussion of provenance, generalizability, and future directions for normative CI-based privacy auditing of LLMs.
Abstract
As large language models (LLMs) are integrated into sociotechnical systems, it is crucial to examine the privacy biases they exhibit. We define privacy bias as the appropriateness value of information flows in responses from LLMs. A deviation between privacy biases and expected values, referred to as privacy bias delta, may indicate privacy violations. As an auditing metric, privacy bias can help (a) model trainers evaluate the ethical and societal impact of LLMs, (b) service providers select context-appropriate LLMs, and (c) policymakers assess the appropriateness of privacy biases in deployed LLMs. We formulate and answer a novel research question: how can we reliably examine privacy biases in LLMs and the factors that influence them? We present a novel approach for assessing privacy biases that uses a contextual integrity-based methodology to evaluate the responses from various LLMs. Our approach accounts for the sensitivity of LLM responses to prompt variations, which otherwise hinders reliable evaluation of privacy biases. Finally, we investigate how privacy biases are affected by model capacities and optimizations.
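To make the two quantities in the abstract concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes that appropriateness is rated on an integer scale from -2 (strongly unacceptable) to 2 (strongly acceptable), that each information flow is queried with several paraphrased prompts, and that a rating counts as the model's privacy bias only when a strict majority of prompt variants agree. The function names, the scale, and the `min_agreement` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def privacy_bias(ratings, min_agreement=0.5):
    """Aggregate appropriateness ratings obtained from several prompt variants.

    Assumption: ratings are integers on a -2..2 scale. Returns the most
    common rating if more than `min_agreement` of the variants agree on it,
    otherwise None (the responses are too prompt-sensitive to report a bias).
    """
    if not ratings:
        raise ValueError("need at least one rating")
    value, count = Counter(ratings).most_common(1)[0]
    if count / len(ratings) > min_agreement:
        return value
    return None


def privacy_bias_delta(bias, expected):
    """Deviation of the measured bias from an expected (normative) value.

    Returns None when no stable bias could be measured.
    """
    if bias is None:
        return None
    return bias - expected


# Hypothetical ratings for one information flow across five prompt variants.
ratings = [1, 1, 2, 1, -1]
bias = privacy_bias(ratings)
delta = privacy_bias_delta(bias, expected=-1)
print(bias, delta)
```

In this sketch a nonzero delta flags a flow for closer inspection, while a `None` bias signals that the prompt variants disagree too much for any single value to be reported. Both behaviors depend on the assumed threshold and scale.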
