Privacy Bias in Language Models: A Contextual Integrity-based Auditing Metric

Yan Shvartzshnaider, Vasisht Duddu

TL;DR

This work introduces privacy bias as a Contextual Integrity–based auditing metric for LLMs and defines privacy bias delta as the deviation from normative expectations. It tackles prompt sensitivity by proposing a multi-prompt assessment framework to stabilize measurements, and it demonstrates how LLM capacity and optimizations (e.g., DPO, AWQ) influence biases using the IoT and ConfAIde datasets. The paper provides empirical methods and results for identifying biases without explicit expected values and for evaluating delta when norms exist, offering normative guidance for model training, deployment, and policy evaluation. It concludes with a discussion of provenance, generalizability, and future directions for normative CI-based privacy auditing of LLMs.

Abstract

As large language models (LLMs) are integrated into sociotechnical systems, it is crucial to examine the privacy biases they exhibit. We define privacy bias as the appropriateness value of information flows in responses from LLMs. A deviation between privacy biases and expected values, referred to as privacy bias delta, may indicate privacy violations. As an auditing metric, privacy bias can help (a) model trainers evaluate the ethical and societal impact of LLMs, (b) service providers select context-appropriate LLMs, and (c) policymakers assess the appropriateness of privacy biases in deployed LLMs. We formulate and answer a novel research question: how can we reliably examine privacy biases in LLMs and the factors that influence them? We present a novel approach for assessing privacy biases using a contextual integrity-based methodology to evaluate the responses from various LLMs. Our approach accounts for the sensitivity of responses across prompt variations, which hinders the evaluation of privacy biases. Finally, we investigate how privacy biases are affected by model capacities and optimizations.
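The relationship between privacy bias, prompt sensitivity, and privacy bias delta can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Likert scale from -2 to 2, aggregation by the mean, the variance threshold of 0.5, and the function names `privacy_bias` and `privacy_bias_delta` are all hypothetical.

```python
from statistics import mean, pvariance

def privacy_bias(responses, max_variance=0.5):
    """Aggregate appropriateness scores for one information flow.

    `responses` holds one numeric score per paraphrased prompt.
    Returns the mean score, or None when the scores vary too much
    across paraphrases to support a reliable measurement.
    The threshold and the use of the mean are illustrative assumptions.
    """
    if not responses:
        raise ValueError("need at least one response")
    if pvariance(responses) > max_variance:
        return None  # too prompt-sensitive to measure reliably
    return mean(responses)

def privacy_bias_delta(bias, expected):
    """Signed deviation of a measured privacy bias from an expected value."""
    return bias - expected

# Hypothetical scores (-2 = very inappropriate, 2 = very appropriate)
# from five paraphrases of the same information-flow prompt.
stable = [1, 1, 2, 1, 1]
unstable = [-2, 2, 0, 1, -1]

stable_bias = privacy_bias(stable)      # low variance: a usable estimate
unstable_bias = privacy_bias(unstable)  # high variance: no estimate

if stable_bias is not None:
    # The delta is only computable when an expected value is known.
    print(privacy_bias_delta(stable_bias, expected=2))
```

The sketch separates the two steps deliberately: a privacy bias can be reported whenever the responses are stable, while the delta additionally requires an expected value for that information flow.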

Paper Structure (23 sections, 16 figures, 7 tables)

Figures (16)

  • Figure 1: Relation between Privacy Bias and Variance across Paraphrased Prompts: The red inner circle indicates the expected value, while each ✗ represents the LLM's response on the appropriateness of an information flow across paraphrased prompts. Low variance allows privacy bias to be measured reliably (A, B, C), whereas high variance makes this challenging (D). In A, B, and C, knowing the expected values allows computing privacy bias delta, but the privacy biases can still be analyzed without them.
  • Figure 2: Impact of Temperature Parameter: A temperature of zero (in blue) yields the least variance across different LLMs, which allows privacy biases to be measured reliably.
  • Figure 3: Distribution of Responses: Responses across LLMs and prompt variations before filtering with thresholds.
  • Figure 4: Prompt Sensitivity with Paraphrasing. Paraphrasing prompts results in significant variation in LLM responses, suggesting that LLMs suffer from prompt sensitivity. All three paraphrasers have similar variance across all LLMs.
  • Figure 5: Prompt Sensitivity by Re-Ordering Likert Scale. LLMs show significant variance due to prompt variation, with three random Likert scale orders per prompt.
  • ...and 11 more figures