Differential Privacy in Generative AI Agents: Analysis and Optimal Tradeoffs

Ya-Ting Yang, Quanyan Zhu

Abstract

Large language models (LLMs) and AI agents are increasingly integrated into enterprise systems to access internal databases and generate context-aware responses. While such integration improves productivity and decision support, the model outputs may inadvertently reveal sensitive information. Although many prior efforts focus on protecting the privacy of user prompts, relatively few studies consider privacy risks from the enterprise data perspective. Hence, this paper develops a probabilistic framework for analyzing privacy leakage in AI agents based on differential privacy. We model response generation as a stochastic mechanism that maps prompts and datasets to distributions over token sequences. Within this framework, we introduce token-level and message-level differential privacy and derive privacy bounds that relate privacy leakage to generation parameters such as temperature and message length. We further formulate a privacy-utility design problem that characterizes optimal temperature selection.

Paper Structure

This paper contains 20 sections, 5 theorems, 13 equations, and 2 figures.

Key Result

Lemma 1

If the token generation mechanism at each step $k$ satisfies $(\varepsilon_k,\delta_k)$-DP in the sense of Definition 3 (Token Differential Privacy), then the induced message-generation mechanism $\mathcal{M}_i : (p_i^t,D_i^t,I_i^t) \to \mathcal{X}_i$ satisfies $(\varepsilon,\delta)$-DP in the sense of Definition 2 (Message Differential Privacy) at the message
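
The lemma statement is cut off here, so the exact message-level parameters are not shown. As a rough illustration of how per-token guarantees can accumulate over a message, the sketch below assumes standard basic sequential composition (the $\varepsilon_k$ and $\delta_k$ simply add up), which may be looser than or different from the bound the paper derives. It also assumes tokens are sampled from a temperature-scaled softmax and measures a per-token privacy loss as the worst-case absolute log-probability ratio between two neighboring datasets. The function names (`log_softmax`, `token_epsilon`, `compose_basic`) and the toy logits are hypothetical and do not come from the paper.

```python
import math


def log_softmax(logits, temperature):
    """Log-probabilities of a temperature-scaled softmax (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def token_epsilon(logits_d, logits_d_prime, temperature):
    """Worst-case |log p(x | D) - log p(x | D')| over the vocabulary.

    Assumption: this empirical quantity stands in for a pure-DP
    (delta = 0) token-level privacy loss at a single generation step.
    """
    log_p = log_softmax(logits_d, temperature)
    log_q = log_softmax(logits_d_prime, temperature)
    return max(abs(a - b) for a, b in zip(log_p, log_q))


def compose_basic(budgets):
    """Basic sequential composition: per-step epsilons and deltas add up.

    Assumption: this is the textbook composition rule, not necessarily
    the bound stated in the (truncated) lemma.
    """
    eps = sum(e for e, _ in budgets)
    delta = sum(d for _, d in budgets)
    return eps, delta


if __name__ == "__main__":
    # Toy logits for one step under two neighboring datasets (made up).
    logits_d = [2.0, 0.0]
    logits_d_prime = [0.0, 2.0]

    for temperature in (1.0, 2.0):
        eps_k = token_epsilon(logits_d, logits_d_prime, temperature)
        # A message of 10 tokens, each with the same per-token loss.
        eps, delta = compose_basic([(eps_k, 0.0)] * 10)
        print(f"T={temperature}: eps_k={eps_k:.3f}, message eps={eps:.3f}, delta={delta}")
```

In this toy example, doubling the temperature halves the per-token loss, and under basic composition the message-level loss grows linearly with the number of tokens. That matches the qualitative dependence on temperature and message length described in the abstract, though only as an illustration under the stated assumptions.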

Figures (2)

  • Figure 1: Privacy leakage under different temperatures. Lines show the means, while shaded areas indicate the standard deviations.
  • Figure 2: The quantities in the proposed privacy–utility framework under different temperatures.

Theorems & Definitions (13)

  • Definition 1: Message Space
  • Definition 2: Message Differential Privacy
  • Definition 3: Token Differential Privacy
  • Lemma 1
  • proof
  • Proposition 1: Token-Level Privacy Bound
  • proof
  • Corollary 1: Message-Level Privacy Bound
  • proof
  • Proposition 2: Derivative and Monotonicity
  • ...and 3 more