Provable Differentially Private Computation of the Cross-Attention Mechanism

Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song, Jiahao Zhang

TL;DR

The paper tackles privacy breaches in cross-attention for large generative models (LGMs) by proving differential privacy (DP) guarantees for the cross-attention mechanism. It recasts cross-attention as a weighted distance problem and builds a family of DP data structures (DPTree) that implement private Softmax queries via polynomial kernel approximations, achieving memory $\widetilde{O}(ndr^2)$, initialization time $\widetilde{O}(ndr^2)$, and per-token query time $\widetilde{O}(dr^2)$, while ensuring robustness to adaptive queries with explicit additive and relative error bounds. The approach yields a provable-DP mechanism for cross-attention, the first of its kind, with explicit handling for adaptive queries and extensions to Softmax and high-dimensional settings through DPTreeSoftmax and DPTreeHighDim. This work provides a principled privacy foundation for prompts, RAG data, and other external-context data in LGMs, enabling privacy-preserving deployment in system prompts, retrieval pipelines, and diffusion-based applications.

Abstract

Cross-attention has emerged as a cornerstone module in modern artificial intelligence, underpinning critical applications such as retrieval-augmented generation (RAG), system prompting, and guided stable diffusion. However, there is a rising concern about securing the privacy of cross-attention, as the underlying key and value matrices frequently encode sensitive data or private user information. In this work, we introduce a novel data structure designed to enforce differential privacy (DP) for cross-attention mechanisms, accompanied by provable theoretical guarantees. Specifically, letting $n$ denote the input sequence length, $d$ the feature dimension, $R$ the maximum magnitude of query and key matrices, $R_w$ the maximum magnitude of the value matrix, and $r, s, \epsilon_s$ the parameters for polynomial kernel methods, our proposed structure achieves $\widetilde{O}(ndr^2)$ space and initialization complexity, with a query time of $\widetilde{O}(dr^2)$ per token. Moreover, we demonstrate that our mechanism satisfies $(\epsilon, \delta)$-DP, incurring an additive error of $\widetilde{O}((1-\epsilon_s)^{-1} n^{-1} \epsilon^{-1} R^{2s} R_w r^2)$ and a relative error of $2\epsilon_s/(1-\epsilon_s)$ with respect to the ground truth. Crucially, our framework maintains robustness against adaptive queries, ensuring security even in adversarial settings. To the best of our knowledge, this constitutes the first approach providing provable differential privacy for cross-attention, establishing a foundation for future privacy-preserving algorithms in large generative models (LGMs).
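For orientation, the non-private quantity that the mechanism approximates can be sketched as follows. This is a minimal illustration only: Definition 1.1 is not reproduced on this page, so the code assumes the standard unscaled form in which each output row is a Softmax-weighted average of the rows of $V$, with weights proportional to $\exp(\langle q, k_j \rangle)$. The function name `cross_attention` and the toy matrices are hypothetical, and none of the paper's DP machinery (DPTree, polynomial kernels, noise) appears here.

```python
import math

def cross_attention(Q, K, V):
    """Non-private Softmax cross-attention (assumed standard, unscaled form).

    Row i of the output is sum_j softmax_j(<Q[i], K[j]>) * V[j].
    """
    out = []
    for q in Q:
        # Inner product of the query with every key.
        scores = [sum(qa * ka for qa, ka in zip(q, k)) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)  # Softmax normalization factor
        out.append([
            sum(w * v[c] for w, v in zip(weights, V)) / z
            for c in range(len(V[0]))
        ])
    return out

# Toy example: one query, two keys, two value rows (all hypothetical).
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
result = cross_attention(Q, K, V)
```

In the setting described above, $K$ and $V$ hold the sensitive data (system prompts, retrieved documents), while the rows of $Q$ play the role of the possibly adaptive user queries.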

Paper Structure (43 sections, 33 theorems, 73 equations, 1 figure, 7 algorithms)

Key Result

Theorem 1.2

Let $Q, K, V, \mathrm{Attn}$ be as defined in Definition 1.1. Let $p_f$ be the probability of failure parameter. Let $r, s, \epsilon_s$ be the parameters of the polynomial kernel methods (Lemma lem:exp_inner_prod:formal). Then, our Algorithm alg:DP_cross_attn requires $\widetilde{O}(ndr^2)$ total mem

Figures (1)

  • Figure 1: The visualization of how to compute the weighted $\ell_1$ distance for rounded dataset $X \in [0,1]^{10}$. The number above each $x_i$ is $w_i$. See Algorithm \ref{alg:preprocessing_one_d} for details. Suppose $y=0$. Then $\sum_{i = 1}^n w_i |y - x_i| = 0.1 \cdot 2.2 + 0.3 \cdot 3.1 + 0.3 \cdot (-2) + 0.3 \cdot (-3) + 0.4 \cdot 2 + 0.6 \cdot 6 + 0.7 \cdot 0.5 + 0.9 \cdot (-1) + 0.9 \cdot 1 = 4.4$. See more details in Lemma \ref{lem:weighted_l1}.
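The arithmetic in the Figure 1 caption can be checked with a few lines of code. The sketch below reads each product in the caption as $x_i \cdot w_i$ (position times weight, since $|0 - x_i| = x_i$ for $x_i \in [0,1]$); that reading, the `points` list, and the function name `weighted_l1` are assumptions made for illustration. The caption lists nine products, so only nine pairs appear. This is a direct summation, not the paper's tree-based data structure.

```python
# (x_i, w_i) pairs as read from the nine products in the Figure 1 caption.
points = [
    (0.1, 2.2), (0.3, 3.1), (0.3, -2.0), (0.3, -3.0), (0.4, 2.0),
    (0.6, 6.0), (0.7, 0.5), (0.9, -1.0), (0.9, 1.0),
]

def weighted_l1(y, points):
    """Direct evaluation of sum_i w_i * |y - x_i|."""
    return sum(w * abs(y - x) for x, w in points)

total = weighted_l1(0.0, points)
print(total)
```

Up to floating-point rounding, the result agrees with the value 4.4 stated in the caption.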

Theorems & Definitions (70)

  • Definition 1.1: Softmax cross-attention [vsp+17]
  • Theorem 1.2: Main result; informal version of Theorem \ref{thm:cross_attention}
  • Definition 3.1: Neighboring dataset
  • Definition 3.2: Sensitivity
  • Definition 3.3: $(\epsilon, \delta)$-DP
  • Definition 3.4: Truncated Laplace distribution [gdgk20]
  • Lemma 3.6: Laplace mechanism [dr14, gdgk20]; see Lemma 2.2 in [aimn23]
  • Theorem 4.1: Softmax cross-attention; informal version of Theorem \ref{thm:cross_attention:formal}
  • Remark 4.2
  • Definition 5.1: Weighted Softmax query (without normalization)
  • ...and 60 more
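Several of the listed items (sensitivity, the Laplace mechanism) refer to standard DP building blocks. As a rough illustration, the textbook Laplace mechanism releases a statistic plus Laplace noise of scale sensitivity divided by $\epsilon$. The sketch below implements that textbook version only; it is not the truncated Laplace variant named in Definition 3.4, whose exact form is not shown on this page, and the names `laplace_mechanism`, `true_value`, `sensitivity`, and `rng` are hypothetical.

```python
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Textbook Laplace mechanism: true_value + Laplace(0, sensitivity / epsilon).

    Assumption: untruncated noise, which gives pure epsilon-DP for a
    statistic with the given L1 sensitivity.
    """
    if sensitivity < 0 or epsilon <= 0:
        raise ValueError("need sensitivity >= 0 and epsilon > 0")
    scale = sensitivity / epsilon
    # The difference of two independent Exp(1) samples is Laplace(0, 1).
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_value + noise

rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [laplace_mechanism(10.0, 1.0, 1.0, rng) for _ in range(20000)]
mean_estimate = sum(samples) / len(samples)
```

Smaller $\epsilon$ means a larger noise scale, which matches the $\epsilon^{-1}$ factor in the additive error bound quoted in the abstract.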