Provable Differentially Private Computation of the Cross-Attention Mechanism
Yekun Ke, Yingyu Liang, Zhenmei Shi, Zhao Song, Jiahao Zhang
TL;DR
The paper addresses privacy leakage in cross-attention for large generative models (LGMs) by proving differential privacy (DP) guarantees for the cross-attention mechanism. It recasts cross-attention as a weighted distance problem and builds a family of DP data structures (DPTree) that answer private Softmax queries via polynomial kernel approximations, using $\widetilde{O}(ndr^2)$ memory, $\widetilde{O}(ndr^2)$ initialization time, and $\widetilde{O}(dr^2)$ query time per token, with explicit additive and relative error bounds. The resulting mechanism is, to the authors' knowledge, the first provably DP mechanism for cross-attention; it remains robust to adaptive queries and extends to Softmax and high-dimensional settings through DPTreeSoftmax and DPTreeHighDim. The work provides a principled privacy foundation for system prompts, RAG data, and other external-context data in LGMs, supporting privacy-preserving deployment in prompting, retrieval pipelines, and diffusion-based applications.
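To make the "weighted distance problem" view concrete, the sketch below computes non-private Softmax cross-attention for a single query as a ratio of two sums over the keys: a value-weighted sum in the numerator and an unweighted sum in the denominator. It is an illustration only and involves no DPTree structure and no privacy noise. The function name `cross_attention_row` and the `scale` divisor are assumptions made for this example and are not taken from the paper.

```python
import math

def cross_attention_row(q, K, V, scale):
    """Non-private Softmax cross-attention output for one query vector q.

    K: list of n key vectors; V: list of n value vectors.
    scale: divisor applied to each inner product (an assumption of this sketch).
    """
    # Unnormalized weights exp(<q, k_j> / scale), one per key.
    weights = [math.exp(sum(qi * ki for qi, ki in zip(q, k)) / scale) for k in K]
    # Denominator: an unweighted sum of the exponentials over all keys.
    denom = sum(weights)
    # Numerator: for each output coordinate, a value-weighted sum over all keys.
    d_out = len(V[0])
    numer = [sum(w * v[c] for w, v in zip(weights, V)) for c in range(d_out)]
    return [x / denom for x in numer]

# Toy data: n = 3 keys/values, d = 2.
q = [1.0, 0.0]
K = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = cross_attention_row(q, K, V, scale=2.0)
```

Because both the numerator and the denominator are sums over the private keys and values, a mechanism that answers such sum queries privately can be composed into a private attention output, which is the role the summary attributes to the DPTree family.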
Abstract
Cross-attention has emerged as a cornerstone module in modern artificial intelligence, underpinning critical applications such as retrieval-augmented generation (RAG), system prompting, and guided stable diffusion. However, there is a rising concern about securing the privacy of cross-attention, as the underlying key and value matrices frequently encode sensitive data or private user information. In this work, we introduce a novel data structure designed to enforce differential privacy (DP) for cross-attention mechanisms, accompanied by provable theoretical guarantees. Specifically, letting $n$ denote the input sequence length, $d$ the feature dimension, $R$ the maximum magnitude of query and key matrices, $R_w$ the maximum magnitude of the value matrix, and $r, s, \epsilon_s$ the parameters for polynomial kernel methods, our proposed structure achieves $\widetilde{O}(ndr^2)$ space and initialization complexity, with a query time of $\widetilde{O}(dr^2)$ per token. Moreover, we demonstrate that our mechanism satisfies $(\epsilon, \delta)$-DP, incurring an additive error of $\widetilde{O}((1-\epsilon_s)^{-1} n^{-1} \epsilon^{-1} R^{2s} R_w r^2)$ and a relative error of $2\epsilon_s/(1-\epsilon_s)$ with respect to the ground truth. Crucially, our framework maintains robustness against adaptive queries, ensuring security even in adversarial settings. To the best of our knowledge, this constitutes the first approach providing provable differential privacy for cross-attention, establishing a foundation for future privacy-preserving algorithms in large generative models (LGMs).
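One way to see how a relative error of the form $2\epsilon_s/(1-\epsilon_s)$ can arise is sketched below; it is an illustrative calculation under stated assumptions, not the paper's proof. Assume an attention output coordinate is a ratio $N/D$ with $N, D > 0$ (positivity of $N$ is an assumption of this sketch), and that the kernel approximation returns $\widetilde{N}$ and $\widetilde{D}$, each within a multiplicative factor $(1 \pm \epsilon_s)$ of its target, with $0 < \epsilon_s < 1$:

$$
\frac{1-\epsilon_s}{1+\epsilon_s} \cdot \frac{N}{D} \;\le\; \frac{\widetilde{N}}{\widetilde{D}} \;\le\; \frac{1+\epsilon_s}{1-\epsilon_s} \cdot \frac{N}{D},
\qquad
\frac{1+\epsilon_s}{1-\epsilon_s} - 1 = \frac{2\epsilon_s}{1-\epsilon_s},
\qquad
1 - \frac{1-\epsilon_s}{1+\epsilon_s} = \frac{2\epsilon_s}{1+\epsilon_s} \le \frac{2\epsilon_s}{1-\epsilon_s}.
$$

Hence

$$
\left| \frac{\widetilde{N}}{\widetilde{D}} - \frac{N}{D} \right| \;\le\; \frac{2\epsilon_s}{1-\epsilon_s} \cdot \frac{N}{D}.
$$

Under these assumptions, approximating the numerator and denominator separately costs at most a $2\epsilon_s/(1-\epsilon_s)$ relative error in the ratio, before any additive error from the privacy noise is accounted for.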
