An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs

Rahul Thomas, Louai Zahran, Erica Choi, Akilesh Potti, Micah Goldblum, Arka Pal

TL;DR

This work introduces a novel reconstruction technique that recovers original prompts from LLM hidden states with nearly perfect accuracy across multiple state-of-the-art LLMs, and shows that extensions of the attack are nearly perfectly effective at reversing permuted hidden states.

Abstract

Recent advances in Large Language Models (LLMs) have led to the widespread adoption of third-party inference services, raising critical privacy concerns. Existing methods of performing private third-party inference, such as Secure Multiparty Computation (SMPC), often rely on cryptographic methods. However, these methods are thousands of times slower than standard unencrypted inference, and fail to scale to large modern LLMs. Therefore, recent lines of work have explored replacing the expensive encrypted nonlinear computations in SMPC with statistical obfuscation methods: in particular, revealing permuted hidden states to the third parties, accompanied by strong claims that reversing them into the unpermuted states is difficult. In this work, we begin by introducing a novel reconstruction technique that can recover original prompts from hidden states with nearly perfect accuracy across multiple state-of-the-art LLMs. We then show that extensions of our attack are nearly perfectly effective in reversing permuted hidden states of LLMs, demonstrating the insecurity of three recently proposed privacy schemes. We further dissect the shortcomings of prior theoretical "proofs" of permutation security which allow our attack to succeed. Our findings highlight the importance of rigorous security analysis in privacy-preserving LLM inference.

Paper Structure

This paper contains 51 sections, 4 theorems, 12 equations, 1 figure, 13 tables, 4 algorithms.

Key Result

Theorem 1

Let $k>0$. Suppose random weights $\bm{w} \in \mathbb{R}^d$ are drawn from a $d$-variate spherically symmetric distribution $\mathcal{D}$. Then for any $\bm{x},\bm{y} \in \mathbb{R}^d$, the absolute difference of the $\bm{w}$-weighted sums of $\bm{x}$ and $\bm{y}$ exceeds the L1 distance between $\bm{x}$ and $\bm{y}$ with probability $\geq P_{\bm{\gamma} \sim \mathcal{D}} (|\gamma_1| \geq k\sqrt{d})$.
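
A small numerical check can make the shape of this bound concrete. The sketch below rests on two assumptions that are not stated in this summary: first, that $k$ enters the inequality as a scale factor, i.e. the event is $|\bm{w}\cdot\bm{x} - \bm{w}\cdot\bm{y}| \geq k\|\bm{x}-\bm{y}\|_1$ (the summary does not show where $k$ appears on the left-hand side); second, that $\mathcal{D}$ is the standard Gaussian, which is one example of a spherically symmetric distribution. Under these assumptions $\bm{w}\cdot(\bm{x}-\bm{y})$ is distributed as $\|\bm{x}-\bm{y}\|_2\,\gamma_1$, and $\|\bm{x}-\bm{y}\|_1 \leq \sqrt{d}\,\|\bm{x}-\bm{y}\|_2$, which is where the threshold $k\sqrt{d}$ would come from. The function names are hypothetical and chosen only for illustration.

```python
import math
import numpy as np


def empirical_probability(x, y, k, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(|w.x - w.y| >= k * ||x - y||_1).

    Assumption: w ~ N(0, I_d), one example of a spherically
    symmetric distribution; the theorem covers the general case.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    w = rng.standard_normal((n_samples, d))
    lhs = np.abs(w @ (x - y))        # |w-weighted sum of x minus that of y|
    rhs = k * np.abs(x - y).sum()    # k times the L1 distance (assumed reading)
    return float(np.mean(lhs >= rhs))


def gaussian_lower_bound(k, d):
    """P(|gamma_1| >= k * sqrt(d)) for gamma_1 ~ N(0, 1)."""
    return math.erfc(k * math.sqrt(d) / math.sqrt(2.0))


d, k = 16, 0.1
rng = np.random.default_rng(1)
x = rng.standard_normal(d)
y = rng.standard_normal(d)

p_hat = empirical_probability(x, y, k)
p_bound = gaussian_lower_bound(k, d)
print(f"empirical: {p_hat:.3f}  lower bound: {p_bound:.3f}")
```

Under this reading, the bound should be tight when all entries of $\bm{x}-\bm{y}$ have equal magnitude, since then $\|\bm{x}-\bm{y}\|_1 = \sqrt{d}\,\|\bm{x}-\bm{y}\|_2$; for generic vectors the empirical probability should sit above it.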

Figures (1)

  • Figure 1: High-level representation of our attack to decode user text from LLM hidden states. This attack, and extensions of it, achieve nearly perfect decoding accuracy, even when the hidden states are permuted.

Theorems & Definitions (9)

  • Theorem 1
  • proof
  • Definition 1
  • Theorem 2
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof