BeamClean: Language Aware Embedding Reconstruction
Kaan Kale, Kyle Mylonakis, Jay Roberts, Sidhartha Roy
TL;DR
BeamClean addresses the privacy risk of obfuscated input embeddings in model-as-a-service (MaaS) settings by formulating a blind inversion attack: the adversary observes the obfuscated embeddings $y_{1:T}$ and knows the embedding table $\mathcal{X}$, but has no access to the obfuscation mechanism or the model. It jointly estimates the noise-model parameters $\theta$ and decodes the original token sequence via a beam-search procedure that incorporates a language-model prior $p_{LM}$, solving $\hat{\theta}=\arg\max_{\theta}\log p(\theta|y_{1:T})$ and retrieving $\hat{w}_{1:T}=\arg\max_{w_{1:T}}\log\pi_{\hat{\theta}}(y_{1:T}|x(w_{1:T}))\,p_{LM}(w_{1:T})$. Empirically, BeamClean outperforms naive nearest-neighbor baselines under Gaussian and Laplacian noise, achieving substantial improvements in token recovery and PII leakage (e.g., $74.3\%$ vs $42.1\%$ token recovery on MRPC with Gaussian noise at $\epsilon=15$, and $60\%$ PII recovery on PAPILLON at $\epsilon=8.5$), and it remains effective even when the decoding prior differs from the target LM. These results underscore the vulnerability of input-independent noise mechanisms and motivate stronger, input-aware privacy defenses for embedding-based inference in MaaS environments.
Abstract
In this work, we consider an inversion attack on the obfuscated input embeddings sent to a language model on a server, where the adversary has no access to the language model or the obfuscation mechanism and sees only the obfuscated embeddings along with the model's embedding table. We propose BeamClean, an inversion attack that jointly estimates the noise parameters and decodes token sequences by integrating a language-model prior. Against Laplacian and Gaussian obfuscation mechanisms, BeamClean consistently surpasses naive distance-based attacks. This work highlights both the need for more advanced learned, input-dependent obfuscation methods and their robustness.
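
To make the contrast between a naive distance-based attack and prior-guided decoding concrete, the following is a minimal sketch and not the authors' implementation. It assumes isotropic Gaussian noise with a known scale `sigma` (standing in for the estimated noise parameters, which BeamClean estimates jointly rather than taking as given) and a toy bigram table `log_prior` standing in for the language-model prior. The names `nearest_neighbor_decode`, `beam_decode`, `log_prior`, and `beam_width` are hypothetical and chosen for illustration only.

```python
import numpy as np


def squared_distances(y, table):
    """Squared Euclidean distance from each observation y_t to every table row."""
    # y: (T, d) obfuscated embeddings; table: (V, d) embedding table.
    return ((y[:, None, :] - table[None, :, :]) ** 2).sum(axis=-1)  # (T, V)


def nearest_neighbor_decode(y, table):
    """Naive baseline: map each y_t to the closest embedding independently."""
    return squared_distances(y, table).argmin(axis=1).tolist()


def beam_decode(y, table, log_prior, sigma, beam_width=4):
    """Beam search over token sequences.

    Scores a sequence by the Gaussian log-likelihood of the observations
    (up to an additive constant) plus a bigram log-prior.
    ASSUMPTIONS: sigma is known and the noise is isotropic Gaussian;
    log_prior has shape (V + 1, V), with row V used as the start-of-sequence row.
    """
    vocab_size = table.shape[0]
    log_lik = -squared_distances(y, table) / (2.0 * sigma**2)  # (T, V)
    beams = [((), 0.0)]  # (token tuple, cumulative log-score)
    for t in range(len(y)):
        candidates = []
        for seq, score in beams:
            prev = seq[-1] if seq else vocab_size  # start row for empty prefix
            step = log_lik[t] + log_prior[prev]
            for w in range(vocab_size):
                candidates.append((seq + (w,), score + step[w]))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return list(beams[0][0])


# Toy setup (entirely illustrative): 3 tokens in 2 dimensions.
table = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# Observations for the true sequence [0, 1, 2]; the middle one is noisy
# enough that it lies closer to token 0 than to the true token 1.
y = np.array([[0.0, 0.0], [0.45, 0.0], [0.0, 1.0]])
# Bigram prior favouring 0 -> 1 -> 2 -> 0; the last row is a uniform start row.
log_prior = np.log(np.array([
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
    [1 / 3, 1 / 3, 1 / 3],
]))

nn_guess = nearest_neighbor_decode(y, table)
beam_guess = beam_decode(y, table, log_prior, sigma=0.5)
```

In this toy case the per-token baseline misreads the noisy middle observation, while the sequence-level score can use the prior to prefer the more plausible token. A real attack would replace the bigram table with an actual language model and would estimate the noise parameters instead of fixing `sigma`.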
