Embedding Inversion via Conditional Masked Diffusion Language Models
Han Xiao
TL;DR
This paper introduces embedding inversion via conditional masked diffusion, reframing the task as parallel denoising of a fully masked sequence conditioned on the embedding vector. A compact 78M-parameter decoder, conditioned through adaptive layer normalization (AdaLN), performs encoder-agnostic inversion and needs no access to the target encoder during inference. It achieves up to 81.3% token accuracy and 0.87 cosine similarity on 32-token sequences across three encoders. The method is more efficient than autoregressive correction and maintains competitive quality across several decoding strategies, notably Euler-based remasking and two-stage decoding. These results imply that embeddings warrant privacy protections comparable to those applied to the text they encode, motivating stronger defenses against embedding leakage in retrieval systems and in cross-model inversion scenarios.
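As a rough illustration of how adaptive layer normalization can inject the target embedding into the decoder, the sketch below normalizes a hidden vector and then applies a per-channel scale and shift that two linear maps predict from the embedding `e`. The function names, the `(1 + gamma)` parameterization, and the absence of a separate learned affine term are assumptions borrowed from common AdaLN designs, not details confirmed by this paper.

```python
import math


def layer_norm(h, eps=1e-6):
    """Normalize a hidden vector to zero mean and unit variance (no learned affine)."""
    mean = sum(h) / len(h)
    var = sum((x - mean) ** 2 for x in h) / len(h)
    return [(x - mean) / math.sqrt(var + eps) for x in h]


def linear(W, b, x):
    """Compute W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def adaln(h, e, W_scale, b_scale, W_shift, b_shift):
    """Modulate the normalized hidden state h with a scale and shift predicted from e.

    Assumption: DiT-style modulation LN(h) * (1 + gamma(e)) + beta(e);
    the paper's exact parameterization may differ.
    """
    gamma = linear(W_scale, b_scale, e)  # per-channel scale derived from the embedding
    beta = linear(W_shift, b_shift, e)   # per-channel shift derived from the embedding
    return [n * (1.0 + g) + s for n, g, s in zip(layer_norm(h), gamma, beta)]
```

With all-zero weights and biases the modulation is the identity, so `adaln` reduces to plain layer normalization; any nonzero weights make every decoder layer's activations depend on the embedding being inverted, which is how the conditioning signal reaches all positions at once.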
Abstract
We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusion language model is conditioned on the target embedding via adaptive layer normalization, requiring only 8 forward passes through a 78M-parameter model with no access to the target encoder. On 32-token sequences across three embedding models, the method achieves 81.3% token accuracy and 0.87 cosine similarity.
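To make "recovering all tokens in parallel through iterative denoising" concrete, the sketch below starts from a fully masked 32-token sequence and runs a fixed budget of 8 decoder passes. Each pass predicts every position at once, and the most confident predictions among still-masked positions are committed on a linear schedule. The `predict` callable is a hypothetical stand-in for the embedding-conditioned decoder, and the confidence-ranked, never-remask schedule is a generic MaskGIT-style assumption; it is not the paper's Euler-based remasking or two-stage decoding.

```python
import math

MASK = None  # placeholder for the [MASK] token


def decode(predict, seq_len=32, num_steps=8):
    """Fill a fully masked sequence in num_steps parallel denoising passes.

    predict(tokens) is a hypothetical conditioned decoder: given the current
    partially masked sequence, it returns one (token, confidence) pair per
    position, computed in a single forward pass.
    """
    tokens = [MASK] * seq_len
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        if not masked:
            break
        preds = predict(tokens)  # one forward pass over all positions
        # Linear schedule: after step k, ceil(seq_len * (k + 1) / num_steps)
        # positions are fixed, so the final step always fills the sequence.
        target_fixed = math.ceil(seq_len * (step + 1) / num_steps)
        n_commit = target_fixed - (seq_len - len(masked))
        # Commit the most confident predictions among still-masked positions;
        # committed tokens are never remasked in this simplified variant.
        ranked = sorted(masked, key=lambda i: preds[i][1], reverse=True)
        for i in ranked[:n_commit]:
            tokens[i] = preds[i][0]
    return tokens
```

With the defaults, each pass commits 4 tokens, so the masked count goes 32, 28, ..., 4 and the sequence is complete after exactly 8 calls to `predict`. An autoregressive decoder would instead need one pass per token, which is the source of the efficiency gap the abstract refers to.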
