Embedding Inversion via Conditional Masked Diffusion Language Models

Han Xiao

TL;DR

This paper frames embedding inversion as conditional masked diffusion: a fully masked sequence is denoised in parallel, conditioned on the embedding vector. A compact $78\mathrm{M}$-parameter decoder with adaptive layer normalization (AdaLN) makes the inversion encoder-agnostic and removes the need to access the target encoder at inference time, achieving up to 81.3% token accuracy and 0.87 cosine similarity on 32-token sequences across three encoders. The method is more efficient than autoregressive correction and maintains competitive quality under several decoding strategies, notably Euler-based remasking and two-stage decoding. These results imply that embeddings warrant protection comparable to the text they encode, motivating stronger defenses against embedding leakage in retrieval systems and cross-model inversion scenarios.

Abstract

We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusion language model is conditioned on the target embedding via adaptive layer normalization, requiring only 8 forward passes through a 78M parameter model with no access to the target encoder. On 32-token sequences across three embedding models, the method achieves 81.3% token accuracy and 0.87 cosine similarity.
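
The parallel decoding described above can be illustrated with a minimal sketch. It assumes a generic confidence-based unmasking schedule, which is not necessarily the schedule used in the paper (the Euler-based remasking and two-stage variants are not shown). The names `MASK`, `decode`, and `make_toy_denoiser` are hypothetical, and the toy denoiser is a stand-in for the trained 78M-parameter model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

MASK = -1  # hypothetical mask-token id, chosen for illustration only

Prediction = Tuple[int, float]  # (predicted token id, confidence)
Denoiser = Callable[[Sequence[int], Sequence[float]], List[Prediction]]


def decode(denoiser: Denoiser, embedding: Sequence[float],
           seq_len: int = 32, num_steps: int = 8) -> List[int]:
    """Start fully masked and reveal tokens over at most num_steps passes."""
    tokens = [MASK] * seq_len
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # One forward pass predicts every position in parallel.
        preds = denoiser(tokens, embedding)
        # Reveal enough positions that none remain after the final step.
        k = math.ceil(len(masked) / (num_steps - step))
        most_confident = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in most_confident:
            tokens[i] = preds[i][0]
    return tokens


def make_toy_denoiser(target: Sequence[int]) -> Tuple[Denoiser, Dict[str, int]]:
    """Stand-in model (assumption): always predicts `target` and counts calls."""
    calls = {"n": 0}

    def denoiser(tokens: Sequence[int], embedding: Sequence[float]) -> List[Prediction]:
        calls["n"] += 1
        # Earlier positions get higher confidence, purely for determinism.
        return [(target[i], 1.0 / (1 + i)) for i in range(len(tokens))]

    return denoiser, calls
```

With `seq_len=32` and `num_steps=8`, this schedule reveals four tokens per pass, so the number of model calls is fixed by the step count rather than by the sequence length, in contrast to token-by-token autoregressive generation.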

Paper Structure

This paper contains 17 sections, 7 equations, 2 figures, and 9 tables.

Figures (2)

  • Figure 1: Architecture of the Conditional Masked Diffusion Language Model. The embedding vector is projected and injected into each transformer layer via AdaLN conditioning. The model predicts original tokens at masked positions through iterative denoising.
  • Figure 2: Training dynamics across three embedding encoders on 2M multilingual samples. Qwen3-Embedding reaches 81.3% token accuracy at 72.5K steps with validation loss 1.32. All models show diminishing returns beyond 50K steps, suggesting architectural improvements rather than extended training as the path to further gains.
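
The AdaLN conditioning mentioned in the Figure 1 caption can be sketched as follows. This is a minimal illustration assuming the common formulation in which the conditioning vector is linearly projected to a per-feature scale and shift, applied as `(1 + scale) * LayerNorm(h) + shift`. The paper's exact parameterization may differ, and all function and parameter names here are hypothetical.

```python
import math
from typing import List, Sequence


def layer_norm(x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Normalize a single token's hidden vector to zero mean, unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(weight: Sequence[Sequence[float]], bias: Sequence[float],
           x: Sequence[float]) -> List[float]:
    """Dense projection: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def adaln(hidden: Sequence[float], cond: Sequence[float],
          w_scale: Sequence[Sequence[float]], b_scale: Sequence[float],
          w_shift: Sequence[Sequence[float]], b_shift: Sequence[float]) -> List[float]:
    """Modulate the normalized hidden state with scale and shift derived from cond."""
    scale = linear(w_scale, b_scale, cond)  # per-feature scale from the embedding
    shift = linear(w_shift, b_shift, cond)  # per-feature shift from the embedding
    normed = layer_norm(hidden)
    return [(1.0 + s) * n + t for n, s, t in zip(normed, scale, shift)]
```

Under this assumed form, zero-valued projections reduce the layer to plain layer normalization, so the conditioning signal enters only through the learned scale and shift.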