Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries
Yu-Hsiang Huang, Yuche Tsai, Hsiang Hsiao, Hong-Yi Lin, Shou-De Lin
TL;DR
The paper analyzes the privacy risks of text embeddings when an attacker has no access to the embedding model and cannot query it. It proposes a transferable embedding inversion attack: a surrogate encoder is trained on a leaked dataset, with consistency regularization and adversarial training used to make it mimic the victim encoder, and text is then reconstructed through the surrogate. Experiments with the OpenAI Ada, SBERT, and ST5 embedding models and on the clinical MIMIC-III dataset show that the transfer attack substantially outperforms direct inversion and recovers sensitive attributes with high accuracy. The work highlights significant privacy risks in embedding pipelines and vector databases, underscoring the need for robust defenses and data governance in embedding-based systems.
Abstract
This study investigates the privacy risks associated with text embeddings, focusing on the scenario where attackers cannot access the original embedding model. Contrary to previous research requiring direct model access, we explore a more realistic threat model by developing a transfer attack method. This approach uses a surrogate model to mimic the victim model's behavior, allowing the attacker to infer sensitive information from text embeddings without direct access. Our experiments across various embedding models and a clinical dataset demonstrate that our transfer attack significantly outperforms traditional methods, revealing the potential privacy vulnerabilities in embedding technologies and emphasizing the need for enhanced security measures.
