LAGO: Few-shot Crosslingual Embedding Inversion Attacks via Language Similarity-Aware Graph Optimization
Wenrui Yu, Yiyi Chen, Johannes Bjerva, Sokol Kosta, Qiongxiu Li
TL;DR
This paper addresses the privacy risk of embedding inversion in multilingual NLP by introducing LAGO, a language similarity-aware graph optimization framework for few-shot cross-lingual inversion. It builds a topological graph over languages and enforces cross-language consistency through two optimization variants: hard linear inequality constraints and soft total variation penalties, with ALGEN recovered as a special case. Empirical results across multiple languages and victim models show that leveraging language similarity improves transferability by about 10–20% in Rouge-L scores, especially in extremely low-data regimes, and demonstrate robustness to the choice of similarity metric. The work highlights the need for privacy defenses that account for linguistic structure in multilingual embeddings and discusses differential privacy as a defense, noting a significant utility cost in cross-lingual settings.
Abstract
We propose LAGO (Language Similarity-Aware Graph Optimization), a novel approach for few-shot cross-lingual embedding inversion attacks that addresses critical privacy vulnerabilities in multilingual NLP systems. Unlike prior work on embedding inversion attacks, which treats languages independently, LAGO explicitly models linguistic relationships through a graph-based constrained distributed optimization framework. By integrating syntactic and lexical similarity as edge constraints, our method enables collaborative parameter learning across related languages. Theoretically, we show that this formulation generalizes prior approaches such as ALGEN, which emerges as a special case when the similarity constraints are relaxed. Our framework uniquely combines Frobenius-norm regularization with linear inequality or total variation constraints, ensuring robust alignment of cross-lingual embedding spaces even with extremely limited data (as few as 10 samples per language). Extensive experiments across multiple languages and embedding models demonstrate that LAGO substantially improves the transferability of attacks, with a 10-20% increase in Rouge-L score over baselines. This work establishes language similarity as a critical factor in inversion attack transferability, urging renewed focus on language-aware privacy-preserving multilingual embeddings.
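To make the idea of collaborative learning over a language graph concrete, the following is a minimal sketch and not the authors' implementation. It assumes each language has its own linear alignment map, that the Frobenius-norm regularization acts as a ridge term on each map, and it swaps in a squared-difference (graph Laplacian) coupling in place of the total variation penalty so that the problem has a closed-form solution. The function name `fit_coupled_maps`, its parameters, and the edge-weight format are all hypothetical.

```python
import numpy as np

def fit_coupled_maps(X, Y, edges, lam=1.0, ridge=1e-3):
    """Fit one linear map per language, pulled together along graph edges.

    X, Y  : lists of arrays; X[k] is (n_k, d_in), Y[k] is (n_k, d_out).
    edges : list of (i, j, weight) tuples over language indices
            (weights stand in for language similarity; an assumption).
    Minimizes  sum_k ||X[k] W[k] - Y[k]||_F^2 + ridge * ||W[k]||_F^2
             + lam * sum_(i,j) weight * ||W[i] - W[j]||_F^2
    """
    K = len(X)
    d_in = X[0].shape[1]
    # Weighted graph Laplacian over languages.
    L = np.zeros((K, K))
    for i, j, w in edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    # Normal equations for the stacked unknown [W_0; ...; W_{K-1}].
    A = lam * np.kron(L, np.eye(d_in))
    B = []
    for k in range(K):
        s = slice(k * d_in, (k + 1) * d_in)
        A[s, s] += X[k].T @ X[k] + ridge * np.eye(d_in)
        B.append(X[k].T @ Y[k])
    W = np.linalg.solve(A, np.vstack(B))
    return [W[k * d_in:(k + 1) * d_in] for k in range(K)]
```

In this sketch, setting `lam=0` decouples the languages into independent ridge regressions, which loosely mirrors the abstract's remark that a per-language baseline emerges when the similarity constraints are relaxed. Increasing `lam` pulls the maps of connected languages toward each other, which is how a language with only a handful of samples could borrow strength from its neighbors.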
