Iterative Multilingual Spectral Attribute Erasure
Shun Shao, Yftah Ziser, Zheng Zhao, Yifu Qiu, Shay B. Cohen, Anna Korhonen
TL;DR
IMSAE is an iterative, SVD-based framework that identifies and erases joint demographic bias subspaces across multiple languages, enabling debiasing with or without target-language data. It partitions the source languages into subsets and applies cross-covariance erasure progressively across these subsets. This reduces bias while preserving downstream task performance in BERT, LLaMA, and Mistral models. The authors evaluate IMSAE on eight languages and five demographic attributes using their new MSEFair benchmark, and they report improvements over monolingual and cross-lingual baselines in both standard and zero-shot settings. Beyond the method itself, MSEFair is a new resource for evaluating cross-lingual debiasing across diverse languages.
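As a hedged illustration of what "progressive cross-covariance erasure" can look like, the equations below sketch one round $t$ under stated assumptions. Each round pools the current representations of one source-language subset $S_t$, and erasure follows the usual spectral recipe of removing the top-$k$ left singular directions of the cross-covariance with the attribute. The symbols $S_t$, $n_t$, $k$, $X^{(t)}$ and $P_t$ are introduced here for illustration and are not taken from the paper.

```latex
\begin{aligned}
\Omega_t &= \tfrac{1}{n_t}\,\tilde{X}_{S_t}^{(t-1)\top}\,\tilde{Z}_{S_t} = U_t \Sigma_t V_t^{\top}, \\
P_t &= I - U_{t,k}\,U_{t,k}^{\top}, \\
X^{(t)} &= X^{(t-1)} P_t, \qquad X^{(T)} = X^{(0)} P_1 P_2 \cdots P_T .
\end{aligned}
```

Here $\tilde{X}_{S_t}^{(t-1)} \in \mathbb{R}^{n_t \times d}$ holds the centered, already-projected representations of the $n_t$ examples from the languages in $S_t$. $\tilde{Z}_{S_t} \in \mathbb{R}^{n_t \times c}$ holds their centered attribute labels, and $U_{t,k}$ is the first $k$ columns of $U_t$. Because $\Omega_t$ is computed from projected data, its left singular vectors with nonzero singular values are orthogonal to every direction removed earlier. The product $P_1 \cdots P_T$ is therefore itself an orthogonal projection, which can also be applied to a target language that was never seen during erasure.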
Abstract
Multilingual representations embed words with similar meanings into a semantic space shared across languages, which creates opportunities to transfer debiasing effects between languages. However, existing debiasing methods cannot exploit this opportunity because they operate on one language at a time. We present Iterative Multilingual Spectral Attribute Erasure (IMSAE), which identifies and mitigates joint bias subspaces across multiple languages through iterative SVD-based truncation. Evaluating IMSAE across eight languages and five demographic dimensions, we demonstrate its effectiveness in both the standard setting and a zero-shot setting. In the zero-shot setting, target-language data is unavailable, but linguistically similar languages can be used for debiasing. Our comprehensive experiments across diverse language models (BERT, LLaMA, Mistral) show that IMSAE outperforms traditional monolingual and cross-lingual approaches while maintaining model utility.
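The iterative, SVD-based truncation described in the abstract can be sketched in plain Python. The sketch uses the standard library only, so power iteration stands in for a full SVD. It is a minimal, non-authoritative sketch under stated assumptions. Each round pools one hypothetical subset of source languages, projects that subset with the directions erased so far, computes its cross-covariance with the attribute labels, and removes the top-`k` left singular directions. The helper names (`iterative_erasure`, `top_left_singular_vectors`, `toy_language`, and so on), the subset order, `k`, and the toy 3-d data are illustrative inventions, not the authors' implementation or settings.

```python
# Hypothetical sketch of iterative multilingual cross-covariance erasure.
# Not the IMSAE reference implementation; names and settings are illustrative.
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_out(x: Vector, basis: list[Vector]) -> Vector:
    """Apply P = I - sum_u u u^T for an orthonormal list of erased directions."""
    out = list(x)
    for u in basis:
        c = dot(out, u)
        out = [o - c * ui for o, ui in zip(out, u)]
    return out


def cross_covariance(xs: list[Vector], zs: list[Vector]) -> Matrix:
    """Omega[j][m] = mean_i (x_ij - mean x_j)(z_im - mean z_m); shape d x c."""
    n = len(xs)
    mx = [sum(col) / n for col in zip(*xs)]
    mz = [sum(col) / n for col in zip(*zs)]
    return [[sum((x[j] - mx[j]) * (z[m] - mz[m]) for x, z in zip(xs, zs)) / n
             for m in range(len(mz))]
            for j in range(len(mx))]


def top_left_singular_vectors(omega: Matrix, k: int, exclude: list[Vector],
                              iters: int = 200, tol: float = 1e-12) -> list[Vector]:
    """Top-k eigenvectors of Omega Omega^T (= left singular vectors of Omega) by
    power iteration with deflation, kept orthogonal to the `exclude` directions."""
    d, c = len(omega), len(omega[0])
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    found: list[Vector] = []
    for _ in range(k):
        v = project_out([rng.gauss(0.0, 1.0) for _ in range(d)], exclude + found)
        for _ in range(iters):
            t = [sum(omega[j][m] * v[j] for j in range(d)) for m in range(c)]  # Omega^T v
            w = project_out([dot(row, t) for row in omega], exclude + found)    # Omega Omega^T v
            norm = math.sqrt(dot(w, w))
            if norm < tol:  # no cross-covariance left outside the erased subspace
                return found
            v = [wi / norm for wi in w]
        found.append(v)
    return found


def iterative_erasure(data: dict[str, tuple[list[Vector], list[Vector]]],
                      subsets: list[list[str]], k: int = 1) -> list[Vector]:
    """For each language subset: project its pooled data with the basis so far,
    compute the cross-covariance with the labels, and erase its top-k directions."""
    basis: list[Vector] = []
    for subset in subsets:
        xs = [project_out(x, basis) for lang in subset for x in data[lang][0]]
        zs = [z for lang in subset for z in data[lang][1]]
        basis += top_left_singular_vectors(cross_covariance(xs, zs), k, exclude=basis)
    return basis


def toy_language(axis: int, seed: int, n: int = 50) -> tuple[list[Vector], list[Vector]]:
    """Toy 3-d representations whose binary attribute z is encoded along `axis`."""
    rng = random.Random(seed)
    xs, zs = [], []
    for i in range(n):
        z = float(i % 2)
        x = [rng.gauss(0.0, 0.1) for _ in range(3)]
        x[axis] += 2.0 * z
        xs.append(x)
        zs.append([z])
    return xs, zs


# Usage: erase with two toy "source languages", then debias any representation.
data = {"en": toy_language(0, seed=1), "de": toy_language(1, seed=2)}
basis = iterative_erasure(data, subsets=[["en"], ["de"]], k=1)
debiased_en = [project_out(x, basis) for x in data["en"][0]]
```

Each new direction is orthogonal to the directions already removed, so `project_out(x, basis)` applies the accumulated erasure in one pass. That includes a language whose data never entered the loop, which is the zero-shot case. A practical version would more likely call a linear-algebra library's SVD on batched tensors than use power iteration.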
