Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models

Minseok Choi, Kyunghyun Min, Jaegul Choo

TL;DR

This paper presents a pioneering approach to machine unlearning for multilingual language models, selectively erasing information across different languages while maintaining overall performance, setting a new standard for secure and adaptable multilingual language models.

Abstract

Pretrained language models memorize vast amounts of information, including private and copyrighted data, raising significant safety concerns. Retraining these models after excluding sensitive data is prohibitively expensive, making machine unlearning a viable, cost-effective alternative. Previous research has focused on machine unlearning for monolingual models, but we find that unlearning in one language does not necessarily transfer to others. This vulnerability makes models susceptible to low-resource language attacks, where sensitive information remains accessible in less dominant languages. This paper presents a pioneering approach to machine unlearning for multilingual language models, selectively erasing information across different languages while maintaining overall performance. Specifically, our method employs an adaptive unlearning scheme that assigns language-dependent weights to address different language performances of multilingual language models. Empirical results demonstrate the effectiveness of our framework compared to existing unlearning baselines, setting a new standard for secure and adaptable multilingual language models.

Paper Structure (31 sections, 6 equations, 5 figures, 14 tables)

Figures (5)

  • Figure 1: Language models may have memorized copyrighted text, such as The Little Prince, in multiple languages. Consequently, removing such information in just one language does not entirely eradicate it from the model. This underscores the necessity for a multilingual unlearning approach to ensure the information is thoroughly eliminated from the model.
  • Figure 2: Memorization accuracy (MA) of the multilingual model BLOOM across various languages after unlearning with English data only. The plot illustrates that MA does not significantly drop across other languages, highlighting the necessity for a multilingual unlearning approach to effectively reduce memorization across all languages.
  • Figure 3: Comparison of the forget set and test set performance of BLOOM-3B after unlearning on FLORES-200 for English (en), high-resource (high-src), and low-resource (low-src) languages across different $\kappa$ values. Our adaptive unlearning scheme yields the lowest MA on the forget set and maintains a competitive MA on the test set, highlighting the superiority of the approach.
  • Figure 4: Zero-shot performance comparison between the original model and our LingTea framework across five multilingual language understanding tasks. The results demonstrate that LingTea retains world knowledge on par with the original model, ensuring the safety and efficacy of our unlearning approach.
  • Figure 5: Performance of BLOOM-3B after unlearning token sequences in FLORES-200, shown by scaling the number of samples to be forgotten. The first row illustrates results for unlearning samples at once (Batch Unlearning), while the second row depicts results for unlearning samples sequentially (Sequential Unlearning).
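
The captions above report memorization accuracy (MA) without defining it. As a minimal sketch, assuming the definition commonly used in prior memorization work (the fraction of positions where the model's greedy next-token prediction matches the reference token), MA could be computed as below. The function names, the input layout, and the per-language averaging are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def memorization_accuracy(target_ids: Sequence[int],
                          predicted_ids: Sequence[int]) -> float:
    """Fraction of next-token predictions that match the reference sequence.

    target_ids:    token ids x_1..x_T of the sequence to be checked.
    predicted_ids: greedy predictions for x_2..x_T, i.e. predicted_ids[i]
                   is the model's argmax token given x_1..x_{i+1}.
    (Assumed definition; the paper's exact formula may differ.)
    """
    if len(target_ids) < 2:
        raise ValueError("need at least two tokens to score a prediction")
    if len(predicted_ids) != len(target_ids) - 1:
        raise ValueError("expected one prediction per token after the first")
    # Compare each prediction with the token that actually follows the prefix.
    hits = sum(p == t for p, t in zip(predicted_ids, target_ids[1:]))
    return hits / (len(target_ids) - 1)


def mean_ma_by_language(
    samples: Dict[str, List[Tuple[Sequence[int], Sequence[int]]]]
) -> Dict[str, float]:
    """Average MA per language code (hypothetical helper for per-language plots)."""
    return {
        lang: sum(memorization_accuracy(t, p) for t, p in pairs) / len(pairs)
        for lang, pairs in samples.items()
        if pairs  # skip languages with no samples
    }


if __name__ == "__main__":
    # Toy token ids: two of the three predictions match the reference.
    print(memorization_accuracy([5, 8, 2, 9], [8, 7, 9]))
```

In practice the predictions would come from a single forward pass of the model over the sequence, taking the argmax over the vocabulary at each position. A lower MA on the forget set indicates that less of the targeted text can be reproduced token by token.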