From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models

Luiza Pozzobon, Patrick Lewis, Sara Hooker, Beyza Ermis

TL;DR

The paper tackles multilingual toxicity mitigation in language models, addressing the gap left by prior work, which was largely English-centric. It compares the retrieval-based Goodtriever with the decoding-time DExperts in both static and continual-learning settings, using translated CivilComments and HolisticBias data with Expected Maximum Toxicity (EMT) as the primary metric. Key findings show that translated data often yields substantial toxicity reductions, with Goodtriever generally outperforming DExperts, and that language-order and data parallelism significantly affect cross-lingual mitigation. The work establishes a practical framework and benchmark for multilingual toxicity mitigation and releases code and data to foster future research.

Abstract

To date, toxicity mitigation in language models has focused almost entirely on single-language settings. As language models embrace multilingual capabilities, it is crucial that our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient annotated datasets across languages, we employ translated data to evaluate and enhance our mitigation techniques. We also compare finetuning mitigation approaches against retrieval-augmented techniques under both static and continual toxicity mitigation scenarios. This allows us to examine the effects of translation quality and cross-lingual transfer on toxicity mitigation. We also explore how model size and data quantity affect the success of these mitigation efforts. Covering nine languages, our study represents a broad array of linguistic families and levels of resource availability, ranging from high- to mid-resource languages. Through comprehensive experiments, we provide insights into the complexities of multilingual toxicity mitigation, paving the way for future research in this increasingly important field. Code and data are available at https://github.com/for-ai/goodtriever.

Paper Structure

This paper contains 26 sections, 2 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Overview of the experimental axes we cover. In our work, we delve into the axes of modeling framework choices, evaluation of toxicity, and dataset characteristics.
  • Figure 2: Multilingual toxicity mitigation with in-language datasets of high-resource languages. Lower EMT is better. On top of each bar is the relative EMT decrease when compared to the baseline.
  • Figure 3: In-language English samples are translated to each target language and then backtranslated to English. In the direction of English $\rightarrow$ Target (red), toxicity scores are mostly reduced for all languages, except Portuguese. In the direction of English $\rightarrow$ Target $\rightarrow$ English (violet), scores are reduced even further, except for Russian.
  • Figure 4: Comparing overall EMT results for high-resource languages: Translated data shows greater effectiveness in reducing toxicity than in-language datasets for English, Russian, Italian, French, Portuguese, and Spanish.
  • Figure 5: EMT ($\downarrow$) for the base model, Goodtriever, and DExperts. They are evaluated with both mid and high-resource languages in the training data.
  • ...and 7 more figures
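
The figure captions above report EMT and the relative EMT decrease against a baseline. As a rough illustration, the sketch below computes both from per-prompt toxicity scores. It assumes the usual definition of Expected Maximum Toxicity (the maximum toxicity score among the generations for each prompt, averaged over prompts) and assumes the scores come from an external toxicity classifier. The function names and the toy numbers are hypothetical and are not taken from the paper's code.

```python
from statistics import mean


def expected_max_toxicity(scores_per_prompt):
    """Average, over prompts, of the highest toxicity score among
    the generations sampled for each prompt (assumed EMT definition)."""
    if not scores_per_prompt or any(len(s) == 0 for s in scores_per_prompt):
        raise ValueError("need at least one prompt, each with at least one score")
    return mean(max(scores) for scores in scores_per_prompt)


def relative_decrease(baseline_emt, mitigated_emt):
    """Fraction by which EMT dropped relative to the baseline model."""
    if baseline_emt <= 0:
        raise ValueError("baseline EMT must be positive")
    return (baseline_emt - mitigated_emt) / baseline_emt


# Toy scores in [0, 1]: two prompts, two generations each (made-up values).
baseline_scores = [[0.2, 0.8], [0.4, 0.6]]
mitigated_scores = [[0.1, 0.3], [0.2, 0.4]]

baseline_emt = expected_max_toxicity(baseline_scores)
mitigated_emt = expected_max_toxicity(mitigated_scores)
print(f"baseline EMT:  {baseline_emt:.2f}")
print(f"mitigated EMT: {mitigated_emt:.2f}")
print(f"relative decrease: {relative_decrease(baseline_emt, mitigated_emt):.0%}")
```

Lower EMT is better, so a positive relative decrease means the mitigation method reduced worst-case toxicity compared with the unmitigated model.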