From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
Luiza Pozzobon, Patrick Lewis, Sara Hooker, Beyza Ermis
TL;DR
The paper tackles multilingual toxicity mitigation in language models, addressing the gap left by prior English-centric work. It compares the retrieval-based Goodtriever with the decoding-time DExperts across translated-data and continual-learning settings, using translated CivilComments and HolisticBias data with Expected Maximum Toxicity (EMT) as the primary metric. Key findings are that translated data often yields substantial toxicity reductions, that Goodtriever generally outperforms DExperts, and that both the order in which languages are introduced and whether the data is parallel across languages significantly affect cross-lingual mitigation. The work establishes a practical framework and benchmark for multilingual toxicity mitigation and releases code and data to foster future research.
Abstract
To date, toxicity mitigation in language models has focused almost entirely on single-language settings. As language models embrace multilingual capabilities, it is crucial that our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient annotated datasets across languages, we employ translated data to evaluate and enhance our mitigation techniques. We also compare finetuning mitigation approaches against retrieval-augmented techniques under both static and continual toxicity mitigation scenarios. This allows us to examine the effects of translation quality and cross-lingual transfer on toxicity mitigation. We also explore how model size and data quantity affect the success of these mitigation efforts. Covering nine languages, our study represents a broad array of linguistic families and levels of resource availability, ranging from high- to mid-resource languages. Through comprehensive experiments, we provide insights into the complexities of multilingual toxicity mitigation, paving the way for future research in this increasingly important field. Code and data are available at https://github.com/for-ai/goodtriever.
