SmurfCat at PAN 2024 TextDetox: Alignment of Multilingual Transformers for Text Detoxification
Elisei Rykov, Konstantin Zaytsev, Ivan Anisimov, Alexandr Voronin
TL;DR
The paper tackles multilingual text detoxification across nine languages. It augments the training data by translating English data into the target languages and filtering the translations to preserve meaning and control toxicity. It then fine-tunes multilingual seq2seq transformers (notably mT0-XL and Aya-101) and refines their outputs with diverse beam search and ORPO alignment. The final 3.7B-parameter model achieves the top automatic-evaluation score and strong human-evaluation results, especially for Ukrainian. The approach shows that data augmentation and alignment can improve detoxification in low-resource languages, and it highlights directions for cross-language transfer and interpretability. Overall, the work offers a practical, scalable pipeline for multilingual detoxification with competitive results in both automatic and human evaluations.
Abstract
This paper presents the SmurfCat team's solution for the Multilingual Text Detoxification task in the PAN-2024 competition. Using data augmentation through machine translation and a special filtering procedure, we collected an additional multilingual parallel dataset for text detoxification. On the obtained data, we fine-tuned several multilingual sequence-to-sequence models, such as mT0 and Aya, for the text detoxification task. We then applied the ORPO alignment technique to the final model. Our final model has only 3.7 billion parameters and achieves state-of-the-art results for the Ukrainian language and near state-of-the-art results for other languages. In the competition, our team achieved first place in the automated evaluation with a score of 0.52 and second place in the final human evaluation with a score of 0.74.
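For readers unfamiliar with ORPO, the following is a minimal sketch of its general objective: a supervised negative log-likelihood term on the preferred output plus a weighted odds-ratio penalty that pushes the preferred output above the rejected one. It is not the team's training code. The function names, the use of length-averaged log-probabilities, and the weight `lam=0.1` are illustrative assumptions, and how preferred and rejected detoxifications were chosen is not stated in this section.

```python
import math


def log_odds(avg_logprob: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logprob).

    Assumes avg_logprob < 0, so that 0 < p < 1.
    """
    p = math.exp(avg_logprob)
    return avg_logprob - math.log1p(-p)


def orpo_loss(
    nll_chosen: float,
    avg_logprob_chosen: float,
    avg_logprob_rejected: float,
    lam: float = 0.1,  # illustrative weight, not a value from the paper
) -> float:
    """ORPO-style loss for one (input, chosen, rejected) triple.

    nll_chosen: supervised negative log-likelihood of the chosen output.
    avg_logprob_*: length-averaged log-probability of each output
    under the model being trained (hypothetical inputs for this sketch).
    """
    log_odds_ratio = log_odds(avg_logprob_chosen) - log_odds(avg_logprob_rejected)
    # -log(sigmoid(x)) equals log(1 + exp(-x))
    penalty = math.log1p(math.exp(-log_odds_ratio))
    return nll_chosen + lam * penalty
```

The penalty shrinks as the model assigns higher odds to the chosen output than to the rejected one, so no separate reference model is needed in this formulation.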
