Towards Red Teaming in Multimodal and Multilingual Translation
Christophe Ropers, David Dale, Prangthip Hansanti, Gabriel Mejia Gonzalez, Ivan Evtimov, Corinne Wong, Christophe Touret, Kristina Pereyra, Seohyun Sonia Kim, Cristian Canton Ferrer, Pierre Andrews, Marta R. Costa-jussà
TL;DR
This paper pioneers human-based red teaming for multimodal and multilingual translation, presenting a methodology to elicit critical errors that standard MT evaluations miss and defining a comprehensive error taxonomy. It applies the approach to SeamlessM4T v2 and SeamlessExpressive, finding that toxicity is the most pervasive error category across modalities and languages, with notable effects from speech prosody, colloquial language, and accent/pitch. The authors also explore automated proxies (BLASER and COMET) to scale red teaming; these correlate partially with general translation quality but have limited capacity to single out truly critical errors, which suggests a hybrid human–machine workflow. They provide practical recommendations for user notices and mitigation strategies (e.g., MuTox, MinTox), outline a pathway toward open-sourcing a red teaming benchmark, and acknowledge the scalability and comparability limitations inherent to human-driven drills.
Abstract
Assessing performance in Natural Language Processing is becoming increasingly complex. One particular challenge is the potential for evaluation datasets to overlap with training data, either directly or indirectly, which can lead to skewed results and overestimation of model performance. As a consequence, human evaluation is attracting growing interest as a means to assess the performance and reliability of models. One such method is the red teaming approach, which aims to generate edge cases where a model will produce critical errors. While this methodology is becoming standard practice for generative AI, its application to conditional AI remains largely unexplored. This paper presents the first study on human-based red teaming for Machine Translation (MT), marking a significant step towards understanding and improving the performance of translation models. We examine both human-based red teaming and its automation, reporting lessons learned and providing recommendations for both translation models and red teaming drills. This pioneering work opens up new avenues for research and development in the field of MT.
