GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human

Yuxia Wang; Artem Shelmanov; Jonibek Mansurov; Akim Tsvigun; Vladislav Mikhailov; Rui Xing; Zhuohan Xie; Jiahui Geng; Giovanni Puccetti; Ekaterina Artemova; Jinyan Su; Minh Ngoc Ta; Mervat Abassy; Kareem Ashraf Elozeiri; Saad El Dine Ahmed El Etter; Maiya Goloburda; Tarek Mahmoud; Raj Vardhan Tomar; Nurkhan Laiyk; Osama Mohammed Afzal; Ryuto Koike; Masahiro Kaneko; Alham Fikri Aji; Nizar Habash; Iryna Gurevych; Preslav Nakov

TL;DR

This paper presents GenAI Content Detection Task 1 at COLING 2025, a shared task on binary machine-generated text detection in English and multilingual settings. The data are drawn from HC3, M4GT-Bench, MAGE, RAID, OUTFOX, and LLM-DetectAIve, with dev-test and test sets spanning eight domains and 15 languages, including unseen ones. Baseline detectors (RoBERTa and XLM-R) establish reference performance. Official submissions from 36 teams for English and 26 teams for the multilingual subtask show strong in-domain performance but notable generalization gaps on out-of-domain data and unseen languages, and an analysis of prompt design exposes further detector vulnerabilities. The results highlight the value of domain- and language-aware methods, ensemble strategies, and language identification for robust multilingual detection, and point to future work on data diversity, prompt design, and domain adaptation. Overall, the study provides a benchmark and practical insights for developing detectors of GenAI content that generalize across languages and domains.

Abstract

We present the GenAI Content Detection Task 1, a shared task on binary machine-generated text detection, conducted as part of the GenAI workshop at COLING 2025. The task consists of two subtasks: Monolingual (English) and Multilingual. The shared task attracted many participants: during the test phase, 36 teams made official submissions to the Monolingual subtask and 26 teams to the Multilingual subtask. We provide a comprehensive overview of the data, a summary of the results (including system rankings and performance scores), detailed descriptions of the participating systems, and an in-depth analysis of the submissions. Repository: https://github.com/mbzuai-nlp/COLING-2025-Workshop-on-MGT-Detection-Task1
