SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

Yuxia Wang; Jonibek Mansurov; Petar Ivanov; Jinyan Su; Artem Shelmanov; Akim Tsvigun; Osama Mohammed Afzal; Tarek Mahmoud; Giovanni Puccetti; Thomas Arnold; Chenxi Whitehouse; Alham Fikri Aji; Nizar Habash; Iryna Gurevych; Preslav Nakov

TL;DR

The paper presents SemEval-2024 Task 8 on Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection, which comprises three subtasks: A (binary human vs. machine classification, with monolingual and multilingual tracks), B (multi-way generator attribution), and C (detection of the change point at which authorship shifts from human to machine). It surveys background methods, describes datasets spanning multiple languages, domains, and generators, and reports strong performance from LLM-based approaches and ensembles, with Subtask C remaining the most challenging. The organizers highlight methodological insights, the benefits of data augmentation, and the need for detection that is robust to adversarial attacks; future work aims to extend the task to other modalities and to provide open-source demonstration tools. Overall, the work underscores the rapid progress in machine-generated text detection, demonstrates effective strategies across diverse settings, and outlines practical implications for content integrity in journalism, academia, and law.

Abstract

We present the results and the main findings of SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection. The task featured three subtasks. Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine. This subtask has two tracks: a monolingual track focused solely on English texts and a multilingual track. Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM. Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine. The task attracted a large number of participants: subtask A monolingual (126), subtask A multilingual (59), subtask B (70), and subtask C (30). In this paper, we present the task, analyze the results, and discuss the system submissions and the methods they used. For all subtasks, the best systems used LLMs.
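To make the three output types concrete, the following is a minimal illustrative sketch, not the organizers' data format or scoring code. It assumes that Subtask A predicts a binary label, Subtask B predicts one source name from a fixed set, and Subtask C predicts a word index at which machine-generated text begins. The names `SOURCES` and `split_at_change_point`, the placeholder generator names, and the word-index convention are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical label sets for illustration only; the real generator
# list is defined by the task data, not by this sketch.
BINARY_LABELS = ["human", "machine"]             # Subtask A output space
SOURCES = ["human", "generator_1", "generator_2"]  # Subtask B output space


def split_at_change_point(words: List[str], boundary: int) -> Tuple[List[str], List[str]]:
    """Split a mixed text at an assumed Subtask C boundary.

    `boundary` is taken to be the index of the first machine-generated
    word (an assumption of this sketch), so words[:boundary] is the
    human-written prefix and words[boundary:] is the machine-written rest.
    """
    if not 0 <= boundary <= len(words):
        raise ValueError("boundary must lie between 0 and len(words)")
    return words[:boundary], words[boundary:]


# Usage: a ten-word text whose authorship is assumed to change at index 5.
words = "I wrote this opening myself and the rest was generated".split()
human_part, machine_part = split_at_change_point(words, 5)
```

Under these assumptions, Subtask A and Subtask B are ordinary classification problems over `BINARY_LABELS` and `SOURCES`, while Subtask C asks for a position inside the text rather than a label for the whole text.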
