MasonTigers at SemEval-2024 Task 8: Performance Analysis of Transformer-based Models on Machine-Generated Text Detection
Sadiya Sayara Chowdhury Puspo, Md Nishat Raihan, Dhiman Goswami, Al Nahian Bin Emran, Amrita Ganguly, Ozlem Uzuner
TL;DR
This paper presents MasonTigers' entry to SemEval-2024 Task 8, which covers binary, multi-way, and human-machine mixed text detection in multilingual and cross-domain settings. It shows that ensembles of discriminator transformers, sentence-transformer features, and statistical learners outperform single models, while zero-shot prompting and fine-tuned FLAN-T5 give additional but weaker gains on Tracks A and B. Across Tracks A, B, and C, the team reports strong development-set performance for the ensembles and identifies outliers and cross-generator variability as obstacles to generalization. The work argues that robust, multilingual detection matters as machine-generated text improves rapidly, and it outlines future directions in preprocessing, stability, and prompt-based approaches.
Abstract
This paper presents the MasonTigers entry to SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection. The task comprises Binary Human-Written vs. Machine-Generated Text Classification (Track A), Multi-Way Machine-Generated Text Classification (Track B), and Human-Machine Mixed Text Detection (Track C). Our best-performing approaches rely mainly on ensembles of discriminator transformer models, supplemented in specific cases by sentence transformers and statistical machine learning approaches. In addition, we use zero-shot prompting and fine-tuning of FLAN-T5 for Tracks A and B.
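The abstract does not say how the ensemble members are combined. As an illustration only, the sketch below assumes weighted soft voting: each model emits per-class probabilities, the probabilities are averaged, and the highest-scoring class wins. The function name `soft_vote`, the weights, and the toy probabilities are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence

# Per-model output: one probability vector per example.
Probs = Sequence[Sequence[float]]


def soft_vote(prob_sets: Sequence[Probs],
              weights: Optional[Sequence[float]] = None) -> List[int]:
    """Combine class probabilities from several models by weighted averaging.

    Assumption: soft voting is used here for illustration; the paper's
    actual ensembling rule may differ.
    """
    if not prob_sets:
        raise ValueError("need at least one model's predictions")
    if weights is None:
        weights = [1.0] * len(prob_sets)
    if len(weights) != len(prob_sets):
        raise ValueError("one weight per model is required")

    total = float(sum(weights))
    n_examples = len(prob_sets[0])
    predictions = []
    for i in range(n_examples):
        n_classes = len(prob_sets[0][i])
        # Weighted mean probability of each class across models.
        avg = [
            sum(w * probs[i][c] for w, probs in zip(weights, prob_sets)) / total
            for c in range(n_classes)
        ]
        # Index of the class with the highest averaged probability.
        predictions.append(max(range(n_classes), key=avg.__getitem__))
    return predictions


# Toy binary example (0 = human-written, 1 = machine-generated).
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.6, 0.4], [0.3, 0.7]]
model_c = [[0.2, 0.8], [0.55, 0.45]]
labels = soft_vote([model_a, model_b, model_c])
```

With equal weights, a single confident but mistaken model is outvoted by the other two; passing `weights` lets a stronger model count for more.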
