Qwen it detect machine-generated text?
Teodor-George Marchitan, Claudiu Creanga, Liviu P. Dinu
TL;DR
This work tackles binary multilingual machine-generated text detection for Coling 2025 Task 1, comparing causal models (trained on the last layer only) with masked models (fine-tuned with LoRA) across the monolingual and multilingual tracks. The Qwen2.5-0.5B-based causal approach, with data balancing and a constrained token length, achieves the top F1 Micro score in the monolingual track (0.8333) and the second-best F1 Macro (0.8301), while the LoRA-tuned masked model XLM-Roberta-Base provides a strong alternative. The multilingual results lag behind the monolingual ones, highlighting the difficulty of cross-language generalization. Error analysis shows strong performance on some unseen sources (e.g., ChatGPT-related data) but weaknesses on others (e.g., Mixset). Overall, the paper demonstrates effective architecture choices for Subtask A and outlines concrete directions for future work (language-specific fine-tuning, data augmentation, and latent feature exploitation) to mitigate overfitting and improve multilingual robustness.
Abstract
This paper describes the approach of the Unibuc - NLP team in tackling the Coling 2025 GenAI Workshop, Task 1: Binary Multilingual Machine-Generated Text Detection. We explored both masked language models and causal models. For Subtask A, our best model achieved first place out of 36 teams on F1 Micro (Auxiliary Score), with 0.8333, and second place on F1 Macro (Main Score), with 0.8301.
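The two scores reported above weight errors differently, which a small sketch can make concrete. The code below is an illustrative implementation of the standard micro- and macro-averaged F1 definitions, not the task organizers' scoring script; the function name `f1_scores` and the toy labels are assumptions made for this example and are not data from the shared task.

```python
from collections import Counter


def f1_scores(y_true, y_pred, labels=(0, 1)):
    """Return (micro_f1, macro_f1) for single-label classification.

    Hypothetical helper for illustration only; it is not the official scorer.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # class p was predicted wrongly
            fn[t] += 1  # class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(
        sum(tp[c] for c in labels),
        sum(fp[c] for c in labels),
        sum(fn[c] for c in labels),
    )
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy, imbalanced example (0 = human-written, 1 = machine-generated).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
micro, macro = f1_scores(y_true, y_pred)
print(f"F1 Micro = {micro:.4f}, F1 Macro = {macro:.4f}")
```

In a single-label setting such as this one, F1 Micro equals plain accuracy, whereas F1 Macro gives each class equal weight regardless of its frequency, so a system can rank differently under the two metrics when the classes are imbalanced.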
