AISPACE at SemEval-2024 task 8: A Class-balanced Soft-voting System for Detecting Multi-generator Machine-generated Text
Renhua Gu, Xiangfeng Meng
TL;DR
The paper tackles the detection of machine-generated text across multiple generators and domains in SemEval-2024 Task 8, Subtask B. It systematically compares fine-tuned encoder-only, decoder-only, and encoder-decoder transformers and finds encoder-only models to be the most effective for this task. To address class imbalance and improve robustness, it introduces a class-balanced weighted cross-entropy loss and a soft-voting ensemble that aggregates the predictions of multiple base models. The approach achieves $99.46\%$ accuracy on development data, and the system ranks first in Subtask B, setting a state-of-the-art benchmark with strong generalization across generators and domains.
Abstract
SemEval-2024 Task 8 poses the challenge of distinguishing human-written text from machine-generated text. It comprises three subtasks covering different detection scenarios. This paper proposes a system that mainly addresses Subtask B, which aims to determine whether a given full text was written by a human or generated by a specific Large Language Model (LLM), making it a multi-class text classification task. Our team, AISPACE, conducted a systematic study of fine-tuning transformer-based models, including encoder-only, decoder-only, and encoder-decoder models. We compared their performance on this task and found that encoder-only models performed exceptionally well. We also applied a weighted cross-entropy loss function to address the imbalance in sample counts across classes. Additionally, we employed a soft-voting strategy over a multi-model ensemble to enhance the reliability of our predictions. Our system ranked first in Subtask B, setting a state-of-the-art benchmark for this new challenge.
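The two techniques named in the abstract, a class-weighted cross-entropy loss and soft voting over several models, can be illustrated with a minimal NumPy sketch. This section does not give the exact weighting scheme or the ensemble weights, so the inverse-frequency weights and the uniform model weights below are assumptions, and all function names are hypothetical rather than taken from the authors' code.

```python
import numpy as np


def class_balanced_weights(labels, num_classes):
    """Inverse-frequency class weights: n_samples / (n_classes * class_count).

    Assumption: the paper's exact weighting formula is not stated in this section.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    counts = np.maximum(counts, 1.0)  # guard against classes with no samples
    return counts.sum() / (num_classes * counts)


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def weighted_cross_entropy(logits, labels, weights):
    """Cross-entropy where each sample is scaled by the weight of its true class.

    The result is normalised by the sum of the applied weights.
    """
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    sample_weights = weights[labels]
    return float(-(sample_weights * np.log(picked)).sum() / sample_weights.sum())


def soft_vote(prob_list, model_weights=None):
    """Average per-model class probabilities, then take the argmax per sample.

    Assumption: uniform model weights unless `model_weights` is given.
    """
    stacked = np.stack(prob_list)  # shape: (n_models, n_samples, n_classes)
    if model_weights is None:
        model_weights = np.ones(len(prob_list))
    model_weights = np.asarray(model_weights, dtype=float)
    model_weights = model_weights / model_weights.sum()
    averaged = np.tensordot(model_weights, stacked, axes=1)  # (n_samples, n_classes)
    return averaged.argmax(axis=-1), averaged


if __name__ == "__main__":
    # Toy data: class 1 is under-represented, so it receives the larger weight.
    train_labels = np.array([0, 0, 0, 1])
    w = class_balanced_weights(train_labels, num_classes=2)
    print("class weights:", w)

    # Two hypothetical base models disagree on the first sample.
    model_a = np.array([[0.6, 0.4], [0.2, 0.8]])
    model_b = np.array([[0.3, 0.7], [0.4, 0.6]])
    preds, avg = soft_vote([model_a, model_b])
    print("ensemble predictions:", preds)
```

Soft voting averages probabilities rather than counting hard votes, so a confident model can outweigh a hesitant one on a given sample. In the toy example the first model slightly favours class 0 for the first sample, but the averaged distribution favours class 1.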
