TM-TREK at SemEval-2024 Task 8: Towards LLM-Based Automatic Boundary Detection for Human-Machine Mixed Text

Xiaoyan Qu, Xiangfeng Meng

TL;DR

The paper tackles token-level boundary detection in human-machine mixed text: locating the point where human-authored content ends and LLM-generated content begins. It reframes the problem as token-level classification and reports strong results with XLNet-based models, long-range LLMs, and ensembles of them, reaching a mean absolute error (MAE) of 2.44 with a single model and 2.22 with an ensemble. Through systematic ablations, the authors show that adding LSTM/BiLSTM layers on top of the LLM, using segment-aware loss functions, and pretraining on related tasks each substantially improve boundary-detection accuracy. The work contributes a state-of-the-art result for boundary detection on the SemEval-2024 Task 8 dataset, along with practical guidance on model design and training strategies for mixed human-machine text.
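To make the MAE figures concrete, here is a minimal sketch of how such a score could be computed. It assumes that each boundary is an integer position (for example a word or token index) and that there is one predicted and one gold boundary per text; the function name and the sample values are illustrative and are not taken from the paper.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Average absolute distance between predicted and gold boundary positions."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not predicted:
        raise ValueError("at least one boundary pair is required")
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(predicted)


# Illustrative values only: three texts with their predicted and gold boundaries.
predicted_boundaries = [10, 25, 7]
gold_boundaries = [12, 25, 4]
score = mean_absolute_error(predicted_boundaries, gold_boundaries)
print(f"MAE = {score:.2f}")  # (2 + 0 + 3) / 3
```

Under this reading, an MAE of 2.22 would mean that predicted boundaries fall, on average, a little over two positions away from the true boundary.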

Abstract

With the increasing prevalence of text generated by large language models (LLMs), there is a growing concern about distinguishing between LLM-generated and human-written texts in order to prevent the misuse of LLMs, such as the dissemination of misleading information and academic dishonesty. Previous research has primarily focused on classifying text as either entirely human-written or LLM-generated, neglecting the detection of mixed texts that contain both types of content. This paper explores LLMs' ability to identify boundaries in human-written and machine-generated mixed texts. We approach this task by transforming it into a token classification problem and regard the label turning point as the boundary. Notably, our ensemble model of LLMs achieved first place in the 'Human-Machine Mixed Text Detection' sub-task of SemEval'24 Task 8. Additionally, we investigate factors that influence the capability of LLMs in detecting boundaries within mixed texts, including the incorporation of extra layers on top of LLMs, the addition of segmentation loss functions, and the impact of pretraining. Our findings aim to provide valuable insights for future research in this area.
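The idea of reading the boundary off the "label turning point" can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes per-token predictions where 0 means human-written and 1 means machine-generated, and that the human part comes first. The second function, `best_split`, is a hypothetical smoothing step for noisy predictions that the abstract does not describe.

```python
from typing import Sequence


def first_turning_point(labels: Sequence[int]) -> int:
    """Index of the first token labeled machine-generated (1).

    Returns len(labels) when no token is labeled 1 (fully human text).
    """
    for i, label in enumerate(labels):
        if label == 1:
            return i
    return len(labels)


def best_split(labels: Sequence[int]) -> int:
    """Split index k whose ideal pattern (k zeros, then ones) disagrees least
    with the predicted labels. Ties resolve to the earliest k.

    Assumption: this robustness step is illustrative, not from the paper.
    """
    # With k = 0 every token is expected to be 1, so each 0 is a mismatch.
    cost = sum(1 for label in labels if label == 0)
    best_k, best_cost = 0, cost
    for k in range(1, len(labels) + 1):
        # Token k-1 moves to the human side: a 0 now matches, a 1 now mismatches.
        cost += -1 if labels[k - 1] == 0 else 1
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Illustrative noisy prediction: one stray "machine" label at index 2.
predicted = [0, 0, 1, 0, 0, 1, 1, 1]
print(first_turning_point(predicted))  # 2
print(best_split(predicted))           # 5
```

The two functions agree on clean predictions and differ only when isolated labels flip, which is where the choice of decoding rule would affect the MAE.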

Paper Structure (21 sections, 1 figure, 7 tables)