AIpom at SemEval-2024 Task 8: Detecting AI-produced Outputs in M4
Alexander Shirnin, Nikita Andreev, Vladislav Mikhailov, Ekaterina Artemova
TL;DR
AIpom detects the boundary between human-written and AI-generated text in the M4 corpus with a two-stage decoder–encoder pipeline. The decoder first predicts the machine-generated segments; two encoders then refine these predictions, and their outputs are averaged to yield the final change-point estimate. On SemEval-2024 Task 8 Subtask C, AIpom ranks second on the official leaderboard with a Mean Absolute Error (MAE) of 15.94 (lower is better). Ablations show the advantage of pipelining and the impact of domain shift on robustness. The approach demonstrates the value of combining instruction-tuned decoders with token-labeling encoders for boundary localization in multilingual, multi-domain AI-generated text detection. Code and models are publicly released for reproducibility.
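The averaging step can be illustrated with a minimal sketch. It assumes each encoder emits one probability per token that the token is machine-generated, and that the change point is the first token whose averaged probability reaches a threshold. The function names, the 0.5 threshold, and the toy probabilities are hypothetical and are not taken from the AIpom implementation.

```python
from typing import List


def average_probs(p_a: List[float], p_b: List[float]) -> List[float]:
    """Element-wise mean of two per-token probability sequences."""
    if len(p_a) != len(p_b):
        raise ValueError("both encoders must score the same number of tokens")
    return [(a + b) / 2.0 for a, b in zip(p_a, p_b)]


def change_point(probs: List[float], threshold: float = 0.5) -> int:
    """Index of the first token scored as machine-generated.

    Returns len(probs) when no token reaches the threshold, i.e. the
    text is treated as fully human-written (an assumption of this sketch).
    """
    for i, p in enumerate(probs):
        if p >= threshold:
            return i
    return len(probs)


# Toy per-token probabilities from two hypothetical encoders.
enc_a = [0.1, 0.2, 0.4, 0.9, 0.8]
enc_b = [0.0, 0.1, 0.7, 0.7, 0.9]

avg = average_probs(enc_a, enc_b)
boundary = change_point(avg)
print(boundary)
```

In this toy input the two encoders disagree on the third token (0.4 versus 0.7); averaging resolves the disagreement in favor of the more confident score.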
Abstract
This paper describes AIpom, a system designed to detect the boundary between human-written and machine-generated text (SemEval-2024 Task 8, Subtask C: Human-Machine Mixed Text Detection). We propose a two-stage pipeline combining predictions from an instruction-tuned decoder-only model and encoder-only sequence taggers. AIpom ranks second on the leaderboard, with a Mean Absolute Error of 15.94. Ablation studies confirm that pipelining encoder and decoder models improves performance.
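For readers unfamiliar with the metric, the following sketch shows how a Mean Absolute Error over boundary positions can be computed. It assumes each text has one gold and one predicted boundary, expressed as an integer position; the helper name and the toy values are illustrative only and do not reproduce the official scorer.

```python
from typing import Sequence


def mean_absolute_error(pred: Sequence[int], gold: Sequence[int]) -> float:
    """Average absolute distance between predicted and gold boundary positions."""
    if len(pred) != len(gold):
        raise ValueError("pred and gold must have the same length")
    if not pred:
        raise ValueError("at least one example is required")
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(pred)


# Toy example: three texts with gold and predicted boundary positions.
gold = [10, 42, 0]
pred = [12, 40, 5]

mae = mean_absolute_error(pred, gold)
print(mae)
```

Lower values are better: an MAE of 0 would mean every predicted boundary coincides with the gold one.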
