AIpom at SemEval-2024 Task 8: Detecting AI-produced Outputs in M4

Alexander Shirnin; Nikita Andreev; Vladislav Mikhailov; Ekaterina Artemova

TL;DR

AIpom detects the boundary between human-written and AI-generated text in the M4 corpus with a two-stage decoder–encoder pipeline. The decoder first predicts the machine-generated segment; two encoders then refine this prediction, and their outputs are averaged to give the final change-point estimate. On SemEval-2024 Task 8, Subtask C, AIpom ranks second on the official leaderboard with a Mean Absolute Error (MAE) of 15.94. Ablations show the advantage of pipelining and the impact of domain shift on robustness. The approach demonstrates the value of combining instruction-tuned decoders with token-labeling encoders for boundary localization in multilingual, multi-domain AI-generated text detection. Code and models are publicly released for reproducibility.

Abstract

This paper describes AIpom, a system designed to detect the boundary between human-written and machine-generated text (SemEval-2024 Task 8, Subtask C: Human-Machine Mixed Text Detection). We propose a two-stage pipeline that combines predictions from an instruction-tuned decoder-only model and encoder-only sequence taggers. AIpom ranks second on the leaderboard with a Mean Absolute Error of 15.94. Ablation studies confirm that pipelining the encoder and decoder models improves performance.
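
As an illustration of the reported metric, the following is a minimal sketch of how Mean Absolute Error could be computed over change points. It assumes each change point is an integer position in the text (for example, a word index); the function name `mean_absolute_error` and the sample values are hypothetical and are not taken from the released AIpom code.

```python
from typing import Sequence


def mean_absolute_error(gold: Sequence[int], predicted: Sequence[int]) -> float:
    """Average absolute distance between gold and predicted change points.

    Assumption: both sequences hold one integer position per text,
    aligned by index. Lower values mean better boundary localization.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one example is required")
    total = sum(abs(g - p) for g, p in zip(gold, predicted))
    return total / len(gold)


# Hypothetical toy data: three texts with gold and predicted boundaries.
gold_points = [10, 25, 40]
predicted_points = [12, 20, 40]
print(mean_absolute_error(gold_points, predicted_points))
```

Under this reading, an MAE of 15.94 would mean that predicted boundaries fall about 16 positions from the gold boundary on average.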

Paper Structure (19 sections, 3 figures, 2 tables)

Introduction
Background
Task Formulation
Performance Metric
AIpom
Overview
Decoder
Encoder
Experiments
Overview
Decoder fine-tuning and inference
Data labeling with decoder
Encoder fine-tuning
Hardware specification
Results
...and 4 more sections

Figures (3)

Figure 1: The AIpom pipeline involves fine-tuning decoder and encoder models to predict change points between the human-written and machine-generated text. This process includes fine-tuning the decoder, predicting change points, fine-tuning two encoders, and aggregating predicted change points. The icons in the figure denote, respectively, fine-tuning a language model, predicting with the language model, and aggregating the predictions by averaging.
Figure 2: We fine-tune the decoder to output only the machine-written text.
Figure 3: We fine-tune the encoder for token labeling. Human-written tokens are assigned zeros, while machine-generated tokens are assigned ones.
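
The labeling scheme in Figure 3 and the averaging step in Figure 1 can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes the human-written part precedes the machine-generated part, that a change point is the index of the first machine-generated token, and that the boundary is read off as the first token labeled 1. All function names are hypothetical.

```python
from typing import List, Sequence


def labels_from_change_point(num_tokens: int, change_point: int) -> List[int]:
    """Build token labels: 0 for human-written, 1 for machine-generated.

    Assumption: tokens before `change_point` are human-written and all
    tokens from `change_point` onward are machine-generated.
    """
    if not 0 <= change_point <= num_tokens:
        raise ValueError("change_point must lie within the token range")
    return [0] * change_point + [1] * (num_tokens - change_point)


def change_point_from_labels(labels: Sequence[int]) -> int:
    """Return the index of the first token labeled 1.

    If no token is labeled 1, return len(labels), i.e. the whole text
    is treated as human-written (an assumption of this sketch).
    """
    for index, label in enumerate(labels):
        if label == 1:
            return index
    return len(labels)


def average_change_points(points: Sequence[int]) -> float:
    """Aggregate change points from several models by taking their mean."""
    if not points:
        raise ValueError("at least one prediction is required")
    return sum(points) / len(points)


# Hypothetical example: a six-token text whose last two tokens are machine-generated.
gold_labels = labels_from_change_point(num_tokens=6, change_point=4)
print(gold_labels)

# Hypothetical label sequences predicted by two encoders for the same text.
encoder_a = [0, 0, 0, 0, 1, 1]
encoder_b = [0, 0, 0, 0, 0, 0]
points = [change_point_from_labels(encoder_a), change_point_from_labels(encoder_b)]
print(average_change_points(points))
```

Taking the first token labeled 1 is only one possible decoding rule; a real system might also need to handle noisy label sequences in which zeros and ones alternate.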
