CHEAT: A Large-scale Dataset for Detecting ChatGPT-writtEn AbsTracts

Peipeng Yu, Jiahan Chen, Xuan Feng, Zhihua Xia

TL;DR

This work introduces CHEAT, the largest publicly available dataset for detecting ChatGPT-written abstracts, consisting of 35,304 AI-generated and 15,395 human-written abstracts from IEEE Xplore in computer science. It analyzes linguistic differences, including lexical and dependency patterns, and evaluates multiple detection methods, showing detectors perform well on fully generated text but struggle when human involvement is present. The study also employs SHAP-based explainability to understand detector decisions and highlights the increased difficulty of detecting mixed or polished content. Overall, CHEAT provides a valuable benchmark for developing robust, domain-specific detectors and highlights the need to address human-in-the-loop synthesis in academic writing.

Abstract

The powerful ability of ChatGPT has caused widespread concern in the academic community. Malicious users could synthesize dummy academic content through ChatGPT, which is extremely harmful to academic rigor and originality. The need to develop ChatGPT-written content detection algorithms calls for large-scale datasets. In this paper, we first investigate the possible negative impact of ChatGPT on academia, and present a large-scale CHatGPT-writtEn AbsTract dataset (CHEAT) to support the development of detection algorithms. In particular, the ChatGPT-written abstract dataset contains 35,304 synthetic abstracts, with Generation, Polish, and Mix as prominent representatives. Based on these data, we perform a thorough analysis of the existing text synthesis detection algorithms. We show that ChatGPT-written abstracts are detectable, while the detection difficulty increases with human involvement. Our dataset is available at https://github.com/botianzhe/CHEAT.
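
The abstract's claim that detection gets harder with human involvement is usually quantified with ROC curves and the area under them (AUC). The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `roc_auc` and all scores are hypothetical, and the toy numbers are made up for illustration rather than taken from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    AUC equals the probability that a randomly chosen positive example
    (label 1, here: ChatGPT-written) receives a higher detector score than
    a randomly chosen negative example (label 0, here: human-written).
    Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy detector scores (assumed values, NOT results from the paper).
labels = [1, 1, 1, 0, 0, 0]
generated_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # classes cleanly separated
polished_scores = [0.6, 0.4, 0.2, 0.5, 0.3, 0.1]   # classes overlap

print("AUC, fully generated:", roc_auc(labels, generated_scores))
print("AUC, polished:", roc_auc(labels, polished_scores))
```

In this toy setup the overlapping scores yield a lower AUC than the cleanly separated ones, which is the kind of gap a harder detection setting would produce.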

Paper Structure (14 sections, 6 figures, 3 tables)

Figures (6)

  • Figure 1: The different distributions of human-written and ChatGPT-written abstracts. The visualization results are obtained by GLTR (Gehrmann et al., 2019).
  • Figure 2: The lexical distribution for human-written, ChatGPT-polished, and ChatGPT-generated abstracts.
  • Figure 3: The dependency distribution for human-written, ChatGPT-polished, and ChatGPT-generated abstracts.
  • Figure 4: ROC curves of existing detection schemes on three datasets (ChatGPT-Generation, ChatGPT-Polish, and ChatGPT-Mix).
  • Figure 5: Visualization of SHAP value statistics. The top 10 words ranked by contribution are listed for human-written and ChatGPT-written abstracts.
  • ...and 1 more figure