Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

Lingyi Yang; Feng Jiang; Haizhou Li

TL;DR

The paper develops a robust framework for detecting texts influenced by ChatGPT. It introduces the HPPT dataset, which pairs human-written abstracts with their ChatGPT-polished versions, and the Polish Ratio (PR), a metric that quantifies the degree of ChatGPT involvement. A RoBERTa-based detector trained on HPPT performs strongly both in-domain and cross-domain, outperforming existing baselines on HPPT, HC3, and CDB, and is notably resilient to polishing attacks. The Polish Ratio gives an interpretable account of how much a text was modified, addressing the need for interpretability in detection tasks, and it reliably separates human-written, ChatGPT-polished, and ChatGPT-generated texts across languages and models. The work also includes a GLTR-based analysis, which exposes that tool's limitations on polished texts, and presents PR regression as a practical, evidence-backed explanation tool. Overall, it offers a scalable, cross-domain, and multilingual solution that both detects ChatGPT involvement in texts and explains its extent.

Abstract

The remarkable text-generation capabilities of large language models such as ChatGPT have impressed readers and spurred researchers to build detectors that mitigate potential risks, including misinformation, phishing, and academic dishonesty. However, most previous studies have focused on detectors that distinguish purely ChatGPT-generated texts from human-authored texts. This approach fails to discern texts produced through human-machine collaboration, such as ChatGPT-polished texts. To address this gap, we introduce a novel dataset, HPPT (ChatGPT-polished academic abstracts), to support the construction of more robust detectors. Unlike existing corpora, it consists of pairs of human-written and ChatGPT-polished abstracts rather than purely ChatGPT-generated texts. We also propose the "Polish Ratio" method, which measures the degree of modification ChatGPT made relative to the original human-written text and thereby quantifies how much ChatGPT influenced the resulting text. Our experimental results show that the proposed model is more robust on the HPPT dataset and on two existing datasets (HC3 and CDB). Furthermore, the Polish Ratio offers a more comprehensive explanation by quantifying the degree of ChatGPT involvement.
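The abstract describes the Polish Ratio only as the degree of modification ChatGPT made relative to the original text; its exact formula is not reproduced in this summary. Because every HPPT sample pairs an original abstract with its polished version, a reference score can be computed per pair. The sketch below assumes, as our own stand-in and not necessarily the paper's definition, that PR is a word-level edit distance normalized by the longer text's length; a word-level Jaccard distance is included as an alternative similarity measure. `polish_ratio`, `word_levenshtein`, and `jaccard_distance` are hypothetical helper names.

```python
def word_levenshtein(a: list[str], b: list[str]) -> int:
    """Minimum number of word insertions, deletions, and substitutions turning a into b."""
    # Single-row dynamic programming over the classic edit-distance table.
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # substitute (or keep) the word
        prev = curr
    return prev[-1]


def jaccard_distance(a: list[str], b: list[str]) -> float:
    """1 minus intersection-over-union of the two word sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def polish_ratio(original: str, polished: str) -> float:
    """Hypothetical PR: word-level edit distance normalized to [0, 1] (an assumption)."""
    a, b = original.lower().split(), polished.lower().split()
    if not a and not b:
        return 0.0
    # Edit distance never exceeds the longer length, so the ratio stays within [0, 1].
    return word_levenshtein(a, b) / max(len(a), len(b))
```

For example, `polish_ratio("we propose a new dataset for detection", "we introduce a novel dataset for detection")` gives 2/7 (about 0.286), since two of seven words were substituted. Under this assumption, values near 0 indicate light polishing and values near 1 a near-complete rewrite.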

Paper Structure

This paper contains 18 sections, 3 equations, 7 figures, and 5 tables.

Introduction
Related Work
Method
HPPT Dataset: Human-ChatGPT Polished Paired abstracT
Detection: RoBERTa-based black-box model
Explanation
Experiment and analysis
Experiment Setup
Dataset
Reproduction details
Detection Result
Explanation Analysis
GLTR
Polish Ratio Regression
Case study for Polish Ratio
...and 3 more sections

Figures (7)

Figure 1: The study design of our detection method.
Figure 2: Similarity distribution of polished abstracts in our HPPT dataset.
Figure 3: Visualization of sample texts using the GLTR demo: http://gltr.io./dist/index.html. A word whose rank is within the top 10 most probable tokens is highlighted in green, within the top 100 in yellow, within the top 1,000 in red, and the rest in purple. Samples 1 and 2 are from the HC3 test set, while Samples 3 and 4 are from the HPPT test set.
Figure 4: Differences in predicted PR among human-written texts (HW), ChatGPT-polished texts (CP), and ChatGPT-generated texts (CG). HW and CP are taken directly from the HPPT test set, whereas CG is taken from the HC3 test set.
Figure 5: Differences between human-written samples and misclassified samples whose ground truth is human-written in the test set. The mean PR of the misclassified samples is around 0.3, which confuses our detection model.
...and 2 more figures
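The Figure 3 caption describes GLTR's color scheme: each word is colored by the rank of the actual token in a language model's predicted distribution. A minimal sketch of just that bucketing step follows, assuming the per-token ranks have already been obtained from some language model (the model-scoring step is omitted) and that ranks are 1-based, with rank 1 being the most probable token. `gltr_bucket` and `bucket_fractions` are hypothetical helpers, not part of the GLTR codebase.

```python
from collections import Counter

# Upper rank limits from the Figure 3 caption; anything beyond 1,000 is purple.
BUCKETS = ((10, "green"), (100, "yellow"), (1000, "red"))
COLORS = tuple(color for _, color in BUCKETS) + ("purple",)


def gltr_bucket(rank: int) -> str:
    """Map a 1-based token rank under the language model to its GLTR color."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    for limit, color in BUCKETS:
        if rank <= limit:
            return color
    return "purple"


def bucket_fractions(ranks: list[int]) -> dict[str, float]:
    """Fraction of tokens falling into each color bucket (all 0.0 for an empty list)."""
    counts = Counter(gltr_bucket(r) for r in ranks)
    total = len(ranks)
    return {c: (counts[c] / total if total else 0.0) for c in COLORS}
```

For ranks `[1, 3, 57, 420, 5000]`, `bucket_fractions` returns 0.4 green and 0.2 each for yellow, red, and purple. Summary statistics like these underlie a GLTR-style reading of a text; as the TL;DR notes, this kind of analysis is limited on polished texts, which motivates the PR-based explanation.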
