Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

Lingyi Yang, Feng Jiang, Haizhou Li

TL;DR

The paper builds a detector for texts that ChatGPT has influenced, not only texts it generated outright. It introduces HPPT, a dataset of human-written abstracts paired with their ChatGPT-polished versions, and the Polish Ratio (PR), a metric that quantifies how much ChatGPT modified a text. A RoBERTa-based detector trained on HPPT performs well both in-domain and cross-domain, outperforming existing baselines on HPPT, HC3, and CDB, and it is notably resilient to polishing attacks. The Polish Ratio gives an interpretable account of the degree of modification, and it separates human-written, ChatGPT-polished, and ChatGPT-generated texts reliably across languages and models. The paper also includes a GLTR-based analysis, which shows that tool's limitations on polished texts, and presents PR regression as a practical, evidence-backed explanation tool. Overall, the work offers a cross-domain, multilingual approach to detecting ChatGPT involvement in texts and explaining its extent.

Abstract

The remarkable capabilities of large-scale language models, such as ChatGPT, in text generation have impressed readers and spurred researchers to devise detectors to mitigate potential risks, including misinformation, phishing, and academic dishonesty. Despite this, most previous studies have been predominantly geared towards creating detectors that differentiate between purely ChatGPT-generated texts and human-authored texts. This approach, however, fails to work on discerning texts generated through human-machine collaboration, such as ChatGPT-polished texts. Addressing this gap, we introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts), facilitating the construction of more robust detectors. It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts. Additionally, we propose the "Polish Ratio" method, an innovative measure of the degree of modification made by ChatGPT compared to the original human-written text. It provides a mechanism to measure the degree of ChatGPT influence in the resulting text. Our experimental results show our proposed model has better robustness on the HPPT dataset and two existing datasets (HC3 and CDB). Furthermore, the "Polish Ratio" we proposed offers a more comprehensive explanation by quantifying the degree of ChatGPT involvement.
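
The abstract describes the Polish Ratio as a measure of how much ChatGPT changed the original human-written text, but it does not give a formula here. As a rough illustration only, the sketch below assumes two common word-level choices, a Jaccard distance and a length-normalized Levenshtein distance, both of which are 0 for an untouched text and approach 1 for a fully rewritten one. The function names and the whitespace tokenizer are hypothetical and are not taken from the paper.

```python
def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization; the paper may tokenize differently.
    return text.lower().split()


def jaccard_distance(original: str, polished: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the two texts' word sets."""
    a, b = set(tokenize(original)), set(tokenize(polished))
    if not a and not b:
        return 0.0  # two empty texts are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def levenshtein(a: list[str], b: list[str]) -> int:
    """Word-level edit distance (insert, delete, substitute), one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(original: str, polished: str) -> float:
    """Edit distance divided by the longer text's length, so the result lies in [0, 1]."""
    a, b = tokenize(original), tokenize(polished)
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest
```

Under these assumptions, an unmodified abstract scores 0 on both measures, while a light polish that swaps one word in three scores about 0.33 on the edit-distance variant. A regression target built this way would give a detector a graded label instead of a binary one.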

Paper Structure (18 sections, 3 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: The study design of our detection method.
  • Figure 2: Similarity distribution of polished abstracts in our HPPT dataset.
  • Figure 3: Visualization of sample texts using the GLTR demo: http://gltr.io/dist/index.html. A word ranked within the top 10 by probability is highlighted in green, top 100 in yellow, top 1,000 in red, and the rest in purple. Sample 1 and Sample 2 are chosen from the HC3 test set, while Sample 3 and Sample 4 are chosen from the HPPT test set.
  • Figure 4: Differences between predicted PR for human-written texts (HW), ChatGPT-polished texts (CP), and ChatGPT-generated texts (CG): HW and CP are taken directly from the HPPT test set, whereas CG are taken from the HC3 test set.
  • Figure 5: Differences between human-written samples and misclassified samples whose ground truth is human-written in the test set: the mean PR of the misclassified samples is around 0.3, which confuses our detection model.
  • ...and 2 more figures
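
The color scheme in the Figure 3 caption can be expressed as a small lookup. The sketch below is illustrative only: it assumes each word already has a 1-based rank under some language model's predicted distribution (computing those ranks is out of scope here), and the function names are hypothetical, not part of the GLTR tool.

```python
from collections import Counter


def gltr_color(rank: int) -> str:
    """Map a word's 1-based probability rank to the color named in the caption."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"


def color_histogram(ranks: list[int]) -> Counter:
    """Count how many words of a text fall into each color bucket."""
    return Counter(gltr_color(r) for r in ranks)
```

A text dominated by green words consists mostly of high-probability choices. Comparing such histograms between the HC3 and HPPT samples is one simple way to see how far polished texts differ from fully generated ones under this visualization.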