ChatGPT or Human? Detect and Explain. Explaining Decisions of Machine Learning Model for Detecting Short ChatGPT-generated Text

Sandra Mitrović, Davide Andreoletti, Omran Ayoub

TL;DR

The paper addresses detecting short ChatGPT-generated text and explaining the model's decisions. It compares a Transformer-based classifier (DistilBERT) with a perplexity baseline and uses SHAP to reveal discriminative features, finding high accuracy for custom-query text and reduced performance for rephrased text. The study highlights ChatGPT's characteristic polite, impersonal style and discusses the implications for security and digital forensics, while noting limitations and avenues for future work. Overall, the approach demonstrates the value of explainable, transformer-based methods for AI-authorship detection in short texts and raises awareness of potential misuse through text rephrasing.

Abstract

ChatGPT can generate grammatically flawless and seemingly human replies to different types of questions from various domains. The number of its users and of its applications is growing at an unprecedented rate. Unfortunately, use and abuse come hand in hand. In this paper, we study whether a machine learning model can be effectively trained to accurately distinguish between original human and seemingly human (that is, ChatGPT-generated) text, especially when this text is short. Furthermore, we employ an explainable artificial intelligence framework to gain insight into the reasoning behind the model trained to differentiate between ChatGPT-generated and human-generated text. The goal is to analyze the model's decisions and determine whether any specific patterns or characteristics can be identified. Our study focuses on short online reviews, conducting two experiments comparing human-generated and ChatGPT-generated text. The first experiment involves ChatGPT text generated from custom queries, while the second experiment involves text generated by rephrasing original human-generated reviews. We fine-tune a Transformer-based model and use it to make predictions, which are then explained using SHAP. We compare our model with a perplexity score-based approach and find that disambiguation between human and ChatGPT-generated reviews is more challenging for the ML model when using rephrased text. However, our proposed approach still achieves an accuracy of 79%. Using explainability, we observe that ChatGPT's writing is polite and impersonal, lacks specific details, uses fancy and atypical vocabulary, and typically does not express feelings.
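
The perplexity score-based approach mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that per-token natural-log probabilities have already been obtained from some language model, and the function names, the toy probabilities, and the threshold of 10.0 are all hypothetical. The rule that lower perplexity suggests machine-generated text is a common heuristic assumed here, not a detail taken from the text above.

```python
import math


def perplexity(token_logprobs):
    """Perplexity: exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def classify_by_perplexity(token_logprobs, threshold):
    """Assumed heuristic: perplexity below the threshold is labeled 'chatgpt'."""
    return "chatgpt" if perplexity(token_logprobs) < threshold else "human"


# Hypothetical per-token log-probabilities, for illustration only.
predictable = [math.log(0.5)] * 4    # every token assigned probability 0.5
surprising = [math.log(0.05)] * 4    # every token assigned probability 0.05

print(perplexity(predictable))                    # approximately 2
print(perplexity(surprising))                     # approximately 20
print(classify_by_perplexity(predictable, 10.0))  # chatgpt
print(classify_by_perplexity(surprising, 10.0))   # human
```

In practice the threshold would have to be tuned on labeled data, and the log-probabilities would come from a real language model rather than hand-picked values.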

Paper Structure

This paper contains 11 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Schematic representation of the study design and building blocks.
  • Figure 2: Distribution of the length of the text samples (in words) in each of the three datasets considered in our work: (a) Human, (b) $ChatGPT_{query}$ and (c) $ChatGPT_{rephrase}$.
  • Figure 3: Box plots of the perplexity values of the text samples of the three datasets considered in our analysis.
  • Figure 4: SHAP local explanation plots for three decisions corresponding to three different text samples (data points): (a) "We had 7 at our table and the service was pretty fast.", (b) "The cashier was friendly and even brought the food out to me.", (c) "We also ordered the spinach and avocado salad, the ingredients were sad and the dressing literally had zero taste."
  • Figure 5: SHAP local explanation plots for three decisions corresponding to three different text samples (data points): (a) "The vegetables are so fresh and the sauce feels like authentic Thai.", (b) "I left with a stomach ache and felt sick for the rest of the day.", (c) "I hate those things as much as cheap quality black olives."
  • ...and 3 more figures