Detection of Fake Generated Scientific Abstracts

Panagiotis C. Theocharopoulos, Panagiotis Anagnostou, Anastasia Tsoukala, Spiros V. Georgakopoulos, Sotiris K. Tasoulis, Vassilis P. Plagianakos

TL;DR

This work addresses the challenge of distinguishing AI-generated scientific abstracts from human-written ones by constructing a benchmark dataset from the CORD-19 corpus using GPT-3 to generate abstracts from titles. It systematically evaluates multiple text representations—TF-IDF, NER-based features, Word2Vec embeddings, and contextualized embeddings from BERT—across classical ML and deep-learning classifiers. The strongest result comes from an LSTM with Word2Vec, achieving an accuracy of $98.7\%$ and an AUC of $0.987$, with Word2Vec embeddings outperforming BERT in this domain. The study also analyzes misclassifications, noting that title complexity and vocabulary differences influence detection, and suggests expanding the dataset with updated models to enhance generalization across domains and languages, with ethical considerations in mind.
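
The two metrics quoted above, accuracy and AUC, can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. AUC is computed here in its pairwise form, as the fraction of (AI-generated, human-written) pairs in which the AI-generated abstract receives the higher score.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = AI-generated, 0 = human-written.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]  # classifier's score for "AI-generated"
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"accuracy = {accuracy(labels, preds):.3f}")
print(f"AUC      = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes how well the scores rank the two classes across all thresholds, which is presumably why the summary reports both.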

Abstract

The widespread adoption of Large Language Models and the publicly available ChatGPT has marked a significant turning point in the integration of Artificial Intelligence into people's everyday lives. The academic community has taken notice of these technological advancements and has expressed concerns regarding the difficulty of discriminating between what is real and what is artificially generated. Thus, researchers have been working on developing effective systems to identify machine-generated text. In this study, we use the GPT-3 model to generate scientific paper abstracts and explore various text representation methods, combined with Machine Learning models, with the aim of identifying machine-written text. We analyze the models' performance and address several research questions that arise during the analysis of the results. By conducting this research, we shed light on the capabilities and limitations of Artificial Intelligence-generated text.

Paper Structure (12 sections, 5 figures, 1 table)

This paper contains 12 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Schematic overview of the study. Dataset Generation (left): the titles and abstracts of the academic literature were collected from the CORD-19 data, and each selected title was sent as a prompt to the GPT-3 model, via its API, to generate an abstract based on that title. Data Analysis (right): the study involved text cleaning, data representation using various methods, and evaluation of the models' results.
  • Figure 2: Most frequent word appearances in both the human-created texts (left) and the AI-generated texts (right).
  • Figure 3: Most frequent word appearances in all the titles (left) and in the titles of the misclassified AI-generated abstracts (right).
  • Figure 4: Most frequent word appearances in all the AI-generated abstracts (left) and in the misclassified AI-generated abstracts (right).
  • Figure 5: Most frequent word appearances in all the human-created abstracts (left) and in the misclassified human-created abstracts (right).
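
The word-frequency comparisons in Figures 2 to 5 can be approximated with a short sketch like the one below. It is an illustration only: the tokenizer, the small stop-word list, the `top_words` helper, and the toy texts are assumptions, not the preprocessing used in the paper.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed, deliberately small stop-word list; a real analysis would use a fuller one.
STOPWORDS = frozenset(
    {"the", "of", "and", "in", "a", "to", "for", "on", "with", "is"}
)


def top_words(texts: Iterable[str], k: int = 10) -> List[Tuple[str, int]]:
    """Return the k most frequent non-stop-word tokens across all texts."""
    counts: Counter = Counter()
    for text in texts:
        # Lowercase and keep alphanumeric tokens, allowing internal hyphens.
        tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(k)


# Toy stand-ins (hypothetical) for "all abstracts" and a misclassified subset.
all_abstracts = [
    "The virus spreads in the population",
    "Vaccine response to the virus",
    "Clinical study of vaccine safety",
]
misclassified = [all_abstracts[2]]

print("all:          ", top_words(all_abstracts, k=3))
print("misclassified:", top_words(misclassified, k=3))
```

Comparing the two lists side by side mirrors the left and right panels of each figure: words that are prominent among the misclassified texts but not in the full set hint at vocabulary that confuses the classifier.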