Survey on Plagiarism Detection in Large Language Models: The Impact of ChatGPT and Gemini on Academic Integrity

Shushanta Pudasaini; Luis Miralles-Pechuán; David Lillis; Marisa Llorens Salvador

TL;DR

AI-generated content from large language models threatens academic integrity by letting students complete assignments and exams with little effort. The paper surveys plagiarism and AIGC detection methods, datasets, tools, and evasion strategies, and shows how the field has changed since LLMs became widely available. It discusses limitations such as the lack of standardized benchmarks and the ease with which detectors can be bypassed, and advocates non-technical educational strategies to complement technical solutions. The findings emphasize the need for standardized benchmarks, multi-model detection systems, and explainability to support policy and practical implementations in academia.

Abstract

The rise of Large Language Models (LLMs) such as ChatGPT and Gemini has posed new challenges for the academic community. With the help of these models, students can easily complete their assignments and exams, while educators struggle to detect AI-generated content. This has led to a surge in academic misconduct, as students present work generated by LLMs as their own without putting in the effort required for learning. As AI tools become more advanced and produce increasingly human-like text, detecting such content becomes harder, and many educators are finding it difficult to adapt their assessment methods. This research first demonstrates how LLMs have increased academic dishonesty, and then reviews state-of-the-art solutions for academic plagiarism in detail. It surveys datasets, algorithms, tools, and evasion strategies for plagiarism detection, focusing on how LLMs and AI-generated content (AIGC) detection have affected this area, with the aim of identifying the gaps in existing solutions. Lastly, potential long-term solutions to LLM-enabled academic plagiarism are presented, combining AI tools with educational approaches.

Paper Structure (21 sections, 4 figures, 10 tables)

Introduction
The plagiarism problem in Academia
Rise of Large Language Models (LLMs)
Academic Misconduct in the LLMs Era
Existing Solutions to AI-Generated Plagiarism
Plagiarism Detection
Plagiarism Detection After LLMs
AI Generated Content (AIGC) Detection
Open Source Datasets for AIGC Detection
Watermarking Based Approaches
Zero-shot Based Approaches
Training Classifier Based Approaches
Detection Tools
Techniques to evade AIGC Detection Tools
Limitations and Gaps in Current Solutions
...and 6 more sections

Figures (4)

Figure 1: Timeline indicating the release dates and parameter counts of different GPT models by OpenAI.
Figure 2: Diagram demonstrating how ChatGPT and paraphrasing tools can be used to complete assignments.
Figure 3: Example of a ChatGPT generated and human-written text.
Figure 4: Major AIGC Detection Events including the description of top AIGC detection datasets, algorithms, and tools.
