Survey on Plagiarism Detection in Large Language Models: The Impact of ChatGPT and Gemini on Academic Integrity

Shushanta Pudasaini, Luis Miralles-Pechuán, David Lillis, Marisa Llorens Salvador

TL;DR

AI-generated content from large language models threatens academic integrity by making it easy for students to complete assignments and exams with little effort. The paper surveys plagiarism and AIGC detection methods, datasets, tools, and evasion strategies, highlighting how the landscape has changed since the arrival of LLMs. It discusses limitations such as the lack of standardized benchmarks and the ease with which detectors can be bypassed, and advocates non-technical educational strategies to complement technical solutions. The findings emphasize the need for standardized benchmarks, multi-model detection systems, and explainability to support policy and practical implementations in academia.

Abstract

The rise of Large Language Models (LLMs) such as ChatGPT and Gemini has posed new challenges for the academic community. With the help of these models, students can easily complete their assignments and exams, while educators struggle to detect AI-generated content. This has led to a surge in academic misconduct, as students present work generated by LLMs as their own, without putting in the effort required for learning. As AI tools become more advanced and produce increasingly human-like text, detecting such content becomes more challenging. This development has significantly impacted the academic world, where many educators are finding it difficult to adapt their assessment methods to this challenge. This research first demonstrates how LLMs have increased academic dishonesty, and then reviews state-of-the-art solutions for academic plagiarism in detail. It surveys datasets, algorithms, tools, and evasion strategies for plagiarism detection, focusing on how LLMs and AI-generated content (AIGC) detection have affected this area. The survey aims to identify the gaps in existing solutions. Lastly, potential long-term solutions, based on both AI tools and educational approaches, are presented to address LLM-enabled academic plagiarism in an ever-changing world.

Paper Structure

This paper contains 21 sections, 4 figures, 10 tables.

Figures (4)

  • Figure 1: Timeline indicating the release dates and parameter counts of different GPT models by OpenAI.
  • Figure 2: Diagram demonstrating how ChatGPT and paraphrasing tools can be used to complete assignments.
  • Figure 3: Example of ChatGPT-generated and human-written text.
  • Figure 4: Major AIGC detection events, including descriptions of the top AIGC detection datasets, algorithms, and tools.