Same Same But Different: Preventing Refactoring Attacks on Software Plagiarism Detection

Robin Maisch, Larissa Schmid, Timur Sağlam, Nils Niehues

TL;DR

Plagiarism detection in programming education is increasingly challenged by refactoring-based obfuscation and AI-assisted transformations. The authors introduce Nocte, a modular framework that converts programs to code property graphs and applies graph transformations to normalize their semantic structure before tokenization, so that existing detectors can better identify plagiarized code. Their evaluation on real-world datasets shows that Nocte significantly improves resilience against insertion- and refactoring-based obfuscation. AI-based obfuscation remains harder to counter, and the detection of AI-generated code benefits from complementary strategies. The framework is extensible and can be applied to current detection pipelines.

Abstract

Plagiarism detection in programming education faces growing challenges due to increasingly sophisticated obfuscation techniques, particularly automated refactoring-based attacks. While the code plagiarism detection systems used in educational practice are resilient against basic obfuscation, they struggle against structural modifications that preserve program behavior, especially those introduced by refactoring-based obfuscation. This paper presents a novel and extensible framework that enhances state-of-the-art detectors by leveraging code property graphs and graph transformations to counteract refactoring-based obfuscation. Our comprehensive evaluation on real-world student submissions, obfuscated using both algorithmic and AI-based obfuscation attacks, demonstrates a significant improvement in detecting plagiarized code.
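
The idea of normalizing program structure before tokenization can be illustrated with a minimal sketch. It is not Nocte's implementation: it uses Python's built-in `ast` module as a stand-in for a code property graph, and the rewrite rule `FlipNegatedIf`, the helper `tokenize`, and the two sample programs are hypothetical names chosen for illustration only. The rule undoes one behavior-preserving refactoring (negating a condition and swapping the branches), so that both variants yield the same statement-level token sequence.

```python
import ast

# Two behaviorally equivalent programs; the second one negates the
# condition and swaps the branches (an illustrative refactoring).
ORIGINAL = """
def f(x):
    if x > 0:
        return 1
    else:
        y = 0
        return y
"""

OBFUSCATED = """
def f(x):
    if not x > 0:
        y = 0
        return y
    else:
        return 1
"""


class FlipNegatedIf(ast.NodeTransformer):
    """Hypothetical normalization rule: `if not c: A else: B` -> `if c: B else: A`."""

    def visit_If(self, node):
        self.generic_visit(node)  # normalize nested statements first
        negated = isinstance(node.test, ast.UnaryOp) and isinstance(node.test.op, ast.Not)
        if negated and node.orelse:
            node.test = node.test.operand
            node.body, node.orelse = node.orelse, node.body
        return node


def tokenize(source, normalize):
    """Return one token per statement node, in depth-first source order."""
    tree = ast.parse(source)
    if normalize:
        tree = FlipNegatedIf().visit(tree)
    tokens = []

    def walk(node):
        if isinstance(node, ast.stmt):
            tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            walk(child)

    walk(tree)
    return tokens


if __name__ == "__main__":
    print(tokenize(ORIGINAL, normalize=False))
    print(tokenize(OBFUSCATED, normalize=False))
    print(tokenize(OBFUSCATED, normalize=True))
```

Without normalization, the two token sequences differ in the order of the branch statements; with it, they coincide, which is the property a token-based detector relies on. A real code property graph additionally carries control-flow and data-flow edges, which this sketch omits.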

Paper Structure

This paper contains 45 sections, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Example of a refactoring transformation with a corresponding transformation template containing operations on how to refactor the source graph pattern $S$ into the target graph pattern $T$. Each node pattern shows its type (bold) and its role (italicized).
  • Figure 2:
  • Figure 3: Similarities for unrelated human programs and plagiarism instances based on insertion-based obfuscation.
  • Figure 4: Similarities for unrelated human programs and plagiarism instances based on refactoring-based obfuscation.
  • Figure 5: Similarities for unrelated human programs and plagiarism instances based on AI-based obfuscation.
  • ...and 1 more figure