Same Same But Different: Preventing Refactoring Attacks on Software Plagiarism Detection
Robin Maisch, Larissa Schmid, Timur Sağlam, Nils Niehues
TL;DR
Plagiarism detection in programming education is increasingly challenged by refactoring-based obfuscation and AI-assisted transformations. The authors introduce Nocte, a modular framework that converts programs to code property graphs and applies graph transformations to normalize their semantic structure before tokenization, enabling existing detectors to better identify plagiarized code. Their evaluation on real-world datasets shows that Nocte significantly improves resilience against insertion- and refactoring-based obfuscation. AI-based obfuscation, however, remains a difficult frontier, and handling AI-generated code benefits from complementary strategies. The work offers an extensible way to strengthen academic integrity in programming assignments and can be integrated into current detection pipelines.
Abstract
Plagiarism detection in programming education faces growing challenges due to increasingly sophisticated obfuscation techniques, particularly automated refactoring-based attacks. While the code plagiarism detection systems used in educational practice are resilient against basic obfuscation, they struggle against behavior-preserving structural modifications, especially those introduced by refactoring-based obfuscation. This paper presents a novel, extensible framework that enhances state-of-the-art detectors by leveraging code property graphs and graph transformations to counteract refactoring-based obfuscation. Our comprehensive evaluation on real-world student submissions, obfuscated using both algorithmic and AI-based attacks, demonstrates a significant improvement in detecting plagiarized code.
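The core idea, normalizing program structure before tokenization so that inserted code no longer disturbs the token sequence, can be illustrated with a toy sketch. This is not Nocte's implementation. A flat list of statements with def/use sets stands in for a code property graph, removing dead definitions is assumed as one example of a normalizing transformation, and `difflib.SequenceMatcher.ratio()` stands in for a detector's similarity metric. The names `Stmt`, `remove_dead_definitions`, `tokenize`, and `similarity` are hypothetical.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """Hypothetical toy statement node, not Nocte's code property graph vertex."""
    kind: str                        # token type emitted for this statement
    defines: frozenset = frozenset() # variables written by the statement
    uses: frozenset = frozenset()    # variables read by the statement
    observable: bool = False         # True for returns, I/O, or effectful calls


def remove_dead_definitions(stmts: list[Stmt]) -> list[Stmt]:
    """Drop unobservable statements whose definitions are never read, to a fixed point.

    Order-insensitive approximation: a variable counts as live if any
    remaining statement reads it, regardless of control flow.
    """
    while True:
        live = set()
        for s in stmts:
            live |= s.uses
        kept = [s for s in stmts if s.observable or (s.defines & live)]
        if len(kept) == len(stmts):  # nothing removed: fixed point reached
            return kept
        stmts = kept  # removals may make further definitions dead


def tokenize(stmts: list[Stmt]) -> list[str]:
    """Emit one token per statement, as a token-based detector would."""
    return [s.kind for s in stmts]


def similarity(a: list[Stmt], b: list[Stmt]) -> float:
    """Token-sequence similarity in [0, 1] via difflib (a stand-in metric)."""
    return difflib.SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


# Usage: an "obfuscated" copy with two inserted, unused assignments.
original = [
    Stmt("ASSIGN", defines=frozenset({"x"})),
    Stmt("ASSIGN", defines=frozenset({"y"}), uses=frozenset({"x"})),
    Stmt("RETURN", uses=frozenset({"y"}), observable=True),
]
obfuscated = [
    original[0],
    Stmt("ASSIGN", defines=frozenset({"tmp"})),                            # inserted
    original[1],
    Stmt("ASSIGN", defines=frozenset({"junk"}), uses=frozenset({"tmp"})),  # inserted
    original[2],
]
print(similarity(original, obfuscated))  # 0.75
print(similarity(remove_dead_definitions(original),
                 remove_dead_definitions(obfuscated)))  # 1.0
```

The inserted assignments lower the token similarity to 0.75; after normalization, both versions produce the same token sequence again, because removing `junk` makes `tmp` dead on the next pass. A practical transformation would need real control- and data-flow information rather than this order-insensitive liveness check.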
