CodeVision: Detecting LLM-Generated Code Using 2D Token Probability Maps and Vision Models

Zhenyu Xu, Victor S. Sheng

TL;DR

The paper tackles the challenge of distinguishing LLM-generated code from human-written code in educational and research settings. It introduces a structure-preserving representation by converting code into 2D log-probability maps and classifying them with Vision Transformers or a modified ResNet to leverage both content and code layout. The approach demonstrates strong cross-language performance, robustness to several attacks, and favorable efficiency, outperforming traditional detectors such as perturbation- and watermark-based methods. It offers a scalable solution suitable for real-time classroom deployment, highlighting practical implications for maintaining academic integrity in the era of advanced code-generation models.

Abstract

The rise of large language models (LLMs) like ChatGPT has significantly improved automated code generation, enhancing software development efficiency. However, this introduces challenges in academia, particularly in distinguishing between human-written and LLM-generated code, which complicates issues of academic integrity. Existing detection methods, such as pre-trained models and watermarking, face limitations in adaptability and computational efficiency. In this paper, we propose a novel detection method using 2D token probability maps combined with vision models, preserving spatial code structures such as indentation and brackets. By transforming code into log probability matrices and applying vision models like Vision Transformers (ViT) and ResNet, we capture both content and structure for more accurate detection. Our method shows robustness across multiple programming languages and improves upon traditional detectors, offering a scalable and computationally efficient solution for identifying LLM-generated code.
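
The abstract's central idea, laying per-token log probabilities out on a 2D grid that mirrors the code's line and column layout, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function name `build_logprob_map`, the fixed grid size, the padding value of 0.0, and the example tokens and log probabilities are all hypothetical. In practice the log probabilities would come from a language model scoring the code.

```python
from typing import List, Sequence


def build_logprob_map(
    tokens: Sequence[str],
    logprobs: Sequence[float],
    height: int,
    width: int,
    pad_value: float = 0.0,  # assumption: empty cells are padded with 0.0
) -> List[List[float]]:
    """Place each token's log probability at the grid cells its characters occupy.

    A newline moves the cursor to the start of the next row, so indentation and
    line breaks survive in the matrix. Characters that fall outside the
    height x width window are dropped (a simplification for this sketch).
    """
    if len(tokens) != len(logprobs):
        raise ValueError("tokens and logprobs must have the same length")

    grid = [[pad_value] * width for _ in range(height)]
    row, col = 0, 0
    for token, lp in zip(tokens, logprobs):
        for ch in token:
            if ch == "\n":
                # A line break occupies no cell; it only moves the cursor.
                row, col = row + 1, 0
                continue
            if row < height and col < width:
                grid[row][col] = lp
            col += 1
    return grid


# Invented example: a two-line function with made-up log probabilities.
example_tokens = ["def", " f", "(", ")", ":", "\n", "    ", "return", " 1"]
example_logprobs = [-0.5, -1.2, -0.1, -0.2, -0.05, -0.3, -0.4, -0.8, -1.5]
example_map = build_logprob_map(example_tokens, example_logprobs, height=2, width=12)

for grid_row in example_map:
    print(" ".join(f"{value:6.2f}" for value in grid_row))
```

The resulting matrix can be treated like a single-channel image and passed to a vision model. How the paper handles tabs, long lines, and resizing is not stated in this section, so those choices are left open here.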

Paper Structure

This paper contains 31 sections, 9 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Model Architectures.
  • Figure 2: Performance and FLOPs of ViT and ResNet Models of Scaling Sizes.
  • Figure 3: Detection Performance Across Code Lengths.