Combining GPT and Code-Based Similarity Checking for Effective Smart Contract Vulnerability Detection

Jango Zhang

TL;DR

This paper presents SimilarGPT, a vulnerability-detection framework for smart contracts that combines GPT-based semantic analysis with Code-based Similarity Checking (CBSC) to exploit code reuse in the Ethereum ecosystem. It introduces a topology-aware detection sequence and a Socratic method to mitigate LLM hallucinations, achieving higher recall and fewer false positives than baselines. Empirical results on real-world vulnerabilities and Solodit data show that SimilarGPT detects more vulnerabilities (8 of 13) and reduces false positives (to $12\%$), while CBSC improves recall (38 TP vs. 20 without CBSC) and precision remains practical. The work demonstrates a scalable approach to securing DeFi code by leveraging a comprehensive reference codebase and multi-agent reasoning, with potential for real-time updates from third-party packages.
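To make the similarity-checking idea concrete, here is a minimal sketch of matching a function under inspection against a small reference codebase. It is an illustration under stated assumptions, not the paper's implementation: the token-level `difflib` ratio stands in for whatever similarity measure SimilarGPT actually uses, and the reference snippets are simplified, hypothetical versions of ERC20 library functions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, simplified reference snippets standing in for trusted library code.
REFERENCES = {
    "transferFrom": (
        "function transferFrom(address from, address to, uint256 amount) "
        "public returns (bool) { _spendAllowance(from, msg.sender, amount); "
        "_transfer(from, to, amount); return true; }"
    ),
    "approve": (
        "function approve(address spender, uint256 amount) public returns (bool) "
        "{ _approve(msg.sender, spender, amount); return true; }"
    ),
}

def tokenize(code: str) -> list[str]:
    # Split into identifiers, numbers, and single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def similarity(a: str, b: str) -> float:
    # Token-level similarity in [0, 1]; an assumed stand-in for the paper's measure.
    return SequenceMatcher(None, tokenize(a), tokenize(b), autojunk=False).ratio()

def closest_reference(candidate: str, references: dict[str, str]) -> tuple[str, float]:
    # Return the name and score of the most similar reference snippet.
    name = max(references, key=lambda n: similarity(candidate, references[n]))
    return name, similarity(candidate, references[name])

# Code under inspection: resembles transferFrom but omits the allowance check.
candidate = (
    "function transferFrom(address from, address to, uint256 amount) "
    "public returns (bool) { _transfer(from, to, amount); return true; }"
)

name, score = closest_reference(candidate, REFERENCES)
print(name, round(score, 3))
```

A high but imperfect score against a trusted function marks a near-copy whose differences deserve a closer look; in the framework described here, that closer look is the LLM's job.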

Abstract

With the rapid growth of blockchain technology, smart contracts are now crucial to Decentralized Finance (DeFi) applications. Effective vulnerability detection is vital for securing these contracts against hackers and for improving the accuracy and efficiency of security audits. In this paper, we present SimilarGPT, a vulnerability identification tool for smart contracts that combines Generative Pretrained Transformer (GPT) models with code-based similarity checking methods. The main idea of SimilarGPT is to measure the similarity between the code under inspection and secure code from third-party libraries. To identify potential vulnerabilities, we combine the semantic understanding capability of large language models (LLMs) with code-based similarity checking techniques. We propose optimizing the detection sequence using topological ordering to enhance logical coherence and reduce false positives during detection. Through analysis of code reuse patterns in smart contracts, we compile and process extensive third-party library code to establish a comprehensive reference codebase. We then use an LLM to conduct an in-depth analysis of similar code to identify and explain potential vulnerabilities. The experimental findings indicate that SimilarGPT excels at detecting vulnerabilities in smart contracts, particularly in reducing missed detections and minimizing false positives.
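The topological ordering of the detection sequence can be sketched as follows. This is a minimal illustration, assuming the ordering is taken over a function call graph and that callees are analyzed before their callers; the abstract does not specify either detail, and the call graph below is hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical call graph: each function maps to the functions it calls.
CALLS = {
    "transferFrom": {"_spendAllowance", "_transfer"},
    "_spendAllowance": {"_approve"},
    "_transfer": set(),
    "_approve": set(),
}

def detection_order(calls: dict[str, set[str]]) -> list[str]:
    # TopologicalSorter treats the mapped values as predecessors, so every
    # callee is emitted before any function that calls it (assumed direction).
    return list(TopologicalSorter(calls).static_order())

order = detection_order(CALLS)
print(order)
```

Under this assumption, each function is analyzed only after the functions it depends on, so earlier findings are available as context when a caller is examined.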

Paper Structure

This paper contains 19 sections, 2 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: An overview of SimilarGPT, with green blocks indicating GPT work and green blocks indicating code similarity analysis.
  • Figure 2: The Redacted Cartel exploit
  • Figure 3: transferFrom function in OpenZeppelin's ERC20 contract
  • Figure 4: Detection results of the transferFrom function in Fig. 3
  • Figure 5: Comparing SimilarGPT with the traditional integration models.