Table of Contents
Fetching ...

Soley: Identification and Automated Detection of Logic Vulnerabilities in Ethereum Smart Contracts Using Large Language Models

Majd Soud, Waltteri Nuutinen, Grischa Liebel

TL;DR

The Soley method, an automated method for detecting logic vulnerabilities in smart contracts, leveraging Large Language Models, outperforms existing methods in automatically identifying logic vulnerabilities and the efficacy of LLMs in this task was evident without requiring extensive feature engineering.

Abstract

Modern blockchain, such as Ethereum, supports the deployment and execution of so-called smart contracts, autonomous digital programs with significant value of cryptocurrency. Executing smart contracts requires gas costs paid by users, which define the limits of the contract's execution. Logic vulnerabilities in smart contracts can lead to financial losses, and are often the root cause of high-impact cyberattacks. Our objective is threefold: (i) empirically investigate logic vulnerabilities in real-world smart contracts extracted from code changes on GitHub, (ii) introduce Soley, an automated method for detecting logic vulnerabilities in smart contracts, leveraging Large Language Models (LLMs), and (iii) examine mitigation strategies employed by smart contract developers to address these vulnerabilities in real-world scenarios. We obtained smart contracts and related code changes from GitHub. To address the first and third objectives, we qualitatively investigated available logic vulnerabilities using an open coding method. We identified these vulnerabilities and their mitigation strategies. For the second objective, we extracted various logic vulnerabilities, applied preprocessing techniques, and implemented and trained the proposed Soley model. We evaluated Soley along with the performance of various LLMs and compared the results with the state-of-the-art baseline on the task of logic vulnerability detection. From our analysis, we identified nine novel logic vulnerabilities, extending existing taxonomies with these vulnerabilities. Furthermore, we introduced several mitigation strategies extracted from observed developer modifications in real-world scenarios. Our Soley method outperforms existing methods in automatically identifying logic vulnerabilities. Interestingly, the efficacy of LLMs in this task was evident without requiring extensive feature engineering.

Soley: Identification and Automated Detection of Logic Vulnerabilities in Ethereum Smart Contracts Using Large Language Models

TL;DR

The Soley method, an automated method for detecting logic vulnerabilities in smart contracts, leveraging Large Language Models, outperforms existing methods in automatically identifying logic vulnerabilities and the efficacy of LLMs in this task was evident without requiring extensive feature engineering.

Abstract

Modern blockchain, such as Ethereum, supports the deployment and execution of so-called smart contracts, autonomous digital programs with significant value of cryptocurrency. Executing smart contracts requires gas costs paid by users, which define the limits of the contract's execution. Logic vulnerabilities in smart contracts can lead to financial losses, and are often the root cause of high-impact cyberattacks. Our objective is threefold: (i) empirically investigate logic vulnerabilities in real-world smart contracts extracted from code changes on GitHub, (ii) introduce Soley, an automated method for detecting logic vulnerabilities in smart contracts, leveraging Large Language Models (LLMs), and (iii) examine mitigation strategies employed by smart contract developers to address these vulnerabilities in real-world scenarios. We obtained smart contracts and related code changes from GitHub. To address the first and third objectives, we qualitatively investigated available logic vulnerabilities using an open coding method. We identified these vulnerabilities and their mitigation strategies. For the second objective, we extracted various logic vulnerabilities, applied preprocessing techniques, and implemented and trained the proposed Soley model. We evaluated Soley along with the performance of various LLMs and compared the results with the state-of-the-art baseline on the task of logic vulnerability detection. From our analysis, we identified nine novel logic vulnerabilities, extending existing taxonomies with these vulnerabilities. Furthermore, we introduced several mitigation strategies extracted from observed developer modifications in real-world scenarios. Our Soley method outperforms existing methods in automatically identifying logic vulnerabilities. Interestingly, the efficacy of LLMs in this task was evident without requiring extensive feature engineering.

Paper Structure

This paper contains 42 sections, 4 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Overview of Our Data Collection and Qualitative Analysis
  • Figure 2: Overview of Our Code Preprocessing and Model Construction
  • Figure 3: Confusion matrix for the classification results of Sóley. Note: RE refers to Reentrancy, UL to Uninitialized Local Variables, CLP to Recursive Calls in Loops, LLC to Incorrect Low-Level Calls, LE to Locked Ether, and IE to Incorrect Equality Check.
  • Figure 4: Precision rates across epochs for selected vulnerabilities.
  • Figure 5: Recall rates across epochs for selected vulnerabilities.
  • ...and 1 more figures