Identifying Smart Contract Security Issues in Code Snippets from Stack Overflow

Jiachi Chen, Chong Chen, Jiang Hu, John Grundy, Yanlin Wang, Ting Chen, Zibin Zheng

TL;DR

The paper tackles the security risks of smart-contract code snippets shared on Stack Overflow and presents SOChecker, a two-stage system that couples a fine-tuned Llama2-based code completer with a symbolic-execution–driven vulnerability detector to analyze incomplete code. It builds a large dataset of Solidity snippets from SO and a broader corpus of Ethereum contracts to train and evaluate the system, demonstrating superior performance over GPT-3.5/4 and traditional vulnerability tools on fragmentary code. The approach includes CFG-based pruning to focus analysis on the original snippets, nine DASP10 vulnerability patterns, and LoRA-based fine-tuning, achieving a real-snippet F1 of 68.2% and a complete-code F1 of 83.4% in evaluations. The work highlights the practical impact of secure handling of community-sourced code and provides datasets and code to encourage further research in secure analysis of Q&A code contributions.

Abstract

Smart contract developers frequently seek solutions to development challenges on Q&A platforms such as Stack Overflow (SO). Although community responses often provide viable solutions, the embedded code snippets can also contain hidden vulnerabilities. Integrating such code directly into smart contracts may make them susceptible to malicious attacks. We conducted an online survey and received 74 responses from smart contract developers. The results of this survey indicate that the majority (86.4%) of participants do not sufficiently consider security when reusing SO code snippets. Although various tools exist to detect vulnerabilities in smart contracts, they are typically designed to analyze complete smart contracts and are thus ineffective on the incomplete code snippets typically found on SO. We introduce SOChecker, the first tool designed to identify potential vulnerabilities in incomplete SO smart contract code snippets. SOChecker first uses a fine-tuned Llama2 model for code completion, then applies symbolic execution methods for vulnerability detection. Our experimental results, derived from a dataset of 897 code snippets collected from smart contract-related SO posts, demonstrate that SOChecker achieves an F1 score of 68.2%, greatly surpassing GPT-3.5 and GPT-4 (F1 scores of 20.9% and 33.2%, respectively). Our findings underscore the need to improve the security of code snippets from Q&A websites.
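
The F1 scores quoted above are the harmonic mean of precision and recall. As a reminder of how such a figure is derived from raw detection counts, here is a minimal sketch. The function name `f1_score` and the counts in the example are hypothetical illustrations, not values taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score given true positives, false positives, and false negatives."""
    if tp == 0:
        # With no true positives, precision and recall are both zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)  # share of reported vulnerabilities that are real
    recall = tp / (tp + fn)     # share of real vulnerabilities that were reported
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (NOT from the paper):
# 60 correct detections, 20 false alarms, 40 missed vulnerabilities.
example = f1_score(tp=60, fp=20, fn=40)
print(f"{example:.3f}")  # 0.667
```

Because F1 penalizes both false alarms and missed vulnerabilities, a tool that flags almost nothing or almost everything on fragmentary code will score poorly, which is one reason it is a common yardstick for comparing detectors.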

Paper Structure

This paper contains 30 sections, 8 figures, 4 tables, and 1 algorithm.

Figures (8)

  • Figure 1: A post on Stack Overflow related to Solidity.
  • Figure 2: How participants conduct security analysis of code on Stack Overflow.
  • Figure 3: The level of understanding of smart contract vulnerabilities among participants (left) and their belief in the necessity of detecting these vulnerabilities on SO (right).
  • Figure 4: The overall workflow of SOChecker.
  • Figure 5: An example of constructing fine-tuning data.
  • ...and 3 more figures