SC-Bench: A Large-Scale Dataset for Smart Contract Auditing

Shihao Xia; Mengting He; Linhai Song; Yiying Zhang

TL;DR

SC-Bench introduces the first large-scale dataset for automated smart-contract auditing, combining 5,377 real Ethereum contracts with 15,975 ERC-rule violations (139 real, 15,836 injected) to benchmark ML methods. The study evaluates GPT-4 in two settings: prompted with the full ERC rules, and prompted with oracle information identifying each violated rule and its code site. It finds a very low baseline detection rate ($0.9\%$) but a notable gain with the oracle ($22.9\%$), indicating a substantial opportunity for improvement in ML-based auditing. The dataset combines real violations with systematically injected errors across ERC20, ERC721, and ERC1155 rules, and the accompanying code and injection scripts are released to foster research beyond smart contracts, including API usage rule checks. Overall, SC-Bench demonstrates both the potential of ML-augmented auditing and its current gaps, underscoring the need for broader ERC coverage and more sophisticated prompting or models to advance automated smart-contract safety and compliance.

Abstract

There is a huge demand to ensure that smart contracts listed on blockchain platforms comply with safety and economic standards. Today, this goal is commonly achieved through manual auditing. ML-based automated techniques promise to reduce this human effort and the resulting monetary costs. However, unlike other domains where ML techniques have had huge successes, no systematic ML techniques have been proposed or applied to smart contract auditing. We present SC-Bench, the first dataset for automated smart-contract auditing research. SC-Bench consists of 5,377 real-world smart contracts running on Ethereum, a widely used blockchain platform, and 15,975 violations of Ethereum standards called ERCs. Of these violations, 139 are real violations made by programmers. The remaining 15,836 are errors we systematically injected to reflect violations of different ERC rules. We evaluate SC-Bench using GPT-4 by prompting it with both the contracts and the ERC rules. In addition, we manually identify each violated rule and the corresponding code site (i.e., the oracle) and prompt GPT-4 with this information as a true-or-false question. Our results show that without the oracle, GPT-4 detects only 0.9% of the violations, and with the oracle, it detects 22.9%. These results show the potential for improvement in ML-based techniques for smart-contract auditing.
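The two evaluation settings described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' harness: the `Violation` record, the prompt wording, and the helper names are hypothetical, no model is actually called, and the detection rate is assumed to be the fraction of labeled violations the model reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """Hypothetical record for one labeled violation; rule + code_site form the oracle."""
    contract_source: str
    rule: str       # text of the violated ERC rule (illustrative)
    code_site: str  # e.g. a function name or line range (illustrative format)


def full_rule_prompt(contract_source: str, rules: list[str]) -> str:
    """Setting 1 (no oracle): give the contract and all ERC rules, ask for violations."""
    rule_text = "\n".join(f"- {r}" for r in rules)
    return (
        "Audit the following smart contract against these ERC rules.\n"
        f"Rules:\n{rule_text}\n\nContract:\n{contract_source}\n\n"
        "List every rule the contract violates."
    )


def oracle_prompt(v: Violation) -> str:
    """Setting 2 (oracle): give the specific rule and code site, ask true-or-false."""
    return (
        f"Contract:\n{v.contract_source}\n\n"
        f"Rule: {v.rule}\nCode site: {v.code_site}\n\n"
        "Does the code at this site violate the rule? Answer True or False."
    )


def detection_rate(detected: list[bool]) -> float:
    """Fraction of labeled violations reported; detected[i] is True if violation i was found."""
    if not detected:
        return 0.0
    return sum(detected) / len(detected)
```

For example, `detection_rate([True, False, False, True])` returns `0.5`. Keeping prompt construction separate from scoring makes it easy to swap in a different model or prompt template without touching the metric.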

Paper Structure

This paper contains 13 sections, 9 figures, and 4 tables.

Introduction
Background
  Ethereum and Smart Contracts
  Ethereum Request for Comment (ERC)
  ERC Rule Violations
  Today's Auditing Practices
SC-Bench
  Construction
  Dataset Summary
Evaluation
  Methodology
  Experimental Results
Discussion and Conclusion

Figures (9)

Figure 1: An ERC20 rule violation that can be exploited to steal tokens. (The code is simplified for illustration purposes.)
Figure 2: Violation injection of a condition-check rule. (Line 9 is deleted to perform the violation injection.)
Figure 3: Violation injection of a return rule. (Line 5 is replaced with line 6 to perform the violation injection.)
Figure 4: Average contract size.
Figure 5: Average number of errors per contract.
...and 4 more figures
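Figures 2 and 3 describe violation injection as deleting a condition-check line or replacing a return line. The sketch below applies those two mutations to a hypothetical ERC20-style `transfer` function. `delete_line`, `replace_line`, and `TRANSFER` are illustrative names and are not the released injection scripts, and the target line numbers are picked by hand here rather than located automatically.

```python
def delete_line(source: str, lineno: int) -> str:
    """Condition-check injection: drop one (1-indexed) line, e.g. a require() guard."""
    lines = source.splitlines()
    if not 1 <= lineno <= len(lines):
        raise IndexError(f"line {lineno} out of range")
    del lines[lineno - 1]
    return "\n".join(lines)


def replace_line(source: str, lineno: int, new_line: str) -> str:
    """Return-rule injection: swap one (1-indexed) line for a mutated version."""
    lines = source.splitlines()
    if not 1 <= lineno <= len(lines):
        raise IndexError(f"line {lineno} out of range")
    lines[lineno - 1] = new_line
    return "\n".join(lines)


# Hypothetical ERC20-style transfer function, used only for illustration.
TRANSFER = """function transfer(address to, uint256 value) public returns (bool) {
    require(balanceOf[msg.sender] >= value);
    balanceOf[msg.sender] -= value;
    balanceOf[to] += value;
    emit Transfer(msg.sender, to, value);
    return true;
}"""

# ERC20 says transfer SHOULD throw on insufficient balance;
# deleting line 2 removes that explicit check.
no_check = delete_line(TRANSFER, 2)

# transfer returns a bool success flag; returning false after
# moving tokens (line 6) misreports the outcome.
bad_return = replace_line(TRANSFER, 6, "    return false;")
```

Operating on whole lines keeps each mutation small and easy to map back to a single violated rule, which mirrors how the two figure captions describe the injections.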
