
SmartBugBert: BERT-Enhanced Vulnerability Detection for Smart Contract Bytecode

Jiuyang Bu, Wenkai Li, Zongwei Li, Zeng Zhang, Xiaoqi Li

TL;DR

This work tackles smart contract vulnerability detection when source code is unavailable by analyzing bytecode directly. It introduces SmartBugBert, a pipeline that decompiles bytecode into optimized opcode sequences, reconstructs bytecode-level CFGs, and extracts vulnerability-relevant CFG fragments. These fragments are processed by a fine-tuned BERT model and fused with TF-IDF semantic features before classification with LightGBM. The approach reaches about 91% F1 on 6,157 Ethereum contracts across four vulnerability types (TOV, ACV, SDV, TDV) and runs orders of magnitude faster than symbolic analysis tools. Ablation studies show that combining CFG information with semantic features improves detection, supporting the multi-view design as a robust and scalable approach to bytecode-level vulnerability detection.
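The first pipeline stage, turning raw bytecode into an opcode sequence, can be illustrated with a minimal Python sketch. Several parts of it are assumptions. It uses a hand-picked subset of the EVM opcode table. The "optimization" step drops PUSH operands and collapses the PUSH1..PUSH32, DUP1..DUP16 and SWAP1..SWAP16 families into single tokens, which is a hypothetical stand-in for the paper's rules, since this summary does not describe them. The function names `disassemble` and `normalize` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Small subset of the EVM opcode table; bytes not listed here decode as "INVALID".
OPCODES: Dict[int, str] = {
    0x00: "STOP", 0x01: "ADD", 0x02: "MUL", 0x03: "SUB", 0x10: "LT", 0x11: "GT",
    0x14: "EQ", 0x15: "ISZERO", 0x16: "AND", 0x33: "CALLER", 0x34: "CALLVALUE",
    0x35: "CALLDATALOAD", 0x42: "TIMESTAMP", 0x50: "POP", 0x51: "MLOAD",
    0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0x5F: "PUSH0", 0xF1: "CALL", 0xF3: "RETURN",
    0xFD: "REVERT", 0xFF: "SELFDESTRUCT",
}

# (byte offset, mnemonic, PUSH immediate as hex or None)
Instr = Tuple[int, str, Optional[str]]


def disassemble(bytecode_hex: str) -> List[Instr]:
    """Linear-sweep disassembly of hex-encoded EVM bytecode."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    code = bytes.fromhex(bytecode_hex)
    out: List[Instr] = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            n = op - 0x5F
            out.append((pc, f"PUSH{n}", code[pc + 1 : pc + 1 + n].hex()))
            pc += 1 + n
            continue
        if 0x80 <= op <= 0x8F:
            name = f"DUP{op - 0x7F}"
        elif 0x90 <= op <= 0x9F:
            name = f"SWAP{op - 0x8F}"
        else:
            name = OPCODES.get(op, "INVALID")
        out.append((pc, name, None))
        pc += 1
    return out


def normalize(instrs: List[Instr]) -> List[str]:
    """Hypothetical normalization: drop operands, merge PUSHn/DUPn/SWAPn families."""
    seq: List[str] = []
    for _, name, _ in instrs:
        for family in ("PUSH", "DUP", "SWAP"):
            if name.startswith(family):
                name = family
                break
        seq.append(name)
    return seq


prologue = "6080604052348015600f57600080fd5b50"  # common Solidity prologue
print(normalize(disassemble(prologue)))
```

For this 17-byte prologue the sketch yields PUSH, PUSH, MSTORE, CALLVALUE, DUP, ISZERO, PUSH, JUMPI, PUSH, DUP, REVERT, JUMPDEST, POP. Dropping operands makes token sequences comparable across contracts. The trade-off is that constant values, such as jump targets, are lost from the sequence view.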

Abstract

Smart contracts deployed on blockchain platforms are exposed to a range of security vulnerabilities. However, only a small fraction of Ethereum contracts have publicly released their source code, which makes vulnerability detection at the bytecode level essential. This paper introduces SmartBugBert, a novel approach that combines BERT-based deep learning with control flow graph (CFG) analysis to detect vulnerabilities directly from bytecode. Our method first decompiles smart contract bytecode into optimized opcode sequences, extracts semantic features using TF-IDF, constructs control flow graphs to capture execution logic, and isolates vulnerable CFG fragments for targeted analysis. By integrating semantic and structural information through a fine-tuned BERT model and a LightGBM classifier, our approach identifies four critical vulnerability types: transaction-ordering (TOV), access control (ACV), self-destruct (SDV), and timestamp dependency (TDV) vulnerabilities. Experimental evaluation on 6,157 Ethereum smart contracts shows that SmartBugBert achieves 90.62% precision, 91.76% recall, and a 91.19% F1-score, significantly outperforming existing detection methods. Ablation studies confirm that combining semantic features with CFG information substantially improves detection performance. Furthermore, the approach detects vulnerabilities quickly (0.14 seconds per contract), making it practical for large-scale vulnerability assessment.
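The TF-IDF and fusion steps can also be sketched with the standard library. The code computes TF-IDF vectors over opcode tokens. It uses raw term counts, smoothed IDF, ln((1 + n) / (1 + df)) + 1, and L2 row normalization, matching the defaults of scikit-learn's `TfidfVectorizer`. Whether SmartBugBert uses this exact weighting is an assumption. The hypothetical `fuse` helper concatenates a TF-IDF row with a stand-in for a BERT-derived embedding of the CFG fragments. In the described pipeline, the fused vector is then classified by LightGBM, which is not shown here. The toy token lists are invented for illustration.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """TF-IDF over opcode-token documents (raw counts, smoothed IDF, L2 norm)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many contracts each opcode token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, as in scikit-learn's TfidfVectorizer default (smooth_idf=True).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows: List[List[float]] = []
    for doc in docs:
        counts = Counter(doc)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
        rows.append([v / norm for v in vec])
    return vocab, rows


def fuse(semantic: List[float], cfg_embedding: List[float]) -> List[float]:
    """Hypothetical early fusion: concatenate both views into one feature vector."""
    return list(semantic) + list(cfg_embedding)


docs = [
    ["PUSH", "PUSH", "CALLER", "SELFDESTRUCT"],  # toy self-destruct-like fragment
    ["PUSH", "TIMESTAMP", "JUMPI"],              # toy timestamp-dependency-like fragment
]
vocab, rows = tfidf(docs)
fake_bert = [0.12, -0.40, 0.33]  # stand-in for a fine-tuned BERT embedding
features = fuse(rows[0], fake_bert)
print(vocab)          # ['CALLER', 'JUMPI', 'PUSH', 'SELFDESTRUCT', 'TIMESTAMP']
print(len(features))  # 8
```

Because PUSH occurs in every document, its IDF is 1.0. Rarer, vulnerability-indicative opcodes such as TIMESTAMP or SELFDESTRUCT are weighted more heavily. This is the property that makes TF-IDF a cheap semantic signal to pair with the structural CFG view.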

Paper Structure

This paper contains 17 sections, 6 equations, 7 figures, 4 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Source code of Etherscan smart contract
  • Figure 2: SmartBugBert Framework
  • Figure 3: Optimized Opcode Sequence
  • Figure 4: Control Flow Graph at the Bytecode Level
  • Figure 5: Control Flow Graph
  • ...and 2 more figures
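Figures 4 and 5 depict control flow graphs recovered from bytecode. The sketch below illustrates one common way to build such a graph. It splits a disassembled instruction list into basic blocks: a new block starts at every JUMPDEST and after every JUMP, JUMPI, or halting instruction. It then adds fall-through edges plus jump edges whose target is pushed by the instruction immediately before the jump. This static-target heuristic is an assumption made for brevity. Jumps with run-time-computed targets are left unresolved, and the paper's own CFG reconstruction may differ. `build_cfg` is a hypothetical name.

```python
from typing import Dict, List, Optional, Set, Tuple

Instr = Tuple[int, str, Optional[str]]  # (offset, mnemonic, PUSH immediate hex or None)
HALTING = {"STOP", "RETURN", "REVERT", "SELFDESTRUCT", "INVALID"}


def build_cfg(instrs: List[Instr]) -> Tuple[Dict[int, List[Instr]], Set[Tuple[int, int]]]:
    # 1. Split into basic blocks keyed by their start offset.
    blocks: Dict[int, List[Instr]] = {}
    current: List[Instr] = []
    for ins in instrs:
        name = ins[1]
        if name == "JUMPDEST" and current:  # a jump target always opens a new block
            blocks[current[0][0]] = current
            current = []
        current.append(ins)
        if name in ("JUMP", "JUMPI") or name in HALTING:  # block ends after a jump or halt
            blocks[current[0][0]] = current
            current = []
    if current:
        blocks[current[0][0]] = current

    # 2. Add directed edges between block start offsets.
    starts = sorted(blocks)
    edges: Set[Tuple[int, int]] = set()
    for i, start in enumerate(starts):
        body = blocks[start]
        last = body[-1][1]
        nxt = starts[i + 1] if i + 1 < len(starts) else None
        if last in ("JUMP", "JUMPI"):
            # Static heuristic: resolve the target only if it was pushed right before the jump.
            if len(body) >= 2 and body[-2][1].startswith("PUSH") and body[-2][2]:
                target = int(body[-2][2], 16)
                if target in blocks and blocks[target][0][1] == "JUMPDEST":
                    edges.add((start, target))
            if last == "JUMPI" and nxt is not None:  # the false branch falls through
                edges.add((start, nxt))
        elif last not in HALTING and nxt is not None:
            edges.add((start, nxt))
    return blocks, edges


# Disassembly of the prologue 6080604052348015600f57600080fd5b50
instrs: List[Instr] = [
    (0, "PUSH1", "80"), (2, "PUSH1", "40"), (4, "MSTORE", None), (5, "CALLVALUE", None),
    (6, "DUP1", None), (7, "ISZERO", None), (8, "PUSH1", "0f"), (10, "JUMPI", None),
    (11, "PUSH1", "00"), (13, "DUP1", None), (14, "REVERT", None),
    (15, "JUMPDEST", None), (16, "POP", None),
]
blocks, edges = build_cfg(instrs)
print(sorted(blocks))  # [0, 11, 15]
print(sorted(edges))   # [(0, 11), (0, 15)]
```

Here the JUMPI at offset 10 produces two successors. Control falls through to the reverting block at offset 11 when ETH is sent to a non-payable function. It jumps to the JUMPDEST at offset 15 when no value is attached. Vulnerability-relevant fragments, for example paths that reach SELFDESTRUCT or read TIMESTAMP before a branch, would then be sliced from graphs of this kind.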