MultiCFV: Detecting Control Flow Vulnerabilities in Smart Contracts Leveraging Multimodal Deep Learning
Hongli Peng, Xiaoqi Li, Wenkai Li
TL;DR
MultiCFV tackles smart contract vulnerability and clone detection by fusing control-flow graphs from bytecode, abstract syntax trees from source code, and code comments through a GRU-GCN and BERT/com-extractor pipeline. The approach yields a fixed-size, multimodal feature representation for each contract, enabling accurate detection of erroneous control flow vulnerabilities and efficient clone detection, outperforming existing tools like Slither and Mythril. Experiments across four datasets and ablation studies demonstrate the value of multimodal integration, with strong generalization on a separate vulnerability dataset. Limitations include coarse-grained clone detection at the contract level, with future work aiming for finer-grained, function-level analysis.
Abstract
The introduction of smart contract functionality marks the advent of the blockchain 2.0 era, enabling blockchain technology to support digital currency transactions and complex distributed applications. However, many smart contracts have been found to contain vulnerabilities and errors, leading to the loss of assets within the blockchain. Although a range of tools has been developed to identify vulnerabilities in smart contracts at the source code or bytecode level, most rely on a single modality, which reduces performance and accuracy and limits generalization. This paper proposes a multimodal deep learning approach, MultiCFV, which is designed specifically to analyze and detect erroneous control flow vulnerabilities, as well as to identify code clones in smart contracts. Bytecode is generated from source code to construct control flow graphs, and graph embedding techniques extract graph features from them. Abstract syntax trees are used to obtain syntax features, while code comments are mined for key commentary words and comment features. These three feature vectors are fused to create a database for code inspection, which is used to detect similar code and identify contract vulnerabilities. Experimental results demonstrate that our method effectively combines structural, syntactic, and semantic information, improving the accuracy of smart contract vulnerability detection and clone detection.
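The fusion-and-lookup step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the three vectors are fused or compared, so the sketch assumes fusion by concatenation of L2-normalized per-modality vectors and lookup by cosine similarity. The function names (`fuse`, `most_similar`), the toy feature values, and the contract names are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(graph_feat, syntax_feat, comment_feat):
    """Assumed fusion: normalize each modality, concatenate, normalize again."""
    fused = []
    for part in (graph_feat, syntax_feat, comment_feat):
        fused.extend(l2_normalize(part))
    return l2_normalize(fused)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def most_similar(query, database):
    """Return (name, score) of the stored contract closest to the query."""
    scored = ((name, cosine(query, vec)) for name, vec in database.items())
    return max(scored, key=lambda pair: pair[1])


# Toy database of fused vectors; the values are made up for illustration.
database = {
    "contract_a": fuse([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]),
    "contract_b": fuse([0.0, 1.0], [1.0, 0.0], [1.0, 0.0]),
}

# A query contract whose three feature vectors resemble those of contract_a.
query = fuse([0.9, 0.1], [0.5, 0.4], [0.1, 0.9])
best_name, best_score = most_similar(query, database)
print(best_name, round(best_score, 3))
```

Normalizing each modality before concatenation keeps any one feature vector from dominating the similarity score merely because of its scale; a learned fusion layer would be a natural alternative.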
