MTVHunter: Smart Contracts Vulnerability Detection Based on Multi-Teacher Knowledge Translation

Guokai Sun, Yuan Zhuang, Shuo Zhang, Xiaoyu Feng, Zhenguang Liu, Liguo Zhang

TL;DR

MTVHunter tackles vulnerability detection in smart contract bytecode by introducing a multi-teacher framework that denoises instructions and restores missing semantics before detection. The Instruction Denoising Teacher leverages Abstract Vulnerability Patterns to suppress noise, while the Semantic Complementary Teacher uses neuron distillation to map source-code semantics into bytecode representations. A Two-Stage Student integrates sequence and graph features to predict vulnerabilities across four common types using a large real-world dataset. Results show state-of-the-art accuracy against traditional tools and neural baselines, validating the efficacy of targeted denoising and cross-modality semantic transfer for practical smart-contract security auditing.
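Here is a rough illustration of pattern-guided denoising. The sketch scores each basic block of a CFG by how far it matches an ordered opcode pattern, then discards low-scoring blocks as noise. Several parts are hypothetical assumptions made only for illustration: the pattern contents in `AVPS`, the in-order partial-match score, the `threshold` value, and all function names. The sketch does not reproduce MTVHunter's actual Abstract Vulnerability Patterns or its node scoring mechanism.

```python
from typing import Dict, List

# Hypothetical AVPs (assumption, not the paper's patterns): each maps a
# vulnerability type to an ordered list of EVM opcodes expected to co-occur.
AVPS: Dict[str, List[str]] = {
    "reentrancy": ["CALL", "SSTORE"],                # external call before a state write
    "timestamp_dependency": ["TIMESTAMP", "JUMPI"],  # timestamp feeding a conditional jump
}


def ordered_match(block: List[str], pattern: List[str]) -> int:
    """Count how many pattern opcodes appear, in order, inside one basic block."""
    matched = 0
    for op in block:
        if matched < len(pattern) and op == pattern[matched]:
            matched += 1
    return matched


def score_nodes(cfg_nodes: Dict[int, List[str]]) -> Dict[int, float]:
    """Score each CFG node by its best partial match against any AVP, in [0, 1]."""
    return {
        node_id: max(ordered_match(block, p) / len(p) for p in AVPS.values())
        for node_id, block in cfg_nodes.items()
    }


def denoise(cfg_nodes: Dict[int, List[str]], threshold: float = 0.5) -> Dict[int, List[str]]:
    """Keep only nodes whose AVP score reaches the (hypothetical) threshold."""
    scores = score_nodes(cfg_nodes)
    return {n: block for n, block in cfg_nodes.items() if scores[n] >= threshold}


if __name__ == "__main__":
    cfg = {
        0: ["PUSH1", "PUSH1", "MSTORE"],
        1: ["CALLER", "CALL", "SSTORE"],
        2: ["TIMESTAMP", "PUSH1", "GT"],
        3: ["POP", "JUMP"],
    }
    print(score_nodes(cfg))      # {0: 0.0, 1: 1.0, 2: 0.5, 3: 0.0}
    print(sorted(denoise(cfg)))  # [1, 2]
```

Scoring within single basic blocks keeps the sketch short. Patterns that span several blocks would need a path-level match over the CFG edges.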

Abstract

Smart contracts, closely intertwined with cryptocurrency transactions, have raised widespread concern about the considerable financial losses caused by security issues. To counteract this, a variety of tools have been developed to identify vulnerabilities in smart contracts. However, when faced with smart contract bytecode, they fail to overcome two challenges at the same time: (i) strong interference caused by an enormous number of irrelevant instructions; (ii) missing bytecode semantics due to incomplete data and control flow dependencies. In this paper, we propose a multi-teacher based bytecode vulnerability detection method, namely Multi-Teacher Vulnerability Hunter (MTVHunter), which delivers effective denoising and supplies missing semantics to bytecode under multi-teacher guidance. Specifically, we first propose an instruction denoising teacher that eliminates noise interference using abstract vulnerability patterns and reflects the result in contract embeddings. Secondly, we design a novel semantic complementary teacher with neuron distillation, which effectively extracts the necessary semantics from source code to replenish the bytecode. In particular, the proposed neuron distillation accelerates this semantic filling by turning the knowledge transition into a regression task. We conduct experiments on 229,178 real-world smart contracts covering four common types of vulnerabilities. Extensive experiments show that MTVHunter achieves significant performance gains over state-of-the-art approaches.
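The abstract frames neuron distillation as a regression task. The sketch below illustrates that framing under several assumptions. A frozen source-code teacher is assumed to expose a vector of neuron activations for each contract. A linear bytecode-side student learns to predict those activations from bytecode features, trained with mean squared error by plain full-batch gradient descent. The feature vectors, the linear form of the student, and the names `distill`, `linear` and `mse` are all hypothetical. They show only the regression framing, not MTVHunter's actual architecture.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(W: Matrix, x: Vector) -> Vector:
    """Student mapping: predict teacher neuron activations from bytecode features."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between predicted and teacher activations."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def distill(
    bytecode_feats: List[Vector],
    teacher_neurons: List[Vector],
    lr: float = 0.1,
    epochs: int = 500,
    seed: int = 0,
) -> Matrix:
    """Fit W by full-batch gradient descent on the mean MSE regression loss."""
    rng = random.Random(seed)
    n, d_in, d_out = len(bytecode_feats), len(bytecode_feats[0]), len(teacher_neurons[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    for _ in range(epochs):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for x, t in zip(bytecode_feats, teacher_neurons):
            pred = linear(W, x)
            for i in range(d_out):
                err = pred[i] - t[i]
                for j in range(d_in):
                    # This sample's share of d(mean MSE)/dW_ij.
                    grad[i][j] += 2.0 * err * x[j] / (n * d_out)
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] -= lr * grad[i][j]
    return W


if __name__ == "__main__":
    # Toy data: teacher activations that are an exact linear function of the features.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
    W_true = [[1.0, -2.0], [0.5, 0.0]]
    T = [linear(W_true, x) for x in X]
    W = distill(X, T)
    print(sum(mse(linear(W, x), t) for x, t in zip(X, T)) / len(X))
```

Casting the transfer as regression gives the student a fixed target vector to match for each contract, so no adversarial or contrastive setup is needed. A practical student would likely be a neural network rather than a single linear map.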

Paper Structure

This paper contains 19 sections, 8 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: A high-level overview of MTVHunter. (1) CFG generator, which implements a transition between the source code and the Control Flow Graph. (2) Instruction denoising teacher, which eliminates the interference of noise in the CFG. (3) Semantic complementary teacher, which provides the missing semantics to bytecode by neuron distillation. (4) Two-stage student, which consists of an instruction sequence extractor and a graph feature extractor.
  • Figure 2: The overall process of IDT. (a) The snippet of CFG for the vulnerable function Withdraw. (b) The matching of AVP. (c) Node scoring mechanism.
  • Figure 3: Illustration of neuron distillation mapping source code to missing semantic features.
  • Figure 4: Performance comparison (%) for different combinations of $\alpha_{sim}$ and $\beta_{pre}$ under the four vulnerabilities.
  • Figure 5: Case study on the interpretability of AVP.
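Figure 4 reports performance for different combinations of $\alpha_{sim}$ and $\beta_{pre}$, but this summary does not define the two weights. Assume, purely for illustration, that they scale a similarity term and a prediction term in the training objective. Under that assumption, a sweep like the one in Figure 4 could be organized as shown below. The loss form, the grid values, and the `evaluate` callback are hypothetical. In practice, `evaluate` would train and validate the model for one weight pair.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple


def combined_loss(l_sim: float, l_pre: float, alpha_sim: float, beta_pre: float) -> float:
    """Assumed objective: weighted sum of a similarity loss and a prediction loss."""
    return alpha_sim * l_sim + beta_pre * l_pre


def grid_search(
    evaluate: Callable[[float, float], float],
    alphas: Sequence[float],
    betas: Sequence[float],
) -> Optional[Tuple[float, float, float]]:
    """Try every (alpha_sim, beta_pre) pair and return the best-scoring one, or None."""
    best: Optional[Tuple[float, float, float]] = None
    for alpha, beta in product(alphas, betas):
        score = evaluate(alpha, beta)  # e.g. validation F1 for this weight pair
        if best is None or score > best[2]:
            best = (alpha, beta, score)
    return best


def toy_eval(alpha: float, beta: float) -> float:
    """Toy stand-in for 'train and validate', peaking at alpha=0.3, beta=0.7."""
    return -((alpha - 0.3) ** 2) - (beta - 0.7) ** 2


if __name__ == "__main__":
    grid = [0.1, 0.3, 0.5, 0.7, 0.9]
    print(grid_search(toy_eval, grid, grid)[:2])  # (0.3, 0.7)
```

An exhaustive grid matches the pairwise comparison shown in Figure 4. With more than two weights, random or Bayesian search would usually scale better.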