Enhanced Smart Contract Reputability Analysis using Multimodal Data Fusion on Ethereum

Cyrus Malik, Josef Bajada, Joshua Ellul

TL;DR

The paper addresses the limits of assessing smart contract reputability from a single data source by introducing a multimodal framework that fuses AI-based code analysis (via GAN-augmented opcode embeddings) with dynamic transactional data. It applies boosting ensemble methods to code features and uses a convolutional autoencoder (CAE) to fuse the two modalities for anomaly detection. The code-analysis stage reaches 97.67% accuracy and a recall of 0.942 on illicit contracts, and fusing code with transactional data improves recall by 7.25% over single-source models. Key contributions include GAN-based opcode augmentation, a CAE-based multimodal fusion approach, and publicly available datasets to support reproducibility, with results showing improved detection of evolving reputability shifts. The work supports proactive risk mitigation and blockchain security by enabling near-real-time, holistic evaluation of contract behaviour on Ethereum.

Abstract

The evaluation of smart contract reputability is essential to foster trust in decentralized ecosystems. However, existing methods that rely solely on code analysis or transactional data offer limited insight into evolving trustworthiness. We propose a multimodal data fusion framework that integrates code features with transactional data to enhance reputability prediction. Our framework initially focuses on AI-based code analysis, using GAN-augmented opcode embeddings to address class imbalance. This achieves 97.67% accuracy and a recall of 0.942 in detecting illicit contracts, surpassing traditional oversampling methods. It forms the crux of a reputability-centric fusion strategy, in which combining code and transactional data improves recall by 7.25% over single-source models and demonstrates robust performance across validation sets. By providing a holistic view of smart contract behaviour, our approach enhances the model's ability to assess reputability, identify fraudulent activities, and predict anomalous patterns. These capabilities contribute to more accurate reputability assessments, proactive risk mitigation, and enhanced blockchain security.
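
The abstract reports accuracy and illicit-contract recall side by side. Under class imbalance the two can diverge sharply, which is one reason recall on the minority (illicit) class is worth tracking separately. The following is a minimal sketch of that effect only: the labels are hypothetical toy data, not the paper's dataset, and `accuracy_and_recall` is an illustrative helper rather than anything taken from the authors' code.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) where recall is computed for the `positive` class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    # True positives: illicit contracts correctly flagged as illicit.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    # False negatives: illicit contracts that the model missed.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical imbalanced toy set: 1 = illicit, 0 = reputable.
y_true = [0] * 95 + [1] * 5

# A degenerate model that labels every contract as reputable.
always_reputable = [0] * 100
acc_a, rec_a = accuracy_and_recall(y_true, always_reputable)

# A model that catches 4 of the 5 illicit contracts, with 3 false alarms.
better = [1] * 3 + [0] * 92 + [1] * 4 + [0] * 1
acc_b, rec_b = accuracy_and_recall(y_true, better)

print(f"always-reputable: accuracy={acc_a:.2f}, recall={rec_a:.2f}")
print(f"better model:     accuracy={acc_b:.2f}, recall={rec_b:.2f}")
```

In this toy case the degenerate model scores 95% accuracy while detecting no illicit contracts at all, so accuracy alone says little about how well the minority class is handled.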

Paper Structure

This paper contains 16 sections, 1 equation, 3 figures, 5 tables.

Figures (3)

  • Figure 1: KDE plot comparing real and GAN-generated opcode embeddings.
  • Figure 2: Reconstruction Error Distribution for the Multimodal CAE.
  • Figure 3: t-SNE visualizations of contract-level latent representations for the Transaction-Only CAE (top) and Multimodal CAE (bottom). The Transaction-Only CAE shows some overlap between reputable (blue) and illicit (red) contracts, while the Multimodal CAE forms more distinct clusters, indicating improved separation with multimodal data fusion.
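
Figure 2 refers to the distribution of reconstruction errors from the multimodal CAE. A common way to turn such errors into anomaly flags is to set a threshold from the errors observed on known-reputable contracts and flag anything above it. The sketch below illustrates that general idea only. The 90th-percentile rule, the helper names, and every error value are assumptions made for illustration; this summary does not state how the authors choose their threshold.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of `values` for 0 < pct <= 100."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    # Nearest-rank method: smallest rank covering pct percent of the data.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def flag_anomalies(errors, threshold):
    """Flag each contract whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


# Hypothetical reconstruction errors on contracts known to be reputable.
reputable_errors = [0.010, 0.012, 0.015, 0.011, 0.020,
                    0.013, 0.014, 0.016, 0.018, 0.019]

# Assumed rule: threshold at the 90th percentile of reputable errors.
threshold = nearest_rank_percentile(reputable_errors, 90)

# Hypothetical errors for unseen contracts.
new_errors = [0.012, 0.050, 0.019, 0.031]
flags = flag_anomalies(new_errors, threshold)

print(f"threshold={threshold}")
print(flags)
```

The percentile controls the trade-off: a lower value flags more contracts (higher recall, more false alarms), while a higher value does the opposite.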