Fishing for Phishers: Learning-Based Phishing Detection in Ethereum Transactions

Ahod Alghuried, Abdulaziz Alghamdi, Ali Alkinoon, Soohyeon Choi, Manar Mohaisen, David Mohaisen

TL;DR

This work investigates phishing detection in Ethereum transactions by comparing explicit transactional features with implicit graph-derived features. It designs a two-stage evaluation in which a Graph Convolutional Network operates on either feature type, and it develops a rich implicit feature set capturing temporal and relational patterns. Results show that implicit, graph-based features substantially improve detection accuracy and phishing recall on a large, imbalanced dataset, whereas explicit features alone underperform. The study highlights the critical role of relational context and feature engineering in robust, adversarially resilient detection, and it points to future directions such as temporal graphs and attention mechanisms to further improve performance.
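As a rough illustration of what "implicit" features derived from the transaction graph can look like, the sketch below aggregates raw transfers into per-address relational and temporal statistics. The `Tx` record and the specific feature names (`in_degree`, `unique_counterparties`, `active_span`, and so on) are hypothetical choices made for this example; they are not the paper's exact feature set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Tx:
    """Hypothetical minimal transaction record."""
    sender: str
    receiver: str
    value: float    # transferred amount, e.g. in ETH
    timestamp: int  # block timestamp (Unix seconds)


def implicit_features(txs):
    """Aggregate per-address relational and temporal features from a list of transactions."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    in_val, out_val = defaultdict(float), defaultdict(float)
    partners = defaultdict(set)  # distinct counterparties, in either direction
    times = defaultdict(list)    # timestamps of every transaction the address touches
    for tx in txs:
        out_deg[tx.sender] += 1
        in_deg[tx.receiver] += 1
        out_val[tx.sender] += tx.value
        in_val[tx.receiver] += tx.value
        partners[tx.sender].add(tx.receiver)
        partners[tx.receiver].add(tx.sender)
        times[tx.sender].append(tx.timestamp)
        times[tx.receiver].append(tx.timestamp)

    features = {}
    for addr, ts in times.items():
        features[addr] = {
            "in_degree": in_deg[addr],
            "out_degree": out_deg[addr],
            "total_in_value": in_val[addr],
            "total_out_value": out_val[addr],
            "unique_counterparties": len(partners[addr]),
            "active_span": max(ts) - min(ts),  # seconds between first and last activity
        }
    return features


if __name__ == "__main__":
    txs = [
        Tx("0xA", "0xB", 1.0, 100),
        Tx("0xC", "0xB", 2.5, 160),
        Tx("0xB", "0xD", 3.0, 220),
    ]
    print(implicit_features(txs)["0xB"])
```

For `0xB` in this toy example, the sketch yields an in-degree of 2, an out-degree of 1, three distinct counterparties, and an active span of 120 seconds. None of these values is visible from any single transaction; they only emerge from the address's position in the graph.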

Abstract

Phishing detection on Ethereum has increasingly leveraged advanced machine learning techniques to identify fraudulent transactions. However, limited attention has been given to understanding the effectiveness of feature selection strategies and the role of graph-based models in enhancing detection accuracy. In this paper, we systematically examine these issues by analyzing and contrasting explicit transactional features and implicit graph-based features, both experimentally and analytically. We explore how different feature sets affect the performance of phishing detection models, particularly in the context of Ethereum's transactional network. Additionally, we address key challenges such as class imbalance and dataset composition, and their influence on the robustness and precision of detection methods. Our findings demonstrate the advantages and limitations of each feature type and clarify how features affect model resilience and generalization in adversarial environments.
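Because phishing addresses are a small minority, overall accuracy can look high even when phishing recall is poor. The following is a minimal sketch, assuming binary labels with 1 = phishing, of two common countermeasures: inverse-frequency class weights fed into a weighted cross-entropy loss, and precision and recall reported for the phishing class. The function names are hypothetical, and the weighting scheme is a standard choice rather than necessarily the one used in the paper.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood; probs[i][c] = P(class c | sample i).

    Normalizes by the summed weights of the targets, the same convention as
    PyTorch's CrossEntropyLoss(weight=..., reduction="mean").
    """
    num = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (phishing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    labels = [0] * 8 + [1] * 2        # 80% benign, 20% phishing
    print(balanced_class_weights(labels))
    all_benign = [0] * len(labels)    # trivial classifier that never flags phishing
    print(precision_recall(labels, all_benign))
```

With 8 benign and 2 phishing labels, the weights come out to 0.625 and 2.5, so each phishing example contributes four times as much to the loss. A predictor that labels everything benign reaches 80% accuracy on the same labels but scores 0.0 recall, which is why phishing recall is worth reporting separately from accuracy.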

Paper Structure

This paper contains 20 sections, 2 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Illustration of the Ethereum phishing scam network. Phishing addresses (red nodes) are interspersed among benign addresses (blue nodes), exhibiting similar transactional patterns, thereby complicating detection within the broader network structure.
  • Figure 2: An illustration of the proposed pipeline, integrating explicit and implicit features from the Ethereum network.
  • Figure 3: Illustration of the labeling process. G2 and G3 represent the sender and receiver addresses, and R1 and R2 are the labeling results. In R1, addresses are first matched against the list of known phishing addresses; if a match is found, R2 performs manual verification via Etherscan to ensure labeling accuracy.
  • Figure 4: Construction of a directed Ethereum transaction graph and its transformation into PyTorch Geometric inputs. The resulting data is processed by a GCN to classify addresses as phishing or benign.
  • Figure 5: Top 10 most important features according to a Random Forest model.
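The pipeline in Figure 4, together with the first labeling stage from Figure 3, can be sketched roughly as follows. This sketch assumes transactions arrive as a list of `(sender, receiver)` pairs and that labels come only from R1 (matching against a known phishing list); the manual Etherscan check in R2 is not modeled. The helper names are hypothetical. NumPy stands in for PyTorch Geometric so the example stays self-contained. In PyG, the `edge_index`, node features, and labels would typically be wrapped in a `torch_geometric.data.Data` object and passed through `GCNConv` layers. Ignoring edge direction inside the normalized adjacency is a simplifying assumption made here, not something this section states about the paper.

```python
import numpy as np


def build_graph(transactions, phishing_list):
    """Map addresses to node ids, build a 2 x E edge_index, and label nodes.

    transactions: list of (sender, receiver) pairs, one directed edge each.
    phishing_list: set of known phishing addresses (stage R1 matching only).
    """
    index = {}
    for sender, receiver in transactions:
        for addr in (sender, receiver):
            index.setdefault(addr, len(index))  # first appearance gets the next id
    edge_index = np.array(
        [[index[s] for s, _ in transactions], [index[r] for _, r in transactions]],
        dtype=np.int64,
    )
    y = np.array([int(a in phishing_list) for a in index], dtype=np.int64)
    return index, edge_index, y


def normalized_adjacency(edge_index, num_nodes):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, with edges symmetrized (an assumption)."""
    A = np.zeros((num_nodes, num_nodes))
    A[edge_index[0], edge_index[1]] = 1.0
    A = np.maximum(A, A.T)     # ignore edge direction
    np.fill_diagonal(A, 1.0)   # self-loops (weight 1 even if a self-transfer exists)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def gcn_forward(A_hat, X, W1, W2):
    """Two-layer GCN: row-wise softmax(A_hat . ReLU(A_hat X W1) . W2)."""
    H = np.maximum(A_hat @ X @ W1, 0.0)
    logits = A_hat @ H @ W2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    txs = [("0xA", "0xB"), ("0xC", "0xB"), ("0xB", "0xD")]
    index, edge_index, y = build_graph(txs, phishing_list={"0xC"})
    A_hat = normalized_adjacency(edge_index, len(index))
    rng = np.random.default_rng(0)
    X = rng.normal(size=(len(index), 5))  # placeholder node features
    W1, W2 = rng.normal(size=(5, 8)), rng.normal(size=(8, 2))  # untrained weights
    probs = gcn_forward(A_hat, X, W1, W2)  # per-node [P(benign), P(phishing)]
    print(edge_index, y, probs.shape)
```

Each layer mixes a node's features with those of its neighbors, so a benign-looking address that transacts heavily with flagged addresses can still receive a high phishing probability after training. The weights here are random, so the output probabilities are meaningless until the model is fit to labeled data.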