ScamSweeper: Detecting Illegal Accounts in Web3 Scams via Transactions Analysis

Xiaoqi Li, Wenkai Li, Zhijie Liu, Meikang Qiu, Zhiquan Liu, Sen Nie, Zongwei Li, Shi Wu, Yuqing Zhang

TL;DR

This paper introduces ScamSweeper, a temporal-subgraph learning framework that detects web3 scams on Ethereum by sampling transaction graphs together with their temporal structure and learning how the sampled subgraphs evolve. It proposes the Structure Temporal Random Walk (STRWalk) to efficiently extract temporally annotated subgraphs, and it uses a directed graph encoder plus a transposed Transformer to capture both structural patterns and temporal dynamics. On large-scale on-chain datasets, ScamSweeper outperforms state-of-the-art baselines in both web3 scam detection and phishing detection, with substantial gains in F1-score and recall. The approach offers scalable, temporally aware detection for on-chain accounts, supporting more effective protection against evolving web3 scams and associated phishing activities.

Abstract

Web3 applications have grown rapidly in recent years, especially on Ethereum, and have become a target for scammers. Web3 scams imitate the services of legitimate platforms and mimic regular activity to deceive users. Previous studies, however, have concentrated primarily on de-anonymization and phishing nodes, neglecting the distinctive features of web3 scams. Moreover, current phishing account detection tools rely on graph learning or sampling algorithms to obtain graph features, yet large-scale transaction networks with temporal attributes follow a power-law distribution, which makes web3 scams difficult to detect. To overcome these challenges, we present ScamSweeper, a novel framework that identifies web3 scams on Ethereum by emphasizing the dynamic evolution of transaction graphs. ScamSweeper samples the network with a structure temporal random walk, an optimized walk-based sampling method that considers both temporal attributes and structural information. A directed graph encoder then generates the features of each subgraph in different temporal intervals and arranges them as a sequence. A variational Transformer is used to extract the dynamic evolution from the subgraph sequence. Furthermore, we collect a large-scale transaction dataset of web3 scam, phishing, and normal accounts drawn from the first 18 million block heights on Ethereum, and we comprehensively analyze how these account types differ in various attributes, including nodes, edges, and degree distribution. Our experiments indicate that ScamSweeper outperforms SIEGE, Ethident, and PDTGA in detecting web3 scams, improving the weighted F1-score by at least 17.29% over a base value of 0.59. In phishing node detection, ScamSweeper improves the F1-score by at least 17.5% over DGTSG and BERT4ETH, from a base value of 0.80.
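
The sampling idea in the abstract can be illustrated with a small sketch. The abstract does not give STRWalk's actual transition rule, so the code below is not the authors' algorithm: it is a plain time-respecting random walk over directed transactions, followed by bucketing the sampled edges into temporal intervals so that they form an ordered sequence of subgraphs. The function names (`build_adjacency`, `temporal_walk`, `split_into_intervals`), the uniform choice among candidate edges, the equal-width intervals, and the toy transactions are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_adjacency(transactions):
    """Map each sender to its outgoing (receiver, timestamp) edges."""
    adjacency = defaultdict(list)
    for sender, receiver, timestamp in transactions:
        adjacency[sender].append((receiver, timestamp))
    return adjacency


def temporal_walk(adjacency, start, max_steps, rng):
    """Follow directed edges whose timestamps never decrease (assumed rule)."""
    walk_edges = []
    node, last_time = start, float("-inf")
    for _ in range(max_steps):
        # Only edges at or after the previous step's timestamp are eligible.
        candidates = [(nxt, t) for nxt, t in adjacency.get(node, []) if t >= last_time]
        if not candidates:
            break
        nxt, t = rng.choice(candidates)  # uniform choice; STRWalk may weight differently
        walk_edges.append((node, nxt, t))
        node, last_time = nxt, t
    return walk_edges


def split_into_intervals(edges, num_intervals):
    """Bucket sampled edges into equal-width time windows, in time order."""
    buckets = [[] for _ in range(num_intervals)]
    if not edges:
        return buckets
    times = [t for _, _, t in edges]
    start, end = min(times), max(times)
    # Fall back to a width of 1 when all timestamps coincide.
    width = (end - start) / num_intervals or 1
    for u, v, t in edges:
        index = min(int((t - start) / width), num_intervals - 1)
        buckets[index].append((u, v, t))
    return buckets


# Toy transactions as (sender, receiver, timestamp); purely illustrative data.
toy_transactions = [
    ("A", "B", 1), ("B", "C", 2), ("B", "D", 5), ("C", "A", 3), ("D", "E", 4),
]
toy_adjacency = build_adjacency(toy_transactions)
sampled = temporal_walk(toy_adjacency, "A", max_steps=10, rng=random.Random(7))
subgraph_sequence = split_into_intervals(sampled, num_intervals=3)
print(sampled)
print(subgraph_sequence)
```

In a pipeline like the one the abstract describes, each bucket would be encoded by a graph encoder and the resulting sequence passed to a sequence model; those stages are omitted here.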

Paper Structure

This paper contains 40 sections, 11 equations, 13 figures, 5 tables, 1 algorithm.

Figures (13)

  • Figure 1: A Motivating Example of Web3 Scams on Ethereum. The black line represents off-chain activities, and the blue line indicates on-chain behaviors.
  • Figure 2: An Overview of ScamSweeper. It comprises four components that perform the following steps: (a) transactions are gathered from various public sources to construct a graph; (b) the features of subgraphs in each temporal interval are collected from the multi-directed graph; (c) self-attention is used to enhance the feature correlation of subgraphs across different temporal intervals; (d) a deep neural network performs classification with embeddings in local receptive fields.
  • Figure 3: An Example of Structure Temporal Random Walk. The green circle represents the first-step sampled node, and the blue circle indicates the second-step sampled nodes.
  • Figure 4: An Example of Directed Graph Encoder in a Single Temporal Interval.
  • Figure 5: A Comparison of the Transposed and the Traditional Transformer. The part above the dotted line is the transposed Transformer, while the part below it is the traditional Transformer.
  • ...and 8 more figures