Ethereum Fraud Detection via Joint Transaction Language Model and Graph Representation Learning

Jianguo Sun, Yifan Jia, Yanbin Wang, Yiwei Liu, Zhang Sheng, Ye Tian

TL;DR

This work tackles Ethereum fraud detection by combining three complementary signals: explicit transaction semantics, cross-transaction similarity, and account-level network structure. It introduces TLMG4Eth, which uses a transaction language model to turn numerical transaction data into semantic sentences, builds a transaction attribute similarity graph, and constructs an account interaction graph. A deep multi-head attention network fuses the semantic and similarity embeddings, and this network is trained jointly with the account interaction graph so that the two benefit from each other. Empirical results on three Ethereum datasets show improvements of approximately 10–20% in F1 and balanced accuracy over state-of-the-art baselines, and ablation and parameter studies clarify the contribution of each component. The approach demonstrates the practical value of combining linguistic representations with graph-based structural learning for fraud detection in blockchain networks, and the work includes a new SPN dataset that reflects up-to-date phishing activity.

Abstract

Ethereum faces growing fraud threats. Current fraud detection methods, whether they employ graph neural networks or sequence models, fail to consider the semantic information and similarity patterns within transactions. Moreover, these approaches do not exploit the potential synergy of combining both types of models. To address these challenges, we propose TLMG4Eth, which combines a transaction language model with graph-based methods to capture semantic, similarity, and structural features of Ethereum transaction data. We first propose a transaction language model that converts numerical transaction data into meaningful transaction sentences, enabling the model to learn explicit transaction semantics. Then, we propose a transaction attribute similarity graph to learn transaction similarity information, which gives intuitive insight into transaction anomalies. Additionally, we construct an account interaction graph to capture the structural information of the account transaction network. We employ a deep multi-head attention network to fuse the transaction semantic and similarity embeddings, and finally propose a joint training approach for the multi-head attention network and the account interaction graph to obtain the benefits of both.
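
To make the idea of "transaction sentences" concrete, the following is a minimal sketch of how a numerical transaction record might be rendered as a token sequence for a language model. The field names (`direction`, `value_eth`, `gas_price_gwei`, `timestamp`), the sentence template, and the `[SEP]` separator are illustrative assumptions, not the template used by the authors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    """Hypothetical transaction record; fields are assumed for illustration."""
    direction: str         # "in" or "out", relative to the account of interest
    value_eth: float       # transferred amount in ETH
    gas_price_gwei: float  # gas price in gwei
    timestamp: int         # Unix timestamp of the block


def transaction_to_sentence(tx: Transaction) -> str:
    """Render one transaction as an attribute-name / attribute-value sentence."""
    return (
        f"direction {tx.direction} "
        f"value {tx.value_eth:.4f} "
        f"gas_price {tx.gas_price_gwei:.2f} "
        f"timestamp {tx.timestamp}"
    )


def account_document(txs: List[Transaction]) -> str:
    """Join an account's transactions, oldest first, into one text sequence."""
    ordered = sorted(txs, key=lambda t: t.timestamp)
    return " [SEP] ".join(transaction_to_sentence(t) for t in ordered)


if __name__ == "__main__":
    history = [
        Transaction("out", 1.5, 20.0, 1700000100),
        Transaction("in", 0.25, 18.5, 1700000000),
    ]
    print(account_document(history))
```

Keeping the attribute name next to each value is one simple way to give a language model explicit context for otherwise bare numbers; the resulting text could then be tokenized and encoded by whichever model is chosen.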

Paper Structure

This paper contains 23 sections, 16 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The framework of the proposed Joint Transaction Language Model and Graph Representation Learning.
  • Figure 2: The generation and combination of the Ethereum transaction semantic embedding and similarity embedding.
  • Figure 3: Performance of various TASG construction methods under varying threshold $\theta$.
  • Figure 4: Training Loss vs Epoch on MulDiGraph, B4E and SPN datasets with different trade-off parameter $\lambda$.
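
Figure 3 refers to a threshold $\theta$ used when constructing the transaction attribute similarity graph (TASG). As a rough illustration of what such a threshold does, the sketch below connects two transactions whenever the similarity of their attribute vectors reaches $\theta$. The use of cosine similarity, the function names, and the toy feature vectors are assumptions made for this example; the paper compares several construction methods and this is not claimed to be any one of them.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_similarity_graph(
    features: List[Sequence[float]], theta: float
) -> List[Tuple[int, int]]:
    """Return undirected edges (i, j), i < j, whose similarity is at least theta."""
    edges = []
    for i, j in combinations(range(len(features)), 2):
        if cosine_similarity(features[i], features[j]) >= theta:
            edges.append((i, j))
    return edges


if __name__ == "__main__":
    # Toy attribute vectors for three transactions (hypothetical values).
    feats = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
    print(build_similarity_graph(feats, theta=0.9))
```

A higher $\theta$ yields a sparser graph that keeps only near-identical transactions as neighbours, while a lower $\theta$ adds edges between weakly related ones, which is the trade-off a threshold study of this kind explores.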