Improving the Accuracy of Transaction-Based Ponzi Detection on Ethereum

Phuong Duy Huynh, Son Hoang Dau, Xiaodong Li, Phuc Luong, Emanuele Viterbo

TL;DR

This paper tackles Ponzi detection on Ethereum using transaction data rather than contract code, addressing the robustness gaps of opcode-based detectors. It proposes a set of 85 features, combining 22 known account-based features with 63 new time-series features computed over 12-hour intervals, and shows that temporal dynamics substantially improve detection performance. With tree-based models, particularly LightGBM, the approach achieves up to 30% higher F1-scores than prior transaction-based methods and detects previously unseen Ponzi variants, underscoring the value of time-series features. The work also outlines directions for future research: larger datasets and more advanced models to strengthen detection as blockchain ecosystems evolve.

Abstract

The Ponzi scheme, an old-fashioned fraud, is now popular on the Ethereum blockchain, causing considerable financial losses to many crypto investors. A few Ponzi detection methods have been proposed in the literature, most of which detect a Ponzi scheme based on its smart contract source code. This contract-code-based approach, while achieving very high accuracy, is not robust because a Ponzi developer can fool a detection model by obfuscating the opcode or inventing a new profit distribution logic that cannot be detected. By contrast, a transaction-based approach could improve the robustness of detection because transactions, unlike smart contracts, are harder to manipulate. However, the current transaction-based detection models achieve fairly low accuracy. In this paper, we aim to improve the accuracy of the transaction-based models by employing time-series features, which turn out to be crucial in capturing the lifetime behaviour of a Ponzi application but were completely overlooked in previous works. We propose a new set of 85 features (22 known account-based and 63 new time-series features), which allows off-the-shelf machine learning algorithms to achieve up to 30% higher F1-scores compared to existing works.
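
To make the idea of time-series features more concrete, the following is a minimal sketch of how an application's transactions might be grouped into 12-hour intervals and summarised. It is not the authors' feature extraction: the input format (a list of timestamp and value pairs), the function names, and the chosen statistics are assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

INTERVAL_SECONDS = 12 * 60 * 60  # 12-hour buckets, the interval length named in the TL;DR


def bucket_transactions(txs, launch_ts):
    """Group (timestamp, value) pairs into consecutive 12-hour intervals.

    `txs` is a hypothetical list of (unix_timestamp, value_in_ether) tuples;
    `launch_ts` is the timestamp of the application's first transaction.
    Returns two lists indexed by interval: transaction counts and volumes.
    """
    counts = defaultdict(int)
    volumes = defaultdict(float)
    for ts, value in txs:
        idx = (ts - launch_ts) // INTERVAL_SECONDS
        counts[idx] += 1
        volumes[idx] += value
    # Include empty intervals so that periods of inactivity stay visible.
    n = max(counts) + 1 if counts else 0
    return [counts[i] for i in range(n)], [volumes[i] for i in range(n)]


def summarize(series):
    """Illustrative summary statistics of one per-interval series (an assumption)."""
    if not series:
        return {"mean": 0.0, "std": 0.0, "max": 0.0, "active_ratio": 0.0}
    return {
        "mean": mean(series),
        "std": pstdev(series),
        "max": max(series),
        # Fraction of intervals with at least one transaction.
        "active_ratio": sum(1 for x in series if x > 0) / len(series),
    }
```

Keeping empty intervals in the series is a deliberate choice in this sketch: a short burst of activity followed by long silence, the pattern described for the Ponzi application in Figure 1, would otherwise be lost.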

Paper Structure (27 sections, 4 figures, 5 tables)

Figures (4)

  • Figure 1: Daily transaction volumes of a Ponzi application (DynamicPyramid) and a non-Ponzi application. The Ponzi application had a shorter lifespan, with transaction volume peaking in the first month followed by almost no activity. By contrast, the non-Ponzi application had more regular activity throughout its long lifespan.
  • Figure 2: Investment and payment activities of a Ponzi application (DynamicPyramid) and a non-Ponzi application. In the Ponzi application, several lower investments (blue dots) were followed by a higher payment (orange dot), which shows funds accumulating before a payment to an investor can be made.
  • Figure 3: Application balances (in the first four months after launch) of a Ponzi application (DynamicPyramid) and a non-Ponzi application. The chart of the Ponzi contract had a number of "cliffs", while that of the non-Ponzi contract had none.
  • Figure 4: LGBM's performance when using the most important features (top sub-figure) and the percentages of time-series features among these top features (bottom sub-figure). The F1-score value increases as more time-series features are used in the feature list, demonstrating the effectiveness of using time-series features.
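
The balance "cliffs" described in Figure 3 can be illustrated with a small sketch. The definition used here (a drop of at least a given fraction of the previous balance between two consecutive samples), the function name, and the default threshold are all assumptions for illustration; the paper's own features may be defined differently.

```python
def count_cliffs(balances, drop_ratio=0.5):
    """Count sharp drops ("cliffs") in a balance series.

    A cliff is counted whenever the balance falls by at least `drop_ratio`
    of its previous value between two consecutive samples. Both this
    definition and the default threshold are illustrative assumptions.
    """
    cliffs = 0
    for prev, curr in zip(balances, balances[1:]):
        if prev > 0 and (prev - curr) / prev >= drop_ratio:
            cliffs += 1
    return cliffs
```

Under this definition, a series that repeatedly builds up and then pays out most of its balance yields several cliffs, while a series that changes gradually yields none.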