Explainable Ponzi Schemes Detection on Ethereum

Letterio Galletta, Fabio Pinelli

TL;DR

This paper tackles the detection of smart Ponzi contracts on Ethereum by releasing a public dataset of 4,422 contracts labeled according to Ponzi criteria and by training ML classifiers to distinguish Ponzi from non-Ponzi contracts. It introduces a rich feature set, including nine novel features, and shows that a LightGBM model trained on these features outperforms previous approaches in terms of AUC, with the statistical significance of the improvement validated by a McNemar test. The authors also apply SHAP-based explainability to identify the most influential features and feature interactions, giving interpretable insight into what drives a Ponzi classification. The work offers practical value by enabling reproducible research and fraud-detection tools, and it outlines future work on bytecode analysis, deep learning, and broader scam detection on Ethereum.

Abstract

Blockchain technology has been successfully exploited for deploying new economic applications. However, it has started arousing the interest of malicious actors who deliver scams to deceive honest users and to gain economic advantages. Ponzi schemes are one of the most common scams. Here, we present a classifier for detecting smart Ponzi contracts on Ethereum, which can be used as the backbone for developing detection tools. First, we release a labelled data set with 4422 unique real-world smart contracts to address the problem of the unavailability of labelled data. Then, we show that our classifier outperforms the ones proposed in the literature when considering the AUC as a metric. Finally, we identify a small and effective set of features that ensures a good classification quality and investigate their impacts on the classification using eXplainable AI techniques.
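
The two evaluation ideas named above, AUC as the comparison metric and a McNemar test for significance, can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the helper names `roc_auc` and `mcnemar_exact`, the toy labels and scores, the 0.5 decision threshold, and the choice of the exact (binomial) form of the McNemar test. The source does not say which variant of the test the authors used, and none of the numbers below come from the paper.

```python
from math import comb

def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mcnemar_exact(labels, pred_a, pred_b):
    """Two-sided exact McNemar p-value computed from the discordant pairs."""
    # Samples where exactly one of the two classifiers is correct.
    only_a = sum(a == y and b != y for y, a, b in zip(labels, pred_a, pred_b))
    only_b = sum(a != y and b == y for y, a, b in zip(labels, pred_a, pred_b))
    n = only_a + only_b
    if n == 0:
        return 1.0  # the classifiers never disagree on correctness
    k = min(only_a, only_b)
    # Under the null hypothesis, discordant pairs split like a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical toy data: 1 = Ponzi, 0 = non-Ponzi.
labels   = [1, 1, 1, 0, 0, 0, 0, 1]
scores_a = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.4, 0.6]  # classifier A
scores_b = [0.9, 0.4, 0.7, 0.6, 0.1, 0.3, 0.8, 0.2]  # classifier B

# Hard predictions at an assumed threshold of 0.5.
pred_a = [int(s >= 0.5) for s in scores_a]
pred_b = [int(s >= 0.5) for s in scores_b]

auc_a = roc_auc(labels, scores_a)
auc_b = roc_auc(labels, scores_b)
p_value = mcnemar_exact(labels, pred_a, pred_b)
print(auc_a, auc_b, p_value)
```

AUC compares the ranking quality of the scores, while the McNemar test looks only at the samples on which the two classifiers disagree. A gap in AUC on a small test set therefore need not be statistically significant, as the toy example shows.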

Paper Structure

This paper contains 8 sections, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: The cumulative distributions of some continuous features: smart Ponzi contracts are shown in blue, non-Ponzi contracts in orange.
  • Figure 2: The percentage of Ponzi and non-Ponzi smart contracts for Features 26-28.
  • Figure 3: The ROC curve on the test set for the three best classifiers, one for each dataset.
  • Figure 4: Confusion matrices of the best classifier on our datasets, where N and P denote the non-Ponzi and Ponzi classes, respectively.
  • Figure 5: The importance of the new set of features included in dataset D1.
  • ...and 3 more figures