StableAML: Machine Learning for Behavioral Wallet Detection in Stablecoin Anti-Money Laundering on Ethereum

Luciano Juvinski, Haochen Li, Alessio Brini

TL;DR

This paper tackles AML detection for stablecoin flows on Ethereum by building StableAML, a large, labeled dataset of 16,433 wallets with 68 domain-specific features across four categories. It benchmarks multiple models (logistic regression, tree ensembles, DNN, GraphSAGE GNN) and finds domain-informed tree ensembles consistently outperform graph-based approaches in a sparse, tokenized graph setting, achieving Macro-F1 above $0.97$ and AUROC near 1.0. Beyond high predictive accuracy, the work provides interpretable insights by linking features to money-laundering stages (Placement, Layering, Integration) and aligns with regulatory objectives (MiCA, GENIUS, OFAC) to support auditable, compliant monitoring. The results suggest that deterministic, contract-event–driven signals at stablecoin choke points can robustly detect illicit activity while fostering innovation in a privacy-aware ecosystem.

Abstract

Global illicit fund flows exceed an estimated $3.1 trillion annually, with stablecoins emerging as a preferred laundering medium due to their liquidity. While decentralized protocols increasingly adopt zero-knowledge proofs to obfuscate transaction graphs, centralized stablecoins remain critical "transparent choke points" for compliance. Leveraging this persistent visibility, this study analyzes an Ethereum dataset and uses behavioral features to develop a robust AML framework. Our findings demonstrate that domain-informed tree ensemble models achieve a higher Macro-F1 score than graph neural networks, which struggle with the increasing fragmentation of transaction networks. The model's interpretability goes beyond binary detection, successfully dissecting distinct typologies: it differentiates the complex, high-velocity dispersion of cybercrime syndicates from the constrained, static footprints left by sanctioned entities. This framework aligns with the industry shift toward deterministic verification, satisfying the auditability and compliance expectations under regulations such as the EU's MiCA and the U.S. GENIUS Act while minimizing unjustified asset freezes. By automating high-precision detection, we propose an approach that effectively raises the economic cost of financial misconduct without stifling innovation.
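
Macro-F1, the headline metric above, is the unweighted mean of per-class F1 scores, so a rare class counts as much as a common one. The following is a minimal sketch of that computation in plain Python, not the authors' evaluation code. The class names follow the paper's Normal, Cybercrime, and Blocklisted labels; the toy label vectors are invented purely for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy data (assumption, not from the paper): an imbalanced set where the
# classifier labels every wallet "Normal".
labels = ["Normal", "Cybercrime", "Blocklisted"]
y_true = ["Normal"] * 8 + ["Cybercrime", "Blocklisted"]
y_pred = ["Normal"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, labels)
print(f"accuracy={accuracy:.2f}  macro_f1={score:.3f}")
```

In this toy case accuracy looks acceptable while Macro-F1 is low, which is why a macro-averaged metric is a reasonable choice when illicit wallets are a small minority.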

Paper Structure (25 sections, 12 equations, 6 figures, 10 tables)

Figures (6)

  • Figure 1: Historical transaction volumes for Bitcoin, Ethereum, and combined stablecoins.
  • Figure 2: Sankey diagram of a multi-hop transfer. Funds are withdrawn from Tornado Cash, passed through intermediary wallets to obscure the trail, and rapidly deposited into a centralized exchange (Binance) within a 26-minute window. The prominent grey path highlights the primary volume flow, illustrating the "layering" behavior captured by second-degree network features.
  • Figure 3: Exploratory plots. (a) Distribution of 2ndWithOver10k shows that illicit wallets are structurally embedded in high-volume networks, unlike normal users. (b) Inverse preference for CEX vs. DeFi Swaps.
  • Figure 4: Per-class OvR performance curves for multiclass wallet classification, shown separately for the Normal, Cybercrime, and Blocklisted classes.
  • Figure 5: Consensus ranking of the top 15 features across primary models. The reported "Value" represents the normalized importance derived from the multi-stage pipeline, aggregating: (i) model-specific impurity/gain for tree ensembles, (ii) normalized coefficients ($\beta$) for LR, (iii) model-agnostic permutation importance on the test set, and (iv) SHAP values. This rank-based aggregation ensures that the hierarchy reflects a stable consensus across both linear and non-linear estimators.
  • ...and 1 more figure
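
The rank-based consensus described in the Figure 5 caption can be sketched as follows. This is a hedged illustration only: it assumes each importance source ranks features by absolute score and that the consensus is the mean rank across sources, which the caption does not spell out. The feature `2ndWithOver10k` appears in the paper; `feat_a`, `feat_b`, the source names, and all scores are hypothetical.

```python
def consensus_ranking(importances):
    """Order features by their mean rank across importance sources.

    importances: dict mapping source name -> {feature: score}.
    Assumption: a larger absolute score means a more important feature.
    """
    features = sorted(next(iter(importances.values())))
    mean_rank = {f: 0.0 for f in features}
    for scores in importances.values():
        # Rank 1 is the most important feature within this source.
        ordered = sorted(features, key=lambda f: -abs(scores[f]))
        for rank, f in enumerate(ordered, start=1):
            mean_rank[f] += rank / len(importances)
    return sorted(features, key=lambda f: mean_rank[f])


# Hypothetical scores from the four kinds of sources named in the caption.
importances = {
    "tree_gain":   {"2ndWithOver10k": 0.50, "feat_a": 0.30, "feat_b": 0.20},
    "lr_coef":     {"2ndWithOver10k": -1.20, "feat_a": 0.40, "feat_b": 0.90},
    "permutation": {"2ndWithOver10k": 0.10, "feat_a": 0.02, "feat_b": 0.05},
    "shap":        {"2ndWithOver10k": 0.30, "feat_a": 0.05, "feat_b": 0.10},
}

ranking = consensus_ranking(importances)
print(ranking)
```

Working on ranks rather than raw scores sidesteps the fact that gain, coefficients, permutation drops, and SHAP values live on different scales, so no single estimator dominates the ordering.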