Clean Up the Mess: Addressing Data Pollution in Cryptocurrency Abuse Reporting Services
Gibran Gomez, Kevin van Liebergen, Davide Sanvito, Giuseppe Siracusano, Roberto Gonzalez, Juan Caballero
TL;DR
This paper investigates data pollution in cryptocurrency abuse reporting services, showing that crowd-sourced reports can be flooded by spam and mislabeled, which distorts estimates of abuse prevalence and financial impact. It proposes a novel unsupervised LLM-based classifier that uses explicit abuse-type definitions to classify reports and filter spam, outperforming baselines and generalizing to out-of-distribution data. By constructing a large ground-truth dataset and applying two revenue-estimation approaches (victim-reported losses vs. deposit-based revenue), the authors demonstrate that victim-reported losses substantially underestimate criminal revenue, with the deposit-based estimate 29 times higher. Investment scams drive the largest revenue, while extortion campaigns have high reach but lower conversion. The work provides a practical defense against data pollution, generalizes to multiple blockchains and reporting services, and offers a publicly available dataset that can fuel future research in threat intelligence and abuse-report evaluation.
Abstract
Cryptocurrency abuse reporting services are a valuable data source about abusive blockchain addresses, prevalent types of cryptocurrency abuse, and their financial impact on victims. However, they may suffer from data pollution due to their crowd-sourced nature. This work analyzes the extent and impact of data pollution in cryptocurrency abuse reporting services and proposes a novel LLM-based defense to address the pollution. We collect 289K abuse reports submitted over 6 years to two popular services and use them to answer three research questions. RQ1 analyzes the extent and impact of pollution. We show that spam reports will eventually flood unchecked abuse reporting services, with BitcoinAbuse receiving 75% of spam before stopping operations. We build a public dataset of 19,443 abuse reports labeled with 19 popular abuse types and use it to reveal the inaccuracy of user-reported abuse types. We identify 91 reported addresses (0.1%) that are benign and that account for 60% of all the received funds. RQ2 examines whether we can automate the identification of valid reports and their classification into abuse types. We propose an unsupervised LLM-based classifier that achieves an F1 score of 0.95 when classifying reports, an F1 of 0.89 when classifying out-of-distribution data, and an F1 of 0.99 when identifying spam reports. Our unsupervised LLM-based classifier clearly outperforms two baselines: a supervised classifier and a naive usage of the LLM. Finally, RQ3 demonstrates the usefulness of our LLM-based classifier for quantifying the financial impact of different cryptocurrency abuse types. We show that victim-reported losses heavily underestimate cybercriminal revenue: estimating revenue from deposit transactions yields a figure 29 times higher. We find that investment scams have the highest financial impact and that extortions have lower conversion rates but compensate for them with massive email campaigns.
