Identifying Likely-Reputable Blockchain Projects on Ethereum
Cyrus Malik, Josef Bajada, Joshua Ellul
TL;DR
The paper tackles the problem of identifying likely-reputable Ethereum projects by treating reputability as a supervised classification task on transaction-derived features. It builds a labeled dataset from CoinGecko and illicit-activity datasets, extracts 38 features from Etherscan and BigQuery data, and trains LightGBM with grid search and 10-fold cross-validation, achieving an average AUC of 0.999 and accuracy of 0.9984. A key finding is the high importance of features such as the average time difference between received transactions, which helps distinguish reputable from illicit accounts. The work demonstrates a high-performing, scalable approach that supports safer investment decisions and lays the groundwork for future work on cross-chain analysis, transaction-level reputability assessment, and graph-based feature extraction, all aimed at better discriminating between legitimate and fraudulent projects.
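To illustrate the kind of transaction-derived feature highlighted above, the following computes the average time difference between received transactions for a single account. It is a minimal sketch under stated assumptions, not the authors' extraction code. The transaction schema (dicts with hypothetical `to` and `timestamp` keys, timestamps in Unix seconds), the function name `avg_received_time_diff`, and the choice to report the gap in minutes are all illustrative assumptions.

```python
from statistics import mean
from typing import Iterable, Optional


def avg_received_time_diff(txs: Iterable[dict], account: str) -> Optional[float]:
    """Average gap, in minutes, between consecutive transactions received by `account`.

    Assumed (hypothetical) schema: each tx is a dict with a 'to' address
    (may be None, e.g. for contract creation) and a 'timestamp' in Unix seconds.
    """
    account = account.lower()  # Ethereum addresses compare case-insensitively
    times = sorted(
        tx["timestamp"]
        for tx in txs
        if (tx.get("to") or "").lower() == account
    )
    if len(times) < 2:
        return None  # undefined with fewer than two received transactions
    # Differences between consecutive receive times, converted to minutes.
    gaps = [(later - earlier) / 60 for earlier, later in zip(times, times[1:])]
    return mean(gaps)
```

With fewer than two received transactions the average is undefined, so the sketch returns `None`; a training pipeline would need an explicit convention (for example, imputing 0) before feeding the value to a classifier.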
Abstract
Identifying reputable Ethereum projects remains a critical challenge in the expanding blockchain ecosystem, as distinguishing legitimate initiatives from potentially fraudulent schemes is non-trivial. This work presents a systematic approach that integrates multiple data sources with data analytics to evaluate the credibility, transparency, and overall trustworthiness of projects. The methodology applies machine learning techniques to the transaction histories of accounts on the Ethereum blockchain, classifying them using a dataset of 2,179 accounts linked to illicit activities and 3,977 accounts associated with reputable projects. Using the LightGBM algorithm, the approach achieves an average accuracy of 0.984 and an average AUC of 0.999, validated through 10-fold cross-validation. Key influential features include the time differences between transactions and received_tnx (the number of received transactions). The proposed methodology provides a robust mechanism for identifying reputable Ethereum projects, fostering a more secure and transparent investment environment. By equipping stakeholders with data-driven insights, this research enables more informed decision-making, risk mitigation, and the promotion of legitimate blockchain initiatives. Furthermore, it lays the foundation for future advances in trust assessment methodologies, contributing to the continued development and maturation of the Ethereum ecosystem.
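The evaluation protocol described in the abstract can be sketched without committing to a particular model: a stratified 10-fold split that preserves the illicit/reputable ratio in every fold, and AUC computed as the probability that a randomly chosen positive is scored above a randomly chosen negative. This is a hedged illustration, not the authors' pipeline. The helper names `stratified_folds` and `roc_auc` are hypothetical, label 1 is assumed to mean reputable, and the scores passed to `roc_auc` would come from a trained classifier such as LightGBM, which is not shown here.

```python
import random
from typing import List, Sequence


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then deal them round-robin across folds.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one positive and one negative sample")
    # Pairwise comparison: O(len(pos) * len(neg)), adequate for folds of a few hundred samples.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Class counts from the abstract; 0 = illicit, 1 = reputable (assumed encoding).
labels = [0] * 2179 + [1] * 3977
folds = stratified_folds(labels, k=10)
for fold_id, fold in enumerate(folds):
    n_illicit = sum(1 for i in fold if labels[i] == 0)
    print(f"fold {fold_id}: {len(fold)} samples, {n_illicit} illicit")
```

Dealing each class's indices round-robin keeps every fold's illicit/reputable ratio close to the overall 2,179:3,977 split; in practice, scikit-learn's `StratifiedKFold` serves the same purpose, and a rank-based AUC formula scales better than the pairwise loop for large datasets. The reported average AUC would then be the mean of `roc_auc` over the ten held-out folds.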
