MemeTrans: A Dataset for Detecting High-Risk Memecoin Launches on Solana
Sihao Hu, Selim Furkan Tekin, Yichang Xu, Ling Liu
TL;DR
MemeTrans addresses the surge of high-risk memecoin launches on Solana with a large-scale dataset that captures both pre-migration launchpad activity and post-migration outcomes. It provides 122 engineered features across five groups, bundle-trace data, and a hybrid risk-labeling scheme that combines a statistical pilot indicator with a manipulation detector. In practical memecoin-selection scenarios, models trained on MemeTrans reduce investment losses by about 56%, which indicates actionable value for risk mitigation. The dataset and its analysis pipeline offer a foundation for further research on on-chain coordination, risk scoring, and memecoin risk management in rapidly evolving launchpad ecosystems.
Abstract
Launchpads have become the dominant mechanism for issuing memecoins on blockchains due to their fully automated, no-code creation process. This new issuance paradigm has led to a surge in high-risk token launches, causing substantial financial losses for unsuspecting buyers. In this paper, we introduce MemeTrans, the first dataset for studying and detecting high-risk memecoin launches on Solana. MemeTrans covers over 40k memecoin launches that successfully migrated to a public decentralized exchange (DEX), with over 30 million transactions during the initial launchpad sale and 180 million transactions after migration. To precisely capture launch patterns, we design 122 features spanning dimensions such as context, trading activity, holding concentration, and time-series dynamics, supplemented with bundle-level data that reveals multiple accounts controlled by the same entity. Finally, we introduce an annotation approach for labeling the risk level of memecoin launches, which combines statistical indicators with a manipulation-pattern detector. Experiments on the proposed high-risk launch detection task suggest that the designed features are informative for capturing high-risk patterns and that ML models trained on MemeTrans can reduce financial loss by 56.1%. Our dataset, experimental code, and pipeline are publicly available at: https://github.com/git-disl/MemeTrans.
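To make the idea of holding-concentration features and bundle-level data more concrete, the following is a minimal sketch of how such features might be computed. It is an illustration under stated assumptions, not the MemeTrans feature definitions: the function names, the `entity_of` mapping, and the choice of top-k share and Herfindahl-Hirschman index (HHI) are all hypothetical.

```python
from typing import Dict


def merge_by_entity(balances: Dict[str, float], entity_of: Dict[str, str]) -> Dict[str, float]:
    """Sum balances of accounts mapped to the same entity.

    `entity_of` is a hypothetical account -> entity mapping, standing in for
    what bundle-level data could provide. Unmapped accounts stay separate.
    """
    merged: Dict[str, float] = {}
    for account, balance in balances.items():
        key = entity_of.get(account, account)
        merged[key] = merged.get(key, 0.0) + balance
    return merged


def holding_concentration(balances: Dict[str, float], top_k: int = 10) -> Dict[str, float]:
    """Illustrative concentration features (assumed, not from the paper)."""
    # Keep only positive balances, largest first.
    positive = sorted((b for b in balances.values() if b > 0), reverse=True)
    total = sum(positive)
    if total == 0:
        return {"top_k_share": 0.0, "hhi": 0.0, "num_holders": 0}
    shares = [b / total for b in positive]
    return {
        "top_k_share": sum(shares[:top_k]),   # supply held by the k largest holders
        "hhi": sum(s * s for s in shares),    # Herfindahl-Hirschman index in (0, 1]
        "num_holders": len(shares),
    }


# Toy example: accounts "b" and "c" are assumed to belong to the same entity as "a".
raw = {"a": 40.0, "b": 30.0, "c": 20.0, "d": 10.0}
by_entity = merge_by_entity(raw, {"b": "a", "c": "a"})
raw_features = holding_concentration(raw, top_k=1)
entity_features = holding_concentration(by_entity, top_k=1)
```

In this toy example, the largest holder appears to control 40% of the supply at the account level but 90% once accounts assumed to share an entity are merged, which is the kind of hidden concentration that bundle-level data is meant to expose.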
