MemeTrans: A Dataset for Detecting High-Risk Memecoin Launches on Solana

Sihao Hu, Selim Furkan Tekin, Yichang Xu, Ling Liu

TL;DR

MemeTrans addresses the surge of high-risk memecoin launches on Solana by providing a large-scale dataset that captures pre-migration launchpad activity and post-migration outcomes. It introduces 122 engineered features across five groups, plus bundle-trace data and a hybrid risk-labeling scheme that combines a statistical pilot indicator with a manipulation detector, enabling effective detection of high-risk launches. Empirical results show that models trained on MemeTrans can reduce investment losses by roughly 56% in practical memecoin-selection scenarios, demonstrating actionable value for risk mitigation. The dataset and its analysis pipeline offer a foundation for further research on on-chain coordination, risk scoring, and robust memecoin risk management in rapidly evolving launchpad ecosystems.

Abstract

Launchpads have become the dominant mechanism for issuing memecoins on blockchains due to their fully automated, no-code creation process. This new issuance paradigm has led to a surge in high-risk token launches, causing substantial financial losses for unsuspecting buyers. In this paper, we introduce MemeTrans, the first dataset for studying and detecting high-risk memecoin launches on Solana. MemeTrans covers over 40k memecoin launches that successfully migrated to a public decentralized exchange (DEX), with over 30 million transactions during the initial sale on the launchpad and 180 million transactions after migration. To precisely capture launch patterns, we design 122 features spanning dimensions such as context, trading activity, holding concentration, and time-series dynamics, supplemented with bundle-level data that reveals multiple accounts controlled by the same entity. Finally, we introduce an annotation approach to label the risk level of memecoin launches, which combines statistical indicators with a manipulation-pattern detector. Experiments on the introduced high-risk launch detection task suggest that the designed features are informative for capturing high-risk patterns and that ML models trained on MemeTrans can effectively reduce financial loss by 56.1%. Our dataset, experimental code, and pipeline are publicly available at: https://github.com/git-disl/MemeTrans.

Paper Structure

This paper contains 22 sections, 1 equation, 5 figures, 13 tables.

Figures (5)

  • Figure 1: Early buyers accumulate substantial token holdings during the launchpad sale and unwind them into DEX liquidity pools after migration, draining base assets (e.g., SOL) and causing sharp memecoin price declines.
  • Figure 2: After the memecoin is created (Stage 1), the developer buys 97.5 million tokens at an average price of 0.00000794 USD at the earliest bonding-curve tier. During the launchpad sale, the price gradually increases as early buyers enter (Stage 2). Once the bonding curve reaches the migration requirement (80% of tokens sold), the launchpad triggers migration (Stage 3), which transfers the remaining 20% of tokens and the collected SOL into a DEX liquidity pool. Following migration, the token becomes tradable on the DEX (Stage 4). As external liquidity deepens, insiders begin to unwind their holdings, causing the price to collapse.
  • Figure 3: Examples of a low-risk memecoin and two manipulated memecoins. For memecoins (b) and (c), manipulators dynamically control the price movement to maximize profit or to attract normal buyers. The price movement patterns of the manipulated tokens are visibly unnatural.
  • Figure 4: High-risk memecoins exhibit shorter launchpad sale durations (a), fewer holders (b), and fewer buy transactions (c), while each buy involves larger token volumes (d). This results from early buyers accumulating a large share of the supply (e, f), leading to highly concentrated holdings, which become more pronounced after bundle identification (g, h).
  • Figure 5: Feature importance scores calculated by the random forest (RF) model. Blue marks the contextual information features, orange marks holding concentration features, green marks market activity features, and red marks the bundle statistic features.
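
The Figure 4 caption notes that holding concentration becomes more pronounced once bundles are identified. The following is a minimal sketch of why that can happen, not the MemeTrans feature pipeline: the function names `top_k_share` and `merge_bundles`, the account labels, and the balances are all hypothetical, and the account-to-bundle mapping is assumed to be given rather than detected.

```python
from collections import defaultdict


def top_k_share(balances, k=10):
    """Fraction of the total held supply owned by the k largest holders."""
    total = sum(balances.values())
    if total <= 0:
        return 0.0
    largest = sorted(balances.values(), reverse=True)[:k]
    return sum(largest) / total


def merge_bundles(balances, bundle_of):
    """Collapse accounts that map to the same bundle id into a single holder.

    Accounts without an entry in bundle_of are kept as separate holders.
    """
    merged = defaultdict(float)
    for account, amount in balances.items():
        merged[bundle_of.get(account, account)] += amount
    return dict(merged)


# Hypothetical balances in millions of tokens (illustrative only,
# not taken from the MemeTrans dataset).
balances = {
    "a1": 150.0, "a2": 150.0, "a3": 150.0,
    "b1": 250.0, "c1": 150.0, "d1": 150.0,
}
# Assumed bundle assignment: a1, a2, and a3 belong to one entity.
bundle_of = {"a1": "bundle_A", "a2": "bundle_A", "a3": "bundle_A"}

raw_top1 = top_k_share(balances, k=1)
bundled_top1 = top_k_share(merge_bundles(balances, bundle_of), k=1)
print(f"top-1 share per account: {raw_top1:.2f}")
print(f"top-1 share per entity:  {bundled_top1:.2f}")
```

In this toy example the largest single account holds a quarter of the supply, while the largest entity holds almost half once its three accounts are merged, so a per-account statistic would understate concentration.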