TM-RUGPULL: A Temporary Sound, Multimodal Dataset for Early Detection of RUG Pulls Across the Tokenized Ecosystem
Fatemeh Shoaei, Mohammad Pishdar, Mozafar Bag-Mohammadi, Mojtaba Karami
TL;DR
TM-RugPull is presented, a rigorously curated, leakage-resistant dataset of 1,028 token projects spanning DeFi, meme coins, NFTs, and celebrity-themed tokens that enables causally valid, multimodal analysis of rug-pull dynamics and establishes a new benchmark for reproducible fraud detection research.
Abstract
Rug-pull attacks pose a systemic threat across the blockchain ecosystem, yet research into early detection is hindered by the lack of scientific-grade datasets. Existing resources often suffer from temporal data leakage, narrow modality, and ambiguous labeling, particularly outside DeFi contexts. To address these limitations, we present TM-RugPull, a rigorously curated, leakage-resistant dataset of 1,028 token projects spanning DeFi, meme coins, NFTs, and celebrity-themed tokens. RugPull enforces strict temporal hygiene by extracting all features on chain behavior, smart contract metadata, and OSINT signals strictly from the first half of each project's lifespan. Labels are grounded in forensic reports and longevity criteria, verified through multi-expert consensus. This dataset enables causally valid, multimodal analysis of rug-pull dynamics and establishes a new benchmark for reproducible fraud detection research.
