AIMM: An AI-Driven Multimodal Framework for Detecting Social-Media-Influenced Stock Market Manipulation

Sandeep Neela

TL;DR

AIMM addresses the rise of social-media-driven market manipulation by fusing Reddit-derived signals with OHLCV market data into a unified AIMM Manipulation Risk Score (AMRS). The framework extends the Stock-Pattern-Assistant by incorporating social volume, sentiment, bot-likeness, coordination, and market anomalies, and it uses a parquet-based pipeline plus a Streamlit dashboard for exploratory analysis. A key contribution is the AIMM-GT ground-truth dataset, along with forward-walk evaluation and prospective prediction logging to emulate real-time deployment. Early results on a small but carefully constructed dataset show strong ranking discrimination (ROC-AUC ~0.99) and the potential for multi-signal early warnings, exemplified by flagging GME 22 days before the January 2021 squeeze peak. Limitations include the small sample size (33 ticker-days, 3 positive events), reliance on synthetic social features due to Reddit API restrictions, and the need for broader validation before production deployment.
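As a rough illustration of what fusing the five signal families into one daily score could look like, the sketch below takes a normalized weighted average. The equal weights, the assumption that each signal is pre-scaled to [0, 1], and the names `HYPOTHETICAL_WEIGHTS` and `risk_score` are illustrative assumptions, not the weighting or API used by AIMM.

```python
from typing import Mapping

# Hypothetical equal weights: the summary above names the five signal
# families but does not state how they are weighted.
HYPOTHETICAL_WEIGHTS = {
    "social_volume": 1.0,
    "sentiment": 1.0,
    "bot_likeness": 1.0,
    "coordination": 1.0,
    "market_anomaly": 1.0,
}


def clip01(value: float) -> float:
    """Clamp a signal to the [0, 1] range."""
    return max(0.0, min(1.0, value))


def risk_score(
    signals: Mapping[str, float],
    weights: Mapping[str, float] = HYPOTHETICAL_WEIGHTS,
) -> float:
    """Normalized weighted average of component signals.

    Each signal is assumed to be pre-scaled to [0, 1]; missing signals
    count as 0. The result is also in [0, 1].
    """
    total_weight = sum(weights.values())
    weighted = sum(
        weight * clip01(signals.get(name, 0.0))
        for name, weight in weights.items()
    )
    return weighted / total_weight


# One ticker-day with made-up signal values (not data from the paper).
example_day = {
    "social_volume": 0.9,
    "sentiment": 0.7,
    "bot_likeness": 0.4,
    "coordination": 0.6,
    "market_anomaly": 0.8,
}
print(round(risk_score(example_day), 3))
```

A normalized average keeps the score bounded regardless of how many signals are added later, which makes a fixed alert threshold easier to reason about.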

Abstract

Market manipulation now routinely originates from coordinated social media campaigns, not isolated trades. Retail investors, regulators, and brokerages need tools that connect online narratives and coordination patterns to market behavior. We present AIMM, an AI-driven framework that fuses Reddit activity, bot and coordination indicators, and OHLCV market features into a daily AIMM Manipulation Risk Score (AMRS) for each ticker. The system uses a parquet-native pipeline with a Streamlit dashboard that allows analysts to explore suspicious windows, inspect underlying posts and price action, and log model outputs over time. Due to Reddit API restrictions, we employ calibrated synthetic social features matching documented event characteristics; the OHLCV market data are real historical data from Yahoo Finance. This release makes three contributions. First, we build the AIMM Ground Truth dataset (AIMM-GT): 33 labeled ticker-days spanning eight equities, drawing from SEC enforcement actions, community-verified manipulation cases, and matched normal controls. Second, we implement forward-walk evaluation and prospective prediction logging for both retrospective and deployment-style assessment. Third, we analyze lead times and show that AIMM flagged GME 22 days before the January 2021 squeeze peak. The current labeled set is small (33 ticker-days, 3 positive events), but results show preliminary discriminative capability and early warnings for the GME incident. We release the code, dataset schema, and dashboard design to support research on social-media-driven market surveillance.
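The forward-walk evaluation mentioned above can be pictured as an expanding-window split over trading days: each test day is scored using only information from earlier days. The following is a minimal sketch of that general technique; the function name `forward_walk_splits` and the `min_train` parameter are hypothetical and are not taken from the AIMM code.

```python
from datetime import date
from typing import Iterator, List, Tuple


def forward_walk_splits(
    days: List[date], min_train: int
) -> Iterator[Tuple[List[date], date]]:
    """Yield (training_days, test_day) pairs with an expanding window.

    Every training day strictly precedes its test day, so no future
    information leaks into the evaluation of that day.
    """
    ordered = sorted(set(days))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


# Illustrative calendar of five trading days (not the AIMM-GT dataset).
trading_days = [date(2021, 1, d) for d in (4, 5, 6, 7, 8)]
for train_days, test_day in forward_walk_splits(trading_days, min_train=3):
    print(len(train_days), "training days ->", test_day.isoformat())
```

With only 33 labeled ticker-days, the choice of `min_train` trades off how much history the first prediction sees against how many test days remain.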
