Analyzing public sentiment to gauge key stock events and determine volatility in conjunction with time and options premiums
SriVarsha Mulakala, Umesh Vangapally, Benjamin Larkey, Aidan Henrichs, Corey Wojslaw
TL;DR
The paper proposes a sentiment-enhanced framework for predicting short-term stock movements and earnings-driven volatility by fusing Reddit/Yahoo Finance sentiment with historical prices and volatility data. It leverages transformer-based sentiment analyses (RoBERTa for Reddit, FinBERT for finance news) alongside ensemble models like LightGBM, with SMOTE to address class imbalance, achieving a best-case accuracy of about 70% around event windows. Key contributions include a multi-source data pipeline, a structured feature engineering approach, and an empirical comparison of baselines against transformer-informed signals, along with discussions on data collection challenges and bias mitigation. The approach offers a practical pathway for traders to anticipate earnings-driven moves and option pricing dynamics, while highlighting limitations in data access, alignment, and source bias that guide future improvements such as dynamic data retrieval and source credibility scoring.
Abstract
Analyzing stocks and accurately predicting where prices are heading is becoming increasingly challenging. We therefore designed a financial algorithm that leverages social media sentiment analysis to improve the prediction of key earnings events and the associated volatility. Our model integrates sentiment analysis and data retrieval techniques to extract critical information from social media, analyze company financials, and compare sentiment between Wall Street and the general public. This approach aims to provide investors with timely data for executing trades around key events, rather than relying on long-term holding strategies. The stock market is characterized by rapid data flow and fluctuating community sentiment, both of which can significantly affect trading outcomes. Stock forecasting is complex because of the market's stochastic dynamics. Traditional prediction methods often overlook key events and media engagement and focus instead on long-term investment. Our research seeks to make this stochastic environment more predictable by examining the impact of media on stock volatility, identifying sentiment differences between Wall Street and retail investors, and evaluating the impact of various media networks on the prediction of earnings reports.
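As a rough illustration of the Wall Street versus retail comparison described in the abstract, the sketch below averages sentiment scores from two sources over a short window before an earnings date and reports the difference. It is a minimal sketch under stated assumptions, not the pipeline used in this work: the function names `window_mean` and `sentiment_gap`, the three-day window, the score range of -1 to 1, and all sample values are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def window_mean(scores, event_day, days_before=3):
    """Average the scores dated from `days_before` days prior up to the event day.

    `scores` is a list of (date, score) pairs; returns None if the window is empty.
    """
    start = event_day - timedelta(days=days_before)
    in_window = [score for day, score in scores if start <= day <= event_day]
    return mean(in_window) if in_window else None


def sentiment_gap(news_scores, retail_scores, event_day, days_before=3):
    """Retail mean minus news mean over the pre-event window (assumed definition)."""
    news = window_mean(news_scores, event_day, days_before)
    retail = window_mean(retail_scores, event_day, days_before)
    if news is None or retail is None:
        return None
    return retail - news


# Made-up scores in [-1, 1]; real values would come from the sentiment models.
earnings_day = date(2024, 1, 10)
news_scores = [(date(2024, 1, 8), 0.2), (date(2024, 1, 9), 0.4)]
retail_scores = [
    (date(2024, 1, 2), -0.9),  # outside the three-day window, so ignored
    (date(2024, 1, 8), 0.6),
    (date(2024, 1, 9), 0.8),
]

gap = sentiment_gap(news_scores, retail_scores, earnings_day)
print(f"retail minus news sentiment gap: {gap:.2f}")
```

A positive gap in this sketch means retail posts were more optimistic than news coverage ahead of the event. Such a value could serve as one input feature alongside price and volatility data, though how the gap is actually defined and used here is an assumption.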
