Buzz to Broadcast: Predicting Sports Viewership Using Social Media Engagement
Anakin Trotter
TL;DR
This study predicts sports viewership from social media engagement. It combines Reddit metrics (post counts, comments, scores) with sentiment scores from TextBlob and VADER in a Gradient Boosting Regression framework, with preprocessing that includes log-transforming the target and handling outliers by sport. The model achieves $R^2 = 0.99$, MAE = $1.27$ million viewers, and RMSE = $2.33$ million viewers on the full dataset, and SHAP analysis identifies Total Posts as the strongest predictor. The results point to pre-event forecasting and targeted advertising based on social media signals. Noted limitations are data skew toward major events and platform bias; the study suggests multi-platform data and more advanced sentiment models for broader applicability.
Abstract
Accurately predicting sports viewership is crucial for optimizing ad sales and revenue forecasting. Social media platforms, such as Reddit, provide a wealth of user-generated content that reflects audience engagement and interest. In this study, we propose a regression-based approach to predict sports viewership using social media metrics, including post counts, comments, scores, and sentiment analysis from TextBlob and VADER. Through iterative improvements, such as focusing on major sports subreddits, incorporating categorical features, and handling outliers by sport, the model achieved an $R^2$ of 0.99, a Mean Absolute Error (MAE) of 1.27 million viewers, and a Root Mean Squared Error (RMSE) of 2.33 million viewers on the full dataset. These results demonstrate the model's ability to accurately capture patterns in audience behavior, offering significant potential for pre-event revenue forecasting and targeted advertising strategies.
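The reported metrics are stated in millions of viewers, while the model is trained on a log-transformed target. The sketch below illustrates one way such metrics could be computed after mapping predictions back to the original scale. It is only an illustration under assumptions: the `log1p`/`expm1` pair, the helper names, and the sample numbers are hypothetical and are not taken from the study, which does not specify the exact transform or the scale on which the metrics were evaluated.

```python
import math


def to_log(viewers_millions):
    # Assumed transform: log(1 + x). The study only says the target is log-transformed.
    return math.log1p(viewers_millions)


def from_log(value):
    # Inverse of to_log: maps a log-scale prediction back to millions of viewers.
    return math.expm1(value)


def regression_metrics(y_true, y_pred):
    # Compute R^2, MAE, and RMSE for two equal-length lists of numbers.
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Hypothetical data: actual viewership (millions) and log-scale model outputs.
actual_millions = [2.0, 5.0, 12.0, 40.0, 110.0]
predicted_log = [to_log(v) for v in [2.5, 4.0, 13.0, 38.0, 115.0]]

# Back-transform first, so the errors are expressed in millions of viewers.
predicted_millions = [from_log(v) for v in predicted_log]
metrics = regression_metrics(actual_millions, predicted_millions)
print(metrics)
```

Evaluating on the original scale keeps MAE and RMSE interpretable as viewer counts. RMSE squares each error, so it weights large misses on major events more heavily than MAE does.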
