Buzz to Broadcast: Predicting Sports Viewership Using Social Media Engagement

Anakin Trotter

TL;DR

Buzz to Broadcast investigates whether social media engagement can predict sports viewership. The authors combine Reddit metrics (post counts, comments, scores) and sentiment scores from TextBlob and VADER within a Gradient Boosting Regression framework, with preprocessing that includes log-transforming the target and handling outliers. The model achieves $R^2 = 0.99$, MAE = $1.27$ million viewers, and RMSE = $2.33$ million viewers on the full dataset, with SHAP indicating Total Posts as the strongest predictor. The study demonstrates the potential of pre-event forecasting and targeted advertising based on social media signals, while noting limitations related to data skew toward major events and platform bias; it suggests multi-platform data and advanced sentiment models for broader applicability.

Abstract

Accurately predicting sports viewership is crucial for optimizing ad sales and revenue forecasting. Social media platforms, such as Reddit, provide a wealth of user-generated content that reflects audience engagement and interest. In this study, we propose a regression-based approach to predict sports viewership using social media metrics, including post counts, comments, scores, and sentiment analysis from TextBlob and VADER. Through iterative improvements, such as focusing on major sports subreddits, incorporating categorical features, and handling outliers by sport, the model achieved an $R^2$ of 0.99, a Mean Absolute Error (MAE) of 1.27 million viewers, and a Root Mean Squared Error (RMSE) of 2.33 million viewers on the full dataset. These results demonstrate the model's ability to accurately capture patterns in audience behavior, offering significant potential for pre-event revenue forecasting and targeted advertising strategies.
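
To make the reported metrics concrete, the sketch below shows one way $R^2$, MAE, and RMSE could be computed when a model is trained on a log-transformed target and its predictions are mapped back to millions of viewers before scoring. It is a minimal illustration, not the authors' code: the helper name `regression_metrics`, the choice of `log1p`/`expm1` as the transform pair, and every number in it are assumptions made for the example.

```python
import math


def regression_metrics(actual, predicted):
    """Return (r2, mae, rmse) for two equal-length numeric sequences."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    rmse = math.sqrt(ss_res / n)                   # root mean squared error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, rmse


# Hypothetical viewership figures in millions (illustrative, not from the paper).
actual_millions = [2.0, 5.0, 12.0, 100.0]

# Hypothetical model outputs on the log scale. Assumption: the target was
# transformed with log1p, so expm1 is the matching inverse.
predicted_log = [math.log1p(v) for v in (2.5, 4.0, 13.0, 95.0)]
predicted_millions = [math.expm1(p) for p in predicted_log]

# Score on the original scale so MAE and RMSE read as millions of viewers.
r2, mae, rmse = regression_metrics(actual_millions, predicted_millions)
print(f"R^2={r2:.3f}  MAE={mae:.2f}M  RMSE={rmse:.2f}M")
```

Scoring on the original scale keeps MAE and RMSE in the same units as the target. Note that one very large event (the value of 100 in this toy data) dominates the total variance, so $R^2$ can sit close to 1 even when errors on smaller events are large relative to their size.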

Paper Structure

This paper contains 19 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Daily posting activity in r/nba, as obtained from Subreddit Stats. The data show stable engagement levels with an outlier spike in June 2019.
  • Figure 2: Feature correlation matrix. The highest correlation observed was 0.62, indicating no strong multicollinearity.
  • Figure 3: Actual vs. Predicted Viewership. The red line represents an ideal fit, while blue points show model predictions. Stratification is visible between Super Bowl and other events.
  • Figure 4: SHAP Feature Importance Plot. The chart indicates that Total Posts is the most critical predictor, followed by Total Comments and Total Scores.