Reddit's Appetite: Predicting User Engagement with Nutritional Content

Gabriela Ozegovic, Thorsten Ruprechter, Denis Helic

TL;DR

The study investigates whether nutritional content inferred from post titles predicts user engagement with food content on Reddit's r/Food. It introduces an embedding-based nutrition estimation method that maps titles to macronutrient densities using USDA data, applied to nearly 600k posts and yielding over 300k nutrition-estimated entries. Using XGBoost and SHAP explanations, the authors show that higher calorie density and related nutrients correlate with more comments and resonance, improving ROC-AUC by about 0.01 for engagement and 0.04 for resonance. The work advances scalable, explainable analysis of nutrition in online communities, with implications for designing engaging, health-promoting content, while acknowledging limitations such as reliance on textual descriptions and observational design.
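
The embedding-based matching mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `REFERENCE` table holds placeholder values rather than actual USDA entries, `embed` is a toy bag-of-words stand-in for a learned text embedding, and `min_similarity` is a hypothetical cutoff. It is not the authors' implementation, only the general idea of mapping a title to the nearest reference food and reading off its per-100 g densities.

```python
import math
from collections import Counter

# Hypothetical reference table: food name -> (kcal, protein g, carbs g, fat g) per 100 g.
# The numbers are illustrative placeholders, NOT actual USDA values.
REFERENCE = {
    "cheese pizza": (270.0, 11.0, 33.0, 10.0),
    "green salad": (20.0, 1.5, 3.5, 0.2),
    "chocolate cake": (370.0, 5.0, 50.0, 16.0),
}


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned sentence embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_nutrition(title, min_similarity=0.3):
    """Return (matched food, densities) for the closest reference entry.

    Returns None when no entry reaches `min_similarity` (an assumed cutoff).
    """
    query = embed(title)
    best_name, best_score = None, 0.0
    for name in REFERENCE:
        score = cosine(query, embed(name))
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < min_similarity:
        return None
    return best_name, REFERENCE[best_name]


match = estimate_nutrition("Homemade cheese pizza")
no_match = estimate_nutrition("My new kitchen")
```

In this sketch a title with no sufficiently similar reference food is left without an estimate instead of being forced onto a poor match.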

Abstract

The increased popularity of food communities on social media shapes the way people engage with food-related content. Due to the extensive consequences of such content on users' eating behavior, researchers have started studying the factors that drive user engagement with food in online platforms. However, while most studies focus on visual aspects of food content in social media, there exist only initial studies exploring the impact of nutritional content on user engagement. In this paper, we set out to close this gap and analyze food-related posts on Reddit, focusing on the association between the nutritional density of a meal and engagement levels, particularly the number of comments. Hence, we collect and empirically analyze almost 600,000 food-related posts and uncover differences in nutritional content between engaging and non-engaging posts. Moreover, we train a series of XGBoost models, and evaluate the importance of nutritional density while predicting whether users will comment on a post or whether a post will substantially resonate with the community. We find that nutritional features improve the baseline model's accuracy by 4%, with a positive contribution of calorie density towards prediction of engagement, suggesting that higher nutritional content is associated with higher user engagement in food-related posts. Our results provide valuable insights for the design of more engaging online initiatives aimed at, for example, encouraging healthy eating habits.
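
The TL;DR expresses the model improvement as a change in ROC-AUC. As a reminder of what that metric measures, the following sketch computes it directly from its pairwise definition: the probability that a randomly chosen positive post (for example, one with comments) receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the example labels and scores are invented for illustration and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = post received comments) and two sets of model scores.
labels = [1, 0, 1, 0, 1, 0]
scores_controls_only = [0.60, 0.50, 0.40, 0.30, 0.70, 0.65]
scores_with_nutrition = [0.60, 0.50, 0.55, 0.30, 0.70, 0.65]

auc_controls = roc_auc(labels, scores_controls_only)
auc_nutrition = roc_auc(labels, scores_with_nutrition)
```

A gain of 0.01 or 0.04 on this scale means that slightly more positive/negative pairs are ranked in the correct order once the extra features are included.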

Paper Structure

This paper contains 12 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Posts and comments in r/Food over time. We present how postings and comments developed from $2017$ until $2023$ across different temporal scales including yearly, monthly, weekly, and daily trends. In (\ref{fig:posts_per_year}) we present the number of posts over the years. We observe a positive trend before the COVID-19 pandemic with a noticeable peak during the pandemic and a drop afterwards to pre-pandemic levels. Monthly posting activity in (\ref{fig:posts_per_month}) is rather consistent except for a peak between March and June $2020$ during the pandemic. In (\ref{fig:posts_weekend}) we observe that more posts are created on weekdays than on weekends (left) and that most posts are created in the afternoon in the eastern USA (Q4, right). The bottom row shows the same diagrams for comments. In (\ref{fig:comm_per_year}) we observe a gradual increase in commenting activity over time with the highest activity levels during the pandemic and a sharp drop after the pandemic. This observation is also reflected in (\ref{fig:comm_per_month}), where we see constant high levels of comments in $2020$. We also see a seasonal spike in January, possibly due to the holiday season. In (\ref{fig:comm_weekend}), comments mirror posting activity, with more comments on weekdays (left). On the other hand, the peak in comments is in the morning (Q3, right).
  • Figure 2: Nutritional content distribution of food in r/Food posts. We illustrate the distribution of calories (\ref{fig:rq1_calorie}, \ref{fig:rq2_calorie}) and macro-nutrients (\ref{fig:rq1_protein}--\ref{fig:rq1_fat}, \ref{fig:rq2_protein}--\ref{fig:rq2_fat}) per $100$g of food, across meals in (i) engaging (red) and non-engaging (blue) posts, and (ii) resonant (red) and non-resonant (blue) posts. The calorie content is measured in kCal per $100$g, while macro-nutrients are measured in grams as fractions of $100$g total. We observe that the majority of posts fall within the moderate calorie range, between $100$ and $300$ kCal. Top row: Calorie densities of posts with comments and without comments appear similar but differ significantly in means (\ref{fig:rq1_calorie}). We observe a steep decline in the protein (\ref{fig:rq1_protein}) density, with most posts having less than $20$g of protein, suggesting a prevalence of low-to-moderate protein meals. Carbohydrates (\ref{fig:rq1_carb}) span a wider range. While most posts have less than $20$g, there is a consistent amount of carb-rich food as well, as indicated by the long tail in their distributions. The fat (\ref{fig:rq1_fat}) distribution peaks around $10$--$15$g, with most posts containing moderate fat content. Bottom row: Distribution disparities are more prominent when comparing resonant vs. non-resonant posts. Posts that do not resonate with the community peak at around $150$ kCal, while posts that do resonate peak at $300$ kCal (\ref{fig:rq2_calorie}). We observe similar behavior in all other macronutrient densities, with distributions for resonant posts being shifted to the right as compared to non-resonant posts (\ref{fig:rq2_protein}--\ref{fig:rq2_fat}).
  • Figure 3: Engagement discriminators from post titles. We present discriminative words used significantly differently in engaging and non-engaging posts as word clouds. Red indicates words used more frequently in posts with comments (\ref{fig:wc1}) or in resonant posts (\ref{fig:wc2}). Blue indicates discriminative words used more frequently in posts without comments (\ref{fig:wc1}) or in non-resonant posts (\ref{fig:wc2}). The size of each word reflects its frequency within the respective group.
  • Figure 4: SHAP visualizations for classifier predicting post engagement. Looking at SHAP values of different features, we can understand to which degree they influence the probability of a post receiving engagement. In the beeswarm plot (\ref{fig:shap_rq1}) we display how the top features impact the model's output, with each dot representing one post. Posting after COVID-19, being an experienced user, and having higher calorie meals strongly increase the predicted likelihood of engagement. Additionally, the absence of no-engagement discriminators, posting on a weekday, and posting later in the day further increase those odds. Foods with either high or low protein content have a higher probability of engagement than foods with moderate protein content. In the bar plot (\ref{fig:shap_bar_rq1}) we present the feature importance in absolute values. The calorie density ranks third after controlling for COVID-19 and the user tenure. We also present two examples (\ref{fig:shap_rq1_pred_0}--\ref{fig:shap_rq1_pred_1}) with local feature importance, highlighting the concrete values of each feature and the way they contributed to the prediction of a post receiving comments.
  • Figure 5: SHAP visualizations for classifier predicting a post's resonance level. SHAP values of features make the classifier transparent and allow us to understand which features are beneficial to achieve resonance. In the beeswarm plot (\ref{fig:shap_rq2}) we present how the values of features impact the prediction, while the bar plot (\ref{fig:shap_bar_rq2}) presents each top feature's importance. Calorie density is the fourth most important feature after several control features. Its importance is more than two times higher than for the engagement predictor. Similarly, a higher carbohydrate value pushes the prediction toward resonance, and a higher protein content has a smaller but still positive influence. Fat content has less predictive power. Additionally, we present two concrete examples (\ref{fig:shap_rq2_pred_0}--\ref{fig:shap_rq2_pred_1}), with local feature importance and values that drive the prediction of a post resonating with the community.
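
The SHAP values shown in Figures 4 and 5 are based on Shapley values, which split the difference between a model's output for one post and its output for a reference input among the features. The sketch below computes exact Shapley values by enumerating feature subsets. It is a simplified illustration under stated assumptions: `toy_model`, its coefficients, the feature order, and the single `baseline` input are all hypothetical, and "missing" features are simply replaced by baseline values, whereas SHAP's tree explainers use more efficient, tree-specific algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Features outside a coalition take their baseline value (a simplification).
    Cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical engagement score: [calorie density, is_weekday, experienced_user]."""
    calories, is_weekday, experienced = z
    return 0.002 * calories + 0.5 * is_weekday + 0.3 * experienced


post = [300.0, 1.0, 0.0]       # made-up post features
baseline = [150.0, 0.0, 0.0]   # made-up reference input
phi = shapley_values(toy_model, post, baseline)
```

For a linear model such as this one, each attribution equals the coefficient times the feature's deviation from the baseline, and the attributions sum to the difference between the two model outputs. That additivity is what allows the per-post plots to decompose a single prediction feature by feature.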