Enhancing Social Media Post Popularity Prediction with Visual Content

Dahyun Jeong, Hyelim Son, Yunjin Choi, Keunwoo Kim

TL;DR

The paper tackles popularity prediction for image-based social media posts under a hierarchical data structure by combining image-derived covariates with traditional non-image features. It uses the Google Cloud Vision API to extract labels and dominant colors, summarizes image content with Seeded-LDA topics, and encodes perceptible colors via the Munsell system. Among the evaluated models, the tree-based methods (Random Forest and XGBoost) best capture nonlinear interactions and the hierarchical structure. XGBoost performs best, and TreeSHAP makes its covariate importance interpretable: time difference and image label topics (notably Body and Fashion) emerge as key drivers. The study demonstrates practical, interpretable improvements over non-image covariates alone and provides a replicable workflow for image-informed popularity prediction, with implications for marketing analytics and platform design.

Abstract

Our study presents a framework for predicting image-based social media content popularity that focuses on addressing complex image information and a hierarchical data structure. We utilize the Google Cloud Vision API to effectively extract key image and color information from users' postings, achieving 6.8% higher accuracy compared to using non-image covariates alone. For prediction, we explore a wide range of prediction models, including Linear Mixed Model, Support Vector Regression, Multi-layer Perceptron, Random Forest, and XGBoost, with linear regression as the benchmark. Our comparative study demonstrates that models that are capable of capturing the underlying nonlinear interactions between covariates outperform other methods.

Paper Structure (37 sections, 12 equations, 11 figures, 8 tables)

Figures (11)

  • Figure 1: Google Cloud Vision API output of a sample image. From left to right: the sample image, returned labels/label scores, and dominant colors/representative color
  • Figure 2: Topic variable construction: This figure provides an example of a case where a single post contains three images
  • Figure 3: Panel (a) shows boxplots of the number of "Likes" for all users. Panel (b) shows the response variable in (a) scaled by "Time Difference". Panel (c) shows the log transform of the response variable in (b)
  • Figure 4: Residuals of 10 randomly chosen posts of 20 users for Models 1 and 2. Vertical gray bars mark the partitions between distinct users
  • Figure 5: Actual and predicted "Likes" for a user. The connected black lines display actual "Likes", and the solid lines display predicted "Likes" from LMM (left) and XGB (right), respectively
  • ...and 6 more figures
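
The response transformation described in the Figure 3 caption (scale "Likes" by "Time Difference", then take the log) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that "scaled by" means division, that the log is the natural log, and that an optional `offset` guards against zero-like posts. The function name `scaled_log_likes`, the `offset` parameter, the time units, and the sample posts are all hypothetical.

```python
import math


def scaled_log_likes(likes, time_difference, offset=0.0):
    """Return log(likes / time_difference + offset).

    Assumptions (not confirmed by the source): scaling is plain division,
    the logarithm is natural, and `offset` is an optional guard for posts
    with zero likes.
    """
    if time_difference <= 0:
        raise ValueError("time_difference must be positive")
    rate = likes / time_difference + offset
    if rate <= 0:
        raise ValueError("likes / time_difference + offset must be positive")
    return math.log(rate)


# Hypothetical posts; time_difference is in arbitrary units (e.g. days).
posts = [
    {"likes": 120, "time_difference": 2.0},
    {"likes": 4500, "time_difference": 30.0},
    {"likes": 60000, "time_difference": 365.0},
]

transformed = [
    scaled_log_likes(p["likes"], p["time_difference"]) for p in posts
]

for post, value in zip(posts, transformed):
    print(post, round(value, 3))
```

Dividing by the time difference puts posts of different ages on a comparable per-unit-time scale, and the log compresses the heavy right tail that raw "Likes" counts typically show. Both are plausible motivations for the three panels, though the caption does not spell them out.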