SAGraph: A Large-Scale Social Graph Dataset with Comprehensive Context for Influencer Selection in Marketing

Xiaoqing Zhang, Yuhan Liu, Jianzhou Wang, Zhenxing Hu, Xiuying Chen, Rui Yan

TL;DR

SAGraph introduces a large-scale, text-rich social-advertising graph constructed from Weibo to study influencer selection in marketing, addressing the gap between simple structural metrics and rich textual interactions. The dataset integrates user profiles, domain-specific interests, product promotions, and multi-round interactions across six product domains, enabling realistic modeling of information diffusion and influence. Experimental results show that context-aware methods and, crucially, large language models with profile and reasoning enhancements significantly outperform traditional baselines in predicting campaign effectiveness, underscoring the value of semantic content in influencer selection. The work provides a public, CC-BY-4.0-licensed resource and demonstrates a path for data-driven advertising strategies, with broad implications for marketing research and practice.

Abstract

The success of an influencer marketing campaign depends heavily on identifying key opinion leaders who can leverage their credibility and reach to promote products or services. The influencer selection process is vital for boosting brand visibility, fostering consumer trust, and driving sales. Traditional research often reduces complex factors such as user attitudes, interaction frequency, and advertising content to simple numerical values, a reductionist approach that fails to capture the dynamic nature of influencer marketing effectiveness. To bridge this gap, we present SAGraph, a comprehensive dataset from Weibo that captures multi-dimensional marketing campaign data across six product domains. The dataset encompasses 345,039 user profiles with their complete interaction histories, including 1.3M comments and 554K reposts across 44K posts, providing fine-grained coverage of influencer marketing dynamics. SAGraph integrates user profiles, content features, and temporal interaction patterns, enabling in-depth analysis of influencer marketing mechanisms. Experimental results using both traditional baselines and state-of-the-art large language models (LLMs) demonstrate the crucial role of content analysis in predicting advertising effectiveness. Our findings reveal that LLM-based approaches achieve superior performance in understanding and predicting campaign success, opening new avenues for data-driven influencer marketing strategies. We hope that this dataset will inspire further research. It is available at https://github.com/xiaoqzhwhu/SAGraph/.
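
To give a feel for the interaction density implied by the abstract's figures, here is a rough back-of-the-envelope sketch. It uses only the rounded totals quoted above (1.3M comments, 554K reposts, 44K posts), so the averages are approximate; the function name `per_post_averages` is a hypothetical helper, not part of the SAGraph release.

```python
# Rounded totals as quoted in the abstract (approximate, not exact counts).
COMMENTS = 1_300_000
REPOSTS = 554_000
POSTS = 44_000


def per_post_averages(comments: int, reposts: int, posts: int) -> dict:
    """Return the mean number of comments and reposts per post."""
    return {
        "comments": comments / posts,
        "reposts": reposts / posts,
    }


averages = per_post_averages(COMMENTS, REPOSTS, POSTS)
print(f"comments per post: {averages['comments']:.1f}")
print(f"reposts per post:  {averages['reposts']:.1f}")
```

Under these rounded totals, each post draws on the order of 30 comments and about 13 reposts on average; the actual per-post distribution is not stated in the abstract and may be heavily skewed.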

Paper Structure

This paper contains 27 sections, 7 figures, 6 tables, 1 algorithm.

Figures (7)

  • Figure 1: (a) An illustration of influence factors. (b) The process of influencer selection involves the information diffusion across various influence factors. "$I_1$" and "$I_2$" represent influencers selected due to their larger influence.
  • Figure 2: An example of the SAGraph, which includes users with interest tags, ads as posts, and evolving interaction data.
  • Figure 3: The percentage of user overlap across domains.
  • Figure 4: A comparison of influencers under LLM simulation. The "comment" action is associated with a purchase likelihood, while the "ignore" action outputs "None".
  • Figure 5: The interest distribution for the product 'Spark Thinking' at the initial and final stages. The proportions of 'parenting' and 'life sharing' increase, consistent with the product's educational nature.
  • ...and 2 more figures