SAGraph: A Large-Scale Social Graph Dataset with Comprehensive Context for Influencer Selection in Marketing
Xiaoqing Zhang, Yuhan Liu, Jianzhou Wang, Zhenxing Hu, Xiuying Chen, Rui Yan
TL;DR
SAGraph introduces a large-scale, text-rich social-advertising graph constructed from Weibo to study influencer selection in marketing, addressing the gap between simple structural metrics and rich textual interactions. The dataset integrates user profiles, domain-specific interests, product promotions, and multi-round interactions across six product domains, enabling realistic modeling of information diffusion and influence. Experimental results show that context-aware methods and, crucially, large language models with profile and reasoning enhancements significantly outperform traditional baselines in predicting campaign effectiveness, underscoring the value of semantic content in influencer selection. The work provides a public, CC-BY-4.0-licensed resource and demonstrates a path for data-driven advertising strategies, with broad implications for marketing research and practice.
Abstract
The success of an influencer marketing campaign depends heavily on identifying key opinion leaders who can leverage their credibility and reach to promote products or services. The influencer selection process is vital for boosting brand visibility, fostering consumer trust, and driving sales. Traditional research often reduces complex factors such as user attitudes, interaction frequency, and advertising content to simple numerical values. This reductionist approach fails to capture the dynamic nature of influencer marketing effectiveness. To bridge this gap, we present SAGraph, a novel, comprehensive dataset from Weibo that captures multi-dimensional marketing campaign data across six product domains. The dataset encompasses 345,039 user profiles with their complete interaction histories, including 1.3M comments and 554K reposts across 44K posts, providing unprecedented granularity in influencer marketing dynamics. SAGraph uniquely integrates user profiles, content features, and temporal interaction patterns, enabling in-depth analysis of influencer marketing mechanisms. Experimental results using both traditional baselines and state-of-the-art large language models (LLMs) demonstrate the crucial role of content analysis in predicting advertising effectiveness. Our findings reveal that LLM-based approaches achieve superior performance in understanding and predicting campaign success, opening new avenues for data-driven influencer marketing strategies. We hope that this dataset will inspire further research. It is available at https://github.com/xiaoqzhwhu/SAGraph/.
