SCRAG: Social Computing-Based Retrieval Augmented Generation for Community Response Forecasting in Social Media Environments
Dachun Sun, You Lyu, Jinning Li, Yizhuo Chen, Tianshi Wang, Tomoyoshi Kimura, Tarek Abdelzaher
TL;DR
SCRAG forecasts community responses to social media posts in dynamic environments by grounding LLM-based generation in two retrieval branches: one over historical discourse ($D_p$) and one over time-sensitive external knowledge ($D_n$). Its social computing-based Retrieval-Augmented Generation pairs a community-aware historical response retriever with a sparse external knowledge retriever to produce realistic, ideologically diverse forecasts. Across six real-world scenarios on the X platform, SCRAG consistently improves emotion distribution alignment and cluster diversity over baselines. The framework is modular, adapts to different embedding models and LLMs, and could be extended to multimodal inputs to further improve predictive accuracy.
Abstract
This paper introduces SCRAG, a prediction framework inspired by social computing, designed to forecast community responses to real or hypothetical social media posts. SCRAG can be used by public relations specialists (e.g., to craft messaging in ways that avoid unintended misinterpretations) or public figures and influencers (e.g., to anticipate social responses), among other applications related to public sentiment prediction, crisis management, and social what-if analysis. While large language models (LLMs) have achieved remarkable success in generating coherent and contextually rich text, their reliance on static training data and susceptibility to hallucinations limit their effectiveness at response forecasting in dynamic social media environments. SCRAG overcomes these challenges by integrating LLMs with a Retrieval-Augmented Generation (RAG) technique rooted in social computing. Specifically, our framework retrieves (i) historical responses from the target community to capture its ideological, semantic, and emotional makeup, and (ii) external knowledge from sources such as news articles to inject time-sensitive context. This information is then jointly used to forecast the responses of the target community to new posts or narratives. Extensive experiments across six scenarios on the X platform (formerly Twitter), tested with various embedding models and LLMs, demonstrate average improvements of over 10% in key evaluation metrics. A concrete example further shows SCRAG's effectiveness in capturing diverse ideologies and nuances. Our work provides a social computing tool for applications where accurate and concrete insights into community responses are crucial.
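The two-branch retrieval described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: both branches here use a toy bag-of-words cosine similarity as a stand-in for the community-aware and sparse retrievers, and the names `retrieve`, `build_prompt`, `D_p`, and `D_n` as Python variables, along with the example texts, are hypothetical. The LLM call itself is omitted; the sketch only shows how retrieved history and external knowledge might be combined into one grounded prompt.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k):
    """Return up to k documents with nonzero similarity to the query.

    Assumption: a bag-of-words stand-in for the paper's retrievers.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(post, historical, news, k=2):
    """Assemble a grounded prompt from both retrieval branches."""
    past = retrieve(post, historical, k)   # branch (i): community history
    context = retrieve(post, news, k)      # branch (ii): external knowledge
    lines = ["New post: " + post, "Past community responses:"]
    lines += ["- " + r for r in past]
    lines.append("Recent external context:")
    lines += ["- " + c for c in context]
    lines.append("Forecast how this community would respond.")
    return "\n".join(lines)


# Hypothetical toy corpora standing in for D_p and D_n.
D_p = [
    "The new transit tax is unfair to drivers",
    "I love the new bike lanes downtown",
    "Great game last night",
]
D_n = [
    "City council approves transit tax to fund bike lanes",
    "Local team wins championship game",
]
post = "City announces transit tax increase"
prompt = build_prompt(post, D_p, D_n)
print(prompt)
```

In this toy setup only documents that share vocabulary with the post reach the prompt, so unrelated history and news are filtered out before generation. A real system would replace `retrieve` with the embedding-based and sparse retrievers the paper describes and pass the prompt to an LLM.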
