SCRAG: Social Computing-Based Retrieval Augmented Generation for Community Response Forecasting in Social Media Environments

Dachun Sun, You Lyu, Jinning Li, Yizhuo Chen, Tianshi Wang, Tomoyoshi Kimura, Tarek Abdelzaher

TL;DR

SCRAG forecasts community responses to social media posts in dynamic environments by grounding LLM generation with two retrieval branches: one over historical discourse ($D_p$) and one over time-sensitive external knowledge ($D_n$). It pairs a community-aware historical response retriever with a sparse external knowledge retriever in a social computing-based Retrieval-Augmented Generation pipeline, producing realistic, ideologically diverse forecasts. Across six real-world scenarios on the X platform, SCRAG consistently improves emotion distribution alignment and cluster diversity over baselines. The framework is modular and adapts to different embedding models and LLMs, with potential for multimodal extensions to further improve predictive accuracy.

Abstract

This paper introduces SCRAG, a prediction framework inspired by social computing, designed to forecast community responses to real or hypothetical social media posts. SCRAG can be used by public relations specialists (e.g., to craft messaging in ways that avoid unintended misinterpretations) or public figures and influencers (e.g., to anticipate social responses), among other applications related to public sentiment prediction, crisis management, and social what-if analysis. While large language models (LLMs) have achieved remarkable success in generating coherent and contextually rich text, their reliance on static training data and susceptibility to hallucinations limit their effectiveness at response forecasting in dynamic social media environments. SCRAG overcomes these challenges by integrating LLMs with a Retrieval-Augmented Generation (RAG) technique rooted in social computing. Specifically, our framework retrieves (i) historical responses from the target community to capture their ideological, semantic, and emotional makeup, and (ii) external knowledge from sources such as news articles to inject time-sensitive context. This information is then jointly used to forecast the responses of the target community to new posts or narratives. Extensive experiments across six scenarios on the X platform (formerly Twitter), tested with various embedding models and LLMs, demonstrate over 10% improvements on average in key evaluation metrics. A concrete example further shows its effectiveness in capturing diverse ideologies and nuances. Our work provides a social computing tool for applications where accurate and concrete insights into community responses are crucial.
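The two-branch retrieval described above can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words cosine similarity stands in for an embedding model, the token-overlap score stands in for a sparse retriever, and all function names, community labels, and example texts are hypothetical. The resulting prompt would be passed to an LLM of choice, which is omitted here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for real text preprocessing."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_history(post, history, k_per_community=1):
    """Pick the top-k most similar past responses from each community.

    `history` is a list of (community_label, response_text) pairs.
    Retrieving per community keeps every ideology represented in the prompt.
    """
    query = Counter(tokenize(post))
    by_community = {}
    for label, text in history:
        score = cosine(query, Counter(tokenize(text)))
        by_community.setdefault(label, []).append((score, text))
    picked = []
    for label in sorted(by_community):
        ranked = sorted(by_community[label], key=lambda s: s[0], reverse=True)
        picked.extend((label, text) for _, text in ranked[:k_per_community])
    return picked


def retrieve_news(post, news, k=1):
    """Rank external documents by token overlap with the post (assumed scoring)."""
    query = set(tokenize(post))
    ranked = sorted(news, key=lambda doc: len(query & set(tokenize(doc))), reverse=True)
    return ranked[:k]


def build_prompt(post, history_hits, news_hits):
    """Combine both retrieval results into one LLM prompt (hypothetical layout)."""
    lines = ["Background knowledge:"]
    lines += [f"- {doc}" for doc in news_hits]
    lines.append("Historical responses by community:")
    lines += [f"- [{label}] {text}" for label, text in history_hits]
    lines.append(f"New post: {post}")
    lines.append("Forecast how each community would respond to the new post.")
    return "\n".join(lines)


# Toy stand-ins for the historical discourse D_p and external knowledge D_n.
D_p = [
    ("community_a", "Tariffs protect local jobs"),
    ("community_a", "I love hiking"),
    ("community_b", "Tariffs raise prices for everyone"),
]
D_n = [
    "Government announces new tariffs on steel imports",
    "Local team wins championship",
]
post = "New tariffs on steel announced today"

history_hits = retrieve_history(post, D_p)
news_hits = retrieve_news(post, D_n)
prompt = build_prompt(post, history_hits, news_hits)
print(prompt)
```

Retrieving per community rather than globally is the design choice this sketch is meant to highlight: a single global top-k could return responses from only one ideological group, whereas the forecast is supposed to reflect the makeup of the whole target community.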

Paper Structure

This paper contains 22 sections, 4 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Framework architecture of SCRAG.
  • Figure 2: Special instruction that focuses the embedding model on the response, and LLM prompt used by SCRAG that has historical responses with ideologies and external knowledge.