Analyzing Sustainability Messaging in Large-Scale Corporate Social Media
Ujjwal Sharma, Stevan Rudinac, Ana Mićković, Willemijn van Dolen, Marcel Worring
TL;DR
This work tackles the challenge of analyzing sustainability messaging at scale on corporate social media by introducing a two-stage multimodal pipeline. The textual component uses an ensemble of large language models to map tweets to the 17 Sustainable Development Goals (SDGs), with a majority-vote scheme and a tie-breaker to stabilize predictions; hashtags serve as proxy ground truth for evaluation. The visual component employs vision-language embeddings and graph-based clustering to uncover semantically coherent visual themes linked to ESG risk and audience engagement, followed by generative vision-language summarization of representative samples. The study leverages a large Fortune 1000 Twitter corpus, ESG risk data, and sector classifications to reveal sector-specific SDG emphasis, SDG–ESG risk correlations, and visually driven patterns in sustainability communication. Overall, the framework offers a scalable, adaptable approach to content-centric sustainability analytics with potential applicability to other domains.
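The majority-vote step with a tie-breaker can be illustrated with a minimal Python sketch. The summary above does not specify how ties are resolved, so the rule used here (defer to one designated model, then fall back to the lowest SDG number) is an assumption for illustration only, as are the function name `ensemble_label`, the model names, and the integer encoding of SDG labels.

```python
from collections import Counter


def ensemble_label(votes, tie_breaker):
    """Combine per-model SDG predictions into a single label.

    votes: dict mapping a (hypothetical) model name to its predicted SDG number.
    tie_breaker: name of the model whose vote is preferred when counts are tied.
    """
    counts = Counter(votes.values())
    top = max(counts.values())
    tied = [label for label, n in counts.items() if n == top]

    # A single most-voted label wins outright.
    if len(tied) == 1:
        return tied[0]

    # Assumed tie-breaking rule: defer to the designated model if its
    # vote is among the tied labels.
    preferred = votes.get(tie_breaker)
    if preferred in tied:
        return preferred

    # Deterministic fallback (also an assumption): lowest SDG number.
    return min(tied)


if __name__ == "__main__":
    example = {"model_a": 13, "model_b": 13, "model_c": 7}
    print(ensemble_label(example, tie_breaker="model_c"))
```

Keeping the fallback deterministic means repeated runs over the same predictions always produce the same label, which is the stabilizing effect a tie-breaker is meant to provide.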
Abstract
In this work, we introduce a multimodal analysis pipeline that leverages large foundation models in vision and language to analyze corporate social media content, with a focus on sustainability-related communication. Addressing the challenges of evolving, multimodal, and often ambiguous corporate messaging on platforms such as X (formerly Twitter), we employ an ensemble of large language models (LLMs) to annotate a large corpus of corporate tweets for their topical alignment with the 17 Sustainable Development Goals (SDGs). This approach avoids the need for costly, task-specific annotations and explores the potential of such models as ad-hoc annotators for social media data, capable of capturing both explicit and implicit references to sustainability themes efficiently and at scale. Complementing this textual analysis, we utilize vision-language models (VLMs) within a visual understanding framework that uses semantic clusters to uncover patterns in visual sustainability communication. This integrated approach reveals sectoral differences in SDG engagement, temporal trends, and associations between corporate messaging, environmental, social, and governance (ESG) risks, and consumer engagement. Our methods (automatic label generation and semantic visual clustering) are broadly applicable to other domains and offer a flexible framework for large-scale social media analysis.
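As a rough illustration of how semantic clusters might be formed from image embeddings, the following sketch links embeddings whose cosine similarity exceeds a threshold and treats the connected components of the resulting graph as clusters. This is one simple graph-based scheme chosen for illustration, not necessarily the clustering used in this work; the function name `cluster_embeddings`, the threshold value, and the toy two-dimensional vectors are all assumptions.

```python
import numpy as np


def cluster_embeddings(embeddings, threshold=0.8):
    """Group embeddings by connected components of a similarity graph.

    Two items are linked when their cosine similarity is >= threshold
    (an assumed value). Assumes no embedding is the zero vector.
    Returns one integer cluster label per input row.
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-normalize rows
    sim = x @ x.T                                     # cosine similarities
    n = len(x)

    # Union-find over the thresholded similarity graph.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                parent[find(i)] = find(j)

    # Relabel component roots as 0..k-1 in order of first appearance.
    mapping = {}
    return [mapping.setdefault(find(i), len(mapping)) for i in range(n)]


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
    print(cluster_embeddings(toy))
```

The pairwise loop is quadratic in the number of items, so a corpus-scale analysis would need an approximate nearest-neighbor graph or a dedicated community-detection library instead.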
