Analyzing Sustainability Messaging in Large-Scale Corporate Social Media

Ujjwal Sharma, Stevan Rudinac, Ana Mićković, Willemijn van Dolen, Marcel Worring

TL;DR

This work tackles the challenge of analyzing sustainability messaging at scale on corporate social media by introducing a two-stage multimodal pipeline. The textual component uses an ensemble of large language models to map tweets to the 17 Sustainable Development Goals (SDGs), with a majority-vote scheme and a tie-breaker to stabilize predictions; hashtags serve as proxy ground truth for evaluation. The visual component employs vision-language embeddings and graph-based clustering to uncover semantically coherent visual themes linked to ESG risk and audience engagement, followed by generative VL summarization of representative samples. The study leverages a large Fortune 1000 Twitter corpus, ESG risk data, and sector classifications to reveal sector-specific SDG emphasis, SDG–ESG risk correlations, and visually driven patterns in sustainability communication. Overall, the framework offers a scalable, adaptable approach to content-centric sustainability analytics with potential applicability to other domains.
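The majority-vote step with a tie-breaker can be illustrated with a minimal sketch. The summary does not specify how ties are resolved, so everything here is an assumption: each model is taken to emit a single integer label per tweet (1 to 17 for an SDG, 0 for "not SDG-relevant"), the tie-breaker is taken to be the prediction of one additional model, and the names `NO_SDG` and `majority_vote` are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

NO_SDG = 0  # assumed placeholder label for "not SDG-relevant"


def majority_vote(votes: Sequence[int], tie_breaker: Optional[int] = None) -> int:
    """Combine per-model labels for one tweet into a single label.

    votes: one label per ensemble member (1..17 for an SDG, 0 for none).
    tie_breaker: label from an extra model, consulted only when the
        ensemble is tied (an assumed rule, not the paper's stated one).
    """
    if not votes:
        raise ValueError("at least one vote is required")

    counts = Counter(votes)
    top_count = max(counts.values())
    winners = [label for label, n in counts.items() if n == top_count]

    # A single most-frequent label wins outright.
    if len(winners) == 1:
        return winners[0]

    # Tie: accept the tie-breaker only if it sides with a tied label.
    if tie_breaker in winners:
        return tie_breaker

    # Unresolved tie: fall back to the conservative "no SDG" label.
    return NO_SDG


if __name__ == "__main__":
    print(majority_vote([13, 13, 7]))           # clear majority
    print(majority_vote([13, 7], tie_breaker=7))  # tie resolved by extra model
```

Falling back to `NO_SDG` on an unresolved tie favors precision over recall; a different fallback (for example, keeping all tied labels) would be equally plausible under the description given.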

Abstract

In this work, we introduce a multimodal analysis pipeline that leverages large foundation models in vision and language to analyze corporate social media content, with a focus on sustainability-related communication. To address the challenges of evolving, multimodal, and often ambiguous corporate messaging on platforms such as X (formerly Twitter), we employ an ensemble of large language models (LLMs) to annotate a large corpus of corporate tweets by their topical alignment with the 17 Sustainable Development Goals (SDGs). This approach avoids the need for costly, task-specific annotations and explores the potential of such models as ad-hoc annotators for social media data, capturing both explicit and implicit references to sustainability themes at scale. Complementing this textual analysis, we use vision-language models (VLMs) within a visual understanding framework that uses semantic clusters to uncover patterns in visual sustainability communication. This integrated approach reveals sectoral differences in SDG engagement, temporal trends, and associations between corporate messaging, environmental, social, and governance (ESG) risks, and consumer engagement. Our methods (automatic label generation and semantic visual clustering) are broadly applicable to other domains and offer a flexible framework for large-scale social media analysis.
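To make the idea of semantic visual clusters concrete, the sketch below groups image embeddings by linking any two whose cosine similarity exceeds a threshold and taking the connected components of the resulting graph. This is a deliberately simple stand-in for graph-based clustering, not the algorithm used in the paper; the function names, the threshold value, and the toy 2-D vectors are all assumptions, and real embeddings would come from a vision-language model.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_similarity(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.8
) -> List[int]:
    """Assign a cluster id to each embedding via connected components.

    An edge joins two items when their cosine similarity is at least
    `threshold` (an assumed value); components are found with union-find.
    """
    n = len(embeddings)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Relabel roots as consecutive ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(cluster_by_similarity(toy))
```

The pairwise loop is quadratic in the number of images, so a corpus-scale version would need an approximate nearest-neighbor index and a community-detection method rather than plain connected components.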

Paper Structure

This paper contains 32 sections, 1 equation, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Examples of corporate tweets annotated using our methodology to identify sustainability relevance (annotations from our annotation approach are displayed below the tweet in a grey box). Our LLM-based approach reveals that explicit mentions of the Sustainable Development Goals (SDGs) are uncommon, with sustainability claims often conveyed through contextual information. This highlights a central component of this work: detecting sustainability-related content in corporate social media posts.
  • Figure 2: Overview of the proposed approach for analyzing corporate social media content. The pipeline includes two main components: (1) textual analysis using large language models to classify tweets by their relevance to the 17 Sustainable Development Goals (SDGs) and explore links to real-world outcomes such as ESG risk and engagement, and (2) visual analysis using a vision-language model to identify visual semantic themes tied to elevated ESG risk and/or engagement. Together, these analyses provide a comprehensive view of SDG-related tweet content across industries.
  • Figure 3: Total and SDG-relevant Tweet Volumes Across Various Sectors. The boxplots on the left depict the distribution of company-level proportions of SDG-relevant to total content, providing a macro-averaged view of SDG-related communication within each industry. The bar plot on the right shows the aggregated proportion of SDG-relevant versus general content for each industry group.
  • Figure 4: Normalized distribution of SDG-related tweet occurrences, categorized by color-coded parent themes: 'People' (SDGs 1–5; shades of blue), 'Planet' (SDGs 6, 12–15; shades of green), 'Prosperity' (SDGs 7–11; shades of yellow and orange), 'Peace' (SDG 16; purple), and 'Partnership' (SDG 17; teal). Each color group reflects a distinct overarching dimension of sustainable development.
  • Figure 5: Evolution of total and SDG-relevant tweet volumes from January 2017 to December 2022. Solid lines depict absolute volumes of total and SDG-relevant tweets, while the dotted line shows the proportion of SDG-relevant tweets relative to total tweets. Each x-axis label marks the end of the quarter to which the data belong.
  • ...and 3 more figures