Table of Contents
Fetching ...

Using LLMs to Infer Non-Binary COVID-19 Sentiments of Chinese Micro-bloggers

Jerry Chongyi Hu, Mohammed Shahid Modi, Boleslaw K. Szymanski

TL;DR

This paper investigates public sentiment dynamics on Chinese social media (Weibo) during the early COVID-19 period, focusing on sarcasm and moderation. It introduces a large-scale sentiment classification pipeline using the instruction-tuned Llama 3 8B with few-shot prompts to categorize millions of posts into positive, negative, neutral, and sarcastic categories, and validates performance against ground-truth baselines. The study compares COVID-19 discourse with the African Swine Fever event, revealing that government messaging amplified positive sentiment during COVID-19 while sarcasm and negative signals tracked surge patterns, informing understanding of online polarization. Despite limitations in sarcasm detection accuracy, the approach demonstrates a scalable, non-English sentiment-analysis workflow with implications for crisis communication and governance.

Abstract

Studying public sentiment during crises is crucial for understanding how opinions and sentiments shift, resulting in polarized societies. We study Weibo, the most popular microblogging site in China, using posts made during the outbreak of the COVID-19 crisis. The study period includes the pre-COVID-19 stage, the outbreak stage, and the early stage of epidemic prevention. We use Llama 3 8B, a Large Language Model, to analyze users' sentiments on the platform by classifying them into positive, negative, sarcastic, and neutral categories. Analyzing sentiment shifts on Weibo provides insights into how social events and government actions influence public opinion. This study contributes to understanding the dynamics of social sentiments during health crises, fulfilling a gap in sentiment analysis for Chinese platforms. By examining these dynamics, we aim to offer valuable perspectives on digital communication's role in shaping society's responses during unprecedented global challenges.

Using LLMs to Infer Non-Binary COVID-19 Sentiments of Chinese Micro-bloggers

TL;DR

This paper investigates public sentiment dynamics on Chinese social media (Weibo) during the early COVID-19 period, focusing on sarcasm and moderation. It introduces a large-scale sentiment classification pipeline using the instruction-tuned Llama 3 8B with few-shot prompts to categorize millions of posts into positive, negative, neutral, and sarcastic categories, and validates performance against ground-truth baselines. The study compares COVID-19 discourse with the African Swine Fever event, revealing that government messaging amplified positive sentiment during COVID-19 while sarcasm and negative signals tracked surge patterns, informing understanding of online polarization. Despite limitations in sarcasm detection accuracy, the approach demonstrates a scalable, non-English sentiment-analysis workflow with implications for crisis communication and governance.

Abstract

Studying public sentiment during crises is crucial for understanding how opinions and sentiments shift, resulting in polarized societies. We study Weibo, the most popular microblogging site in China, using posts made during the outbreak of the COVID-19 crisis. The study period includes the pre-COVID-19 stage, the outbreak stage, and the early stage of epidemic prevention. We use Llama 3 8B, a Large Language Model, to analyze users' sentiments on the platform by classifying them into positive, negative, sarcastic, and neutral categories. Analyzing sentiment shifts on Weibo provides insights into how social events and government actions influence public opinion. This study contributes to understanding the dynamics of social sentiments during health crises, fulfilling a gap in sentiment analysis for Chinese platforms. By examining these dynamics, we aim to offer valuable perspectives on digital communication's role in shaping society's responses during unprecedented global challenges.
Paper Structure (11 sections, 4 figures, 4 tables)

This paper contains 11 sections, 4 figures, 4 tables.

Figures (4)

  • Figure S1: Distribution of duplicate, original posts and reposts. 21.7% of the posts are duplicated (877,031 posts), 55% are distinct posts (2,226,667 posts), and 23.4% of the posts are reposts (945,709 posts).
  • Figure S2: A sentiment timeline stacked area chart illustrating weekly counts of positive, neutral, negative, and sarcastic sentiments from January 2019 to March 2020. Posts in all sentiment categories are significantly stacked around late 2019, peaking in early 2020, with positive and neutral sentiments dominating the counts.
  • Figure S3: Pie charts show the distribution of four sentiment categories (positive, neutral, negative, and sarcastic) among posts. Chart (a) represents the sentiment scale across all posts, while chart (b) focuses on distinct (unique) posts with no duplicates.
  • Figure S4: Timeline stacked area chart showing the percentage distribution of four sentiments (positive, neutral, negative, sarcastic) in posts over time. The y-axis represents the current fraction of each sentiment at each time. Each segment's thickness corresponds to the proportion of that sentiment in the dataset. For example, a wider section indicates a higher percentage of that sentiment during the corresponding period on the x-axis. The x-axis represents days from January 2019 to March 2020, with key events or transitions, such as the note at '2019-11,' which marks where over 99% of posts occurred after this point. This chart allows an easy visual comparison of the evolution of each sentiment over time.