Word frequency and sentiment analysis of twitter messages during Coronavirus pandemic

Nikhil Kumar Rajput, Bhavya Ahuja Grover, Vipin Kumar Rathi, Riya Bansal

TL;DR

This study analyzes COVID-19–related Twitter discourse through two quantitative lenses: word-frequency analysis modeled by a power-law distribution and sentiment analysis via TextBlob. It demonstrates that unigrams, bigrams, and trigrams follow power-law patterns with high goodness-of-fit, while revealing a predominantly neutral public sentiment (approximately $90.97\%$) and smaller shares of positive ($6.45\%$) and negative ($2.57\%$) tweets. Using a COVID-19–focused Twitter dataset sourced from Kaggle and processed with NLP tools (NLTK, WordNetLemmatizer), the work outlines a full preprocessing and visualization pipeline. The findings shed light on how public discourse around the pandemic evolved on Twitter and provide a methodological baseline for rapid sentiment and frequency analyses in social media data.

Abstract

The COVID-19 epidemic has had a great impact on social media conversation, especially on sites like Twitter, which has emerged as a hub for public reaction and information sharing. This paper analyzes a large dataset of Twitter messages related to the disease, starting from January 2020. Two approaches are used: a statistical analysis of word frequencies and a sentiment analysis to gauge user attitudes. Word frequencies are modeled for unigrams, bigrams, and trigrams, with a power-law distribution as the fitting model. The fit is validated with the Sum of Squared Errors (SSE), R-squared ($R^2$), and Root Mean Squared Error (RMSE): high $R^2$ and low SSE/RMSE values indicate a good fit. Sentiment analysis is conducted to understand the general emotional tone of Twitter users' messages. The results reveal that a majority of tweets exhibit neutral sentiment polarity, with only 2.57% expressing negative polarity.
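To make the fitting and validation step concrete, the following is a minimal sketch of how a power law $f(r) = a\,r^{b}$ might be fitted to rank-ordered n-gram frequencies and scored with SSE, $R^2$, and RMSE. It is an illustration under stated assumptions, not the authors' procedure: the fit uses ordinary least squares on log-transformed data (the abstract does not say which fitting routine was used), RMSE is taken as $\sqrt{\mathrm{SSE}/n}$ (some tools divide by the residual degrees of freedom instead), and the function names and the synthetic frequencies are hypothetical.

```python
import math


def fit_power_law(freqs):
    """Fit f(r) = a * r**b to rank-ordered frequencies.

    Assumption: least squares on (log rank, log frequency), which is one
    common way to fit a power law. All frequencies must be positive and
    there must be at least two of them.
    """
    ranked = sorted(freqs, reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # exponent (slope in log-log space)
    a = math.exp(mean_y - b * mean_x)    # scale (exp of the intercept)
    return a, b


def goodness_of_fit(freqs, a, b):
    """Return (SSE, R^2, RMSE) of the fitted curve on the original scale."""
    ranked = sorted(freqs, reverse=True)
    n = len(ranked)
    preds = [a * r ** b for r in range(1, n + 1)]
    sse = sum((f - p) ** 2 for f, p in zip(ranked, preds))
    mean_f = sum(ranked) / n
    sst = sum((f - mean_f) ** 2 for f in ranked)
    r2 = 1.0 - sse / sst
    rmse = math.sqrt(sse / n)            # assumption: divide by n, not n - 2
    return sse, r2, rmse


# Synthetic, made-up frequencies that follow f(r) = 1000 / r exactly.
demo_freqs = [1000.0 / r for r in range(1, 51)]
a, b = fit_power_law(demo_freqs)
sse, r2, rmse = goodness_of_fit(demo_freqs, a, b)
print(f"a={a:.3f} b={b:.3f} SSE={sse:.3e} R2={r2:.6f} RMSE={rmse:.3e}")
```

On real n-gram counts the same two calls would be applied to the frequency list of each n-gram order; an $R^2$ close to 1 together with small SSE and RMSE is what the abstract describes as a good fit.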

Paper Structure

This paper contains 15 sections, 1 equation, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Evolution of number of Twitter ids involved in covid-19 posts
  • Figure 2: Unigram word cloud
  • Figure 3: Plot for Unigram Frequencies vs Rank
  • Figure 4: Plot for Bigram Frequencies vs Rank
  • Figure 5: Plot for Trigram Frequencies vs Rank
  • ...and 3 more figures