HP-BERT: A framework for longitudinal study of Hinduphobia on social media via language models

Ashutosh Singh; Rohitash Chandra

TL;DR

The study addresses Hinduphobia on social media during the COVID-19 era by creating the Hinduphobic COVID-19 X (Twitter) Dataset and developing HP-BERT, a multitask BERT-based model for Hinduphobia detection and sentiment analysis. Through multi-stage fine-tuning (first on Hinduphobic data, then on the SenWave sentiment corpus), HP-BERT reaches an accuracy of $0.9472$ and outperforms baseline transformer models. Applied to the Global COVID-19 X (Twitter) Dataset (27.4 million tweets across six countries), the framework reveals moderate correlations between COVID-19 case surges and Hinduphobic content ($r \in [0.312, 0.428]$) and uncovers longitudinal and cross-country patterns in Hinduphobic discourse. The work provides open datasets and code, demonstrates a scalable approach to longitudinal hate-speech monitoring, and highlights pandemic-driven discrimination dynamics with practical implications for monitoring and policy interventions.

Abstract

During the COVID-19 pandemic, community tensions intensified, contributing to discriminatory sentiments against various religious groups, including Hindu communities. Recent advances in language models have shown promise for social media analysis, with potential for longitudinal studies of platforms such as X (Twitter). We present a computational framework for analyzing anti-Hindu sentiment (Hinduphobia) during the COVID-19 period, introducing an abuse detection and sentiment analysis approach for longitudinal analysis on X. We curate and release a "Hinduphobic COVID-19 X Dataset" containing 8,000 annotated and manually verified tweets. We then develop the Hinduphobic BERT (HP-BERT) model using this dataset and achieve 94.72% accuracy, outperforming baseline Transformer-based language models. The model incorporates multi-label sentiment analysis capabilities through additional fine-tuning. Our analysis encompasses approximately 27.4 million tweets from six countries: Australia, Brazil, India, Indonesia, Japan, and the United Kingdom. Statistical analysis reveals moderate correlations (r = 0.312 to 0.428) between COVID-19 case increases and Hinduphobic content volume, highlighting how pandemic-related stress may contribute to discriminatory discourse. This study provides evidence of social media-based religious discrimination during the COVID-19 crisis.
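
To illustrate the kind of statistic reported above, the following is a minimal sketch of a correlation between monthly COVID-19 case counts and monthly Hinduphobic tweet counts. It assumes a Pearson coefficient, which the abstract does not state explicitly; the function name `pearson_r` and both monthly series are hypothetical and are not taken from the paper's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly values for one country (illustrative only).
monthly_cases = [120, 340, 900, 1500, 700, 400]
monthly_hinduphobic_tweets = [15, 22, 30, 41, 38, 20]

r = pearson_r(monthly_cases, monthly_hinduphobic_tweets)
print(f"r = {r:.3f}")
```

A per-country analysis along these lines would repeat the calculation once per country, which is one plausible way a range of coefficients such as the one reported could arise.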

Paper Structure (25 sections, 18 figures, 12 tables)

Introduction
Related Work
Sentiment and Semantic Analysis
Abuse and Misinformation
Methods
Datasets
Hinduphobic COVID-19 X (Twitter) Dataset
SenWave dataset
Global COVID-19 X (Twitter) Dataset
Hinduphobic-BERT Model
Framework
Results
Technical setup
Fine-tuning the HP-BERT Model
Comparative Model Analysis
...and 10 more sections

Figures (18)

Figure 1: Data collection and processing workflow, from initial data extraction via the Twitter API to manual labelling of data points. The remaining tweets are then labelled automatically with GPT-3.5 Turbo under a human-in-the-loop strategy [xin2018accelerating]. We removed incorrectly labelled tweets identified during verification to ensure final dataset quality.
Figure 2: HP-BERT model architecture for detecting Hinduphobic sentiment, featuring tokenisation (encoding) of Hinglish/English tweets using a set of encoders. It includes a fully connected layer for classifying tweets as Hinduphobic or Non-Hinduphobic/Neutral, and the final layer provides sentiment classification into different categories.
Figure 3: A multi-stage process for HP-BERT model training and analysis: Stage 1 involves dataset collection, Stage 2 preprocesses data, Stage 3 fine-tunes HP-BERT for binary and sentiment classification, Stage 4 applies classification and trend analysis, and Stage 5 conducts sentiment analysis with visualisation of trends and polarity.
Figure 4: Monthly trends in COVID-19 case counts.
Figure 5: Monthly trends of Hinduphobic tweets across all countries.
...and 13 more figures
