HP-BERT: A framework for longitudinal study of Hinduphobia on social media via language models
Ashutosh Singh, Rohitash Chandra
TL;DR
The study addresses Hinduphobia on social media during the COVID-19 era by creating the Hinduphobic COVID-19 X (Twitter) Dataset and developing HP-BERT, a multitask BERT-based model for Hinduphobia detection and sentiment analysis. Through multi-stage fine-tuning (first on Hinduphobic data, then on the SenWave sentiment corpus), HP-BERT achieves an accuracy of $0.9472$ and substantially outperforms baseline transformer models. Applied to the Global COVID-19 X (Twitter) Dataset (27.4 million tweets across six countries), the framework reveals moderate correlations between COVID-19 case surges and Hinduphobic content ($r \in [0.312, 0.428]$) and uncovers longitudinal and cross-country patterns in Hinduphobic discourse. The work provides open datasets and code, demonstrates a scalable approach for longitudinal hate-speech monitoring, and highlights pandemic-driven discrimination dynamics with practical implications for monitoring and policy interventions.
Abstract
During the COVID-19 pandemic, community tensions intensified, contributing to discriminatory sentiments against various religious groups, including Hindu communities. Recent advances in language models have shown promise for social media analysis, including longitudinal studies of platforms such as X (Twitter). We present a computational framework for analyzing anti-Hindu sentiment (Hinduphobia) during the COVID-19 period, introducing an abuse detection and sentiment analysis approach for longitudinal analysis on X. We curate and release the "Hinduphobic COVID-19 X (Twitter) Dataset", containing 8,000 annotated and manually verified tweets. We then develop the Hinduphobic BERT (HP-BERT) model using this dataset and achieve 94.72% accuracy, outperforming baseline Transformer-based language models. The model incorporates multi-label sentiment analysis capabilities through additional fine-tuning. Our analysis encompasses approximately 27.4 million tweets from six countries: Australia, Brazil, India, Indonesia, Japan, and the United Kingdom. Statistical analysis reveals moderate correlations (r = 0.312 to 0.428) between COVID-19 case increases and Hinduphobic content volume, highlighting how pandemic-related stress may contribute to discriminatory discourse. This study provides evidence of social media-based religious discrimination during the COVID-19 crisis.
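
The reported r values describe how closely COVID-19 case counts and Hinduphobic tweet volume move together over time. The abstract does not say which coefficient or time granularity was used, so the sketch below assumes a Pearson correlation over monthly counts. The function name `pearson_r`, the variable names, and all numbers are hypothetical and for illustration only; they are not data or code from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up monthly counts for one country (illustration only, not study data).
monthly_new_cases = [120, 340, 900, 650, 400, 1500, 2200, 1300]
monthly_flagged_tweets = [14, 9, 25, 30, 12, 28, 35, 40]

r = pearson_r(monthly_new_cases, monthly_flagged_tweets)
print(f"r = {r:.3f}")
```

In a setup like this, the coefficient would be computed separately per country, which is one way a range of values such as the one reported could arise.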
