Milestones in Bengali Sentiment Analysis leveraging Transformer-models: Fundamentals, Challenges and Future Directions
Saptarshi Sengupta, Shreya Ghosh, Prasenjit Mitra, Tarikul Islam Tamiti
TL;DR
This paper surveys the state-of-the-art in Bengali sentiment analysis with Transformer-based models, addressing the scarcity of resources for Bengali. It systematically categorizes encoder-, decoder-, and seq2seq-based foundation models pretrained on Bengali corpora and assesses their applicability to sentiment tasks. It synthesizes dataset benchmarks and reports that BanglaBERT-based models achieve strong performance across domains, with cross-lingual and transliteration-enabled approaches broadening coverage. It concludes with open challenges and concrete future directions, including larger diverse corpora, multilingual and multimodal avenues, and ethically aware modeling, underscoring the practical impact for Bengali NLP deployment.
Abstract
Sentiment Analysis (SA) refers to the task of associating a view polarity (usually positive, negative, or neutral, or a finer-grained label such as slightly angry or sad) with a given text, which essentially reduces it to a supervised classification task, since the view labels are known a priori. Although SA has been heavily studied in resource-rich languages such as English, where the arrival of the Transformer architecture has pushed the SOTA forward by leaps and bounds, the same cannot be said for resource-poor languages such as Bengali (BN). For a language spoken by roughly 300 million people, the technology enabling them to run trials in their favored tongue is severely lacking. In this paper, we analyze the SOTA for SA in Bengali, focusing on Transformer-based models. We discuss the available datasets and their drawbacks, the nuances of Bengali that make it a challenging language for SA, and finally provide insights into future directions for mitigating the limitations in the field.
