Enhancing Sentiment Analysis in Bengali Texts: A Hybrid Approach Using Lexicon-Based Algorithm and Pretrained Language Model Bangla-BERT

Hemal Mahmud, Hasan Mahmud, Mohammad Rifat Ahmmad Rashid

TL;DR

This work tackles fine-grained Bengali sentiment analysis by creating a new 15,194-review dataset and a Lexicon Data Dictionary (LDD) that drives the Bangla Sentiment Polarity Score (BSPS), a nine-class rule-based sentiment classifier. It then evaluates BSPS both directly and as a precursor to BanglaBERT, finding that a hybrid pipeline (BSPS classification followed by BanglaBERT evaluation) outperforms BanglaBERT alone on multi-class sentiment tasks. The key contributions are the BSPS algorithm, the LDD, the Bengali dataset, and the demonstration that a hybrid of rule-based and pretrained-model approaches performs better than the standalone pretrained model. This has practical implications for sentiment analysis in Bengali and other morphologically rich, low-resource languages.

Abstract

Sentiment analysis (SA) is the process of identifying the emotional tone or polarity of a given text, with the aim of uncovering the user's complex emotions and inner feelings. While sentiment analysis has been extensively studied for languages like English, research in Bengali remains limited, particularly for fine-grained sentiment categorization. This work aims to bridge this gap by developing a novel approach that integrates rule-based algorithms with pre-trained language models. We developed a dataset from scratch, comprising over 15,000 manually labeled reviews. Next, we constructed a Lexicon Data Dictionary, assigning polarity scores to the reviews. We developed a novel rule-based algorithm, the Bangla Sentiment Polarity Score (BSPS), which generates sentiment scores and classifies reviews into nine distinct sentiment categories. To assess the performance of this method, we evaluated the classified sentiments using BanglaBERT, a pre-trained transformer-based language model. We also performed sentiment classification directly with BanglaBERT on the original data and evaluated this model's results. Our analysis revealed that the BSPS + BanglaBERT hybrid approach outperformed the standalone BanglaBERT model, achieving higher accuracy, higher precision, and more nuanced classification across the nine sentiment categories. The results of our study emphasize the value and effectiveness of combining rule-based and pre-trained language model approaches for enhanced sentiment analysis in Bengali and suggest pathways for future research and application in languages with similar linguistic complexities.
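
To make the idea of a lexicon-driven, nine-class polarity score concrete, the following is a minimal illustrative sketch in Python. It is not the BSPS algorithm itself: the abstract does not specify the scoring rules, the thresholds, or the category names, so the toy lexicon, the averaging rule, the equal-width bins, and the nine labels below are all assumptions made purely for illustration.

```python
from typing import Dict, List

# Hypothetical toy lexicon (word -> polarity in [-1, 1]); the paper's actual
# Lexicon Data Dictionary and its scores are not given in the abstract.
LEXICON: Dict[str, float] = {
    "ভালো": 0.5,     # "good"
    "চমৎকার": 0.9,   # "excellent"
    "খারাপ": -0.5,   # "bad"
    "জঘন্য": -0.9,   # "awful"
}

# Hypothetical names for nine ordered classes, most negative to most positive.
LABELS: List[str] = [
    "extremely negative", "very negative", "negative", "slightly negative",
    "neutral",
    "slightly positive", "positive", "very positive", "extremely positive",
]


def polarity_score(tokens: List[str], lexicon: Dict[str, float]) -> float:
    """Average the polarity of tokens found in the lexicon (assumed rule)."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def classify(score: float) -> str:
    """Map a score in [-1, 1] to one of nine equal-width bins (assumed rule)."""
    clipped = max(-1.0, min(1.0, score))
    index = min(int((clipped + 1.0) / 2.0 * len(LABELS)), len(LABELS) - 1)
    return LABELS[index]


if __name__ == "__main__":
    review = "খাবার ভালো"  # "the food is good"; whitespace tokenization only
    score = polarity_score(review.split(), LEXICON)
    print(score, classify(score))
```

In a hybrid setup of the kind the abstract describes, labels produced by such a rule-based step would then be passed to a pretrained model such as BanglaBERT for evaluation; that stage is omitted here because the abstract does not describe its configuration.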

Paper Structure

This paper contains 6 sections, 1 figure.

Figures (1)

  • Figure 1: Visualization of methodology