A Unified BERT-CNN-BiLSTM Framework for Simultaneous Headline Classification and Sentiment Analysis of Bangla News
Mirza Raquib, Munazer Montasir Akash, Tawhid Ahmed, Saydul Akbar Murad, Farida Siddiqi Prity, Mohammad Amzad Hossain, Asif Pervez Polok, Nick Rahimi
TL;DR
This work tackles the joint problem of Bangla headline classification and sentiment analysis using a unified BERT-CNN-BiLSTM architecture that fuses transformer-based contextual embeddings with CNN-derived local features and BiLSTM-based global dependencies. It introduces two class-balancing strategies (before and after data splitting) and evaluates on the BAN-ABSA dataset with 5-fold cross-validation, using metrics like accuracy, precision, recall, and F1, plus LIME explanations for interpretability. Results show state-of-the-art performance for both tasks, with technique-specific insights: oversampling before split generally boosts performance in Technique-1, while training on imbalanced data (Technique-2) yields strong headline results and solid sentiment performance, achieving robust generalization on external datasets such as Potrika. The approach advances Bangla NLP by providing a strong, multi-task baseline for low-resource languages and demonstrates practical utility for multi-faceted news analysis and interpretability.
Abstract
Newspapers are an essential everyday information source that shapes how the public discusses current issues. However, navigating the vast amount of news content from different newspapers and online news portals can be challenging. Classifying headlines together with sentiment analysis tells us what a news item is about (e.g., politics, sports) and how it makes us feel (positive, negative, neutral), which helps readers quickly understand the emotional tone of the news. This research presents a state-of-the-art approach to Bangla news headline classification combined with sentiment analysis using Natural Language Processing (NLP) techniques, in particular the hybrid transfer learning model BERT-CNN-BiLSTM. We use the BAN-ABSA dataset of 9014 news headlines, which is explored here for the first time for simultaneous headline and sentiment categorization of Bengali news. Because this dataset is imbalanced, we applied two experimental strategies: technique-1, where undersampling and oversampling are applied before splitting, and technique-2, where undersampling and oversampling are applied after splitting. In technique-1, oversampling provided the strongest performance, with 78.57% for headline classification and 73.43% for sentiment analysis, while technique-2 delivered its highest results when trained directly on the original imbalanced dataset, with 81.37% for headline classification and 64.46% for sentiment analysis. The proposed BERT-CNN-BiLSTM model significantly outperforms all baseline models on both classification tasks and achieves new state-of-the-art results for Bangla news headline classification and sentiment analysis. These results demonstrate the importance of leveraging both the headline and sentiment datasets and provide a strong baseline for Bangla text classification in low-resource settings.
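The difference between the two resampling strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the single hold-out split (the paper uses 5-fold cross-validation), and the choice to resample only the training portion in technique-2 are all assumptions made for illustration. Only random oversampling is shown.

```python
import random
from collections import Counter


def random_oversample(rows, rng):
    """Duplicate rows of smaller classes at random until all classes match the largest."""
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


def holdout_split(rows, test_fraction, rng):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def technique_1(rows, test_fraction=0.2, seed=0):
    """Resample first, then split: duplicated rows may land in both train and test."""
    rng = random.Random(seed)
    return holdout_split(random_oversample(rows, rng), test_fraction, rng)


def technique_2(rows, test_fraction=0.2, seed=0):
    """Split first, then resample the training part only (assumed): test stays untouched."""
    rng = random.Random(seed)
    train, test = holdout_split(rows, test_fraction, rng)
    return random_oversample(train, rng), test


# Hypothetical toy data: 80 "politics" and 20 "sports" headlines with unique texts.
rows = [(f"headline-{i}", "politics") for i in range(80)]
rows += [(f"headline-{i}", "sports") for i in range(80, 100)]

t1_train, t1_test = technique_1(rows)
t2_train, t2_test = technique_2(rows)

# Number of distinct headlines that appear in both train and test for each technique.
leak_1 = len({t for t, _ in t1_train} & {t for t, _ in t1_test})
leak_2 = len({t for t, _ in t2_train} & {t for t, _ in t2_test})
print("technique-1 shared headlines:", leak_1)
print("technique-2 shared headlines:", leak_2)
print("technique-2 train label counts:", Counter(label for _, label in t2_train))
```

Under these assumptions, technique-2 keeps the test set at its original class distribution and free of duplicated training rows, whereas technique-1 evaluates on a balanced test set that can share duplicated headlines with the training set. This is one plausible reason the two techniques report different scores, though the abstract itself does not state the cause.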
