CineXDrama: Relevance Detection and Sentiment Analysis of Bangla YouTube Comments on Movie-Drama using Transformers: Insights from Interpretability Tool

Usafa Akther Rifa, Pronay Debnath, Busra Kamal Rafa, Shamaun Safa Hridi, Md. Aminur Rahman

TL;DR

This work addresses the challenge of extracting meaningful sentiment from Bangla YouTube comments about movie and drama content by first filtering for relevance and then performing sentiment classification. It introduces the CineXDrama dataset with 14,000 labeled comments and benchmarks eight transformer models, finding BanglaBERT to be the strongest performer with 83.99% relevance accuracy and 93.30% sentiment accuracy. LIME-based interpretability is integrated to reveal which features drive predictions, enhancing transparency. The approach helps filmmakers gauge audience reactions by focusing on content-relevant feedback and provides a Bangla-language resource for future research in sentiment and relevance analysis. The results demonstrate the feasibility and value of language-specific transformers for Bangla social-media analytics and offer a path for expanding to additional media forms and emotion categories.

Abstract

In recent years, YouTube has become the leading platform for Bangla movies and dramas, where viewers express their opinions in comments that convey their sentiments about the content. However, not all comments are relevant for sentiment analysis, necessitating a filtering mechanism. We propose a system that first assesses the relevance of comments and then analyzes the sentiment of those deemed relevant. We introduce a dataset of 14,000 manually collected and preprocessed comments, annotated for relevance (relevant or irrelevant) and sentiment (positive or negative). Eight transformer models, including BanglaBERT, were used for classification tasks, with BanglaBERT achieving the highest accuracy (83.99% for relevance detection and 93.3% for sentiment analysis). The study also integrates LIME to interpret model decisions, enhancing transparency.
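The two-stage flow described above (relevance filtering first, then sentiment classification only for comments judged relevant) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `analyze_comments`, `CommentResult`, `toy_relevance`, and `toy_sentiment` are hypothetical, and the keyword-based classifiers are stand-ins for fine-tuned transformer models such as BanglaBERT. English example comments are used only for readability.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A classifier maps a comment string to a label string.
# In practice this would wrap a fine-tuned transformer; here it is an assumption.
Classifier = Callable[[str], str]


@dataclass
class CommentResult:
    text: str
    relevance: str            # "relevant" or "irrelevant"
    sentiment: Optional[str]  # "positive" or "negative"; None when irrelevant


def analyze_comments(
    comments: List[str],
    relevance_clf: Classifier,
    sentiment_clf: Classifier,
) -> List[CommentResult]:
    """Run relevance detection, then sentiment analysis on relevant comments only."""
    results = []
    for text in comments:
        relevance = relevance_clf(text)
        # Stage 2 runs only when stage 1 marks the comment as relevant.
        sentiment = sentiment_clf(text) if relevance == "relevant" else None
        results.append(CommentResult(text, relevance, sentiment))
    return results


# Toy stand-in classifiers (hypothetical, keyword-based) so the sketch runs
# without any model weights.
def toy_relevance(text: str) -> str:
    keywords = ("movie", "drama", "acting", "story")
    return "relevant" if any(k in text.lower() for k in keywords) else "irrelevant"


def toy_sentiment(text: str) -> str:
    negative_words = ("boring", "bad")
    return "negative" if any(w in text.lower() for w in negative_words) else "positive"


if __name__ == "__main__":
    sample = [
        "The acting in this drama was great",
        "Subscribe to my channel",
        "Boring story",
    ]
    for result in analyze_comments(sample, toy_relevance, toy_sentiment):
        print(result)
```

Keeping the two classifiers as separate callables mirrors the separation between the relevance and sentiment tasks, so either stage could be replaced by a different model without changing the pipeline.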

Paper Structure

This paper contains 21 sections, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Data annotation procedure along with pre-defined guidelines
  • Figure 2: Suggested architecture of PLMs for relevance and sentiment analysis
  • Figure 3: Confusion Matrix of BanglaBERT for Relevance Detection
  • Figure 4: Confusion Matrix of BanglaBERT for Sentiment Analysis
  • Figure 5: Relevant Comment Correctly Classified by BanglaBERT
  • ...and 3 more figures