Zero- and Few-Shot Prompting with LLMs: A Comparative Study with Fine-tuned Models for Bangla Sentiment Analysis

Md. Arid Hasan, Shudipta Das, Afiyat Anjum, Firoj Alam, Anika Anjum, Avijit Sarker, Sheak Rashed Haider Noori

TL;DR

This study tackles Bangla sentiment analysis in a low-resource setting by introducing MUBASE, a large manually annotated dataset of 33,606 Bangla social-media posts, and by benchmarking zero- and few-shot prompting of LLMs (GPT-4, BLOOMZ, Flan-T5) against fine-tuned models. It systematically compares baselines, classical models, small language models (e.g., BanglaBERT), GPT embeddings, and various LLM prompting strategies, using strict evaluation metrics and a robust data split. The findings show that fine-tuned monolingual models consistently outperform zero-/few-shot LLM prompting, though LLMs can be competitive and are valuable when data or resources for fine-tuning are limited; ensemble techniques offer notable gains. The work provides a publicly available resource and a clear benchmark for future Bangla NLP research, highlighting the continued value of domain-specific fine-tuning and the potential of native-language prompts for in-language sentiment tasks.

Abstract

The rapid expansion of the digital world has propelled sentiment analysis into a critical tool across diverse sectors such as marketing, politics, customer service, and healthcare. While there have been significant advancements in sentiment analysis for widely spoken languages, low-resource languages, such as Bangla, remain largely under-researched due to resource constraints. Furthermore, the recent unprecedented performance of Large Language Models (LLMs) in various applications highlights the need to evaluate them in the context of low-resource languages. In this study, we present a sizeable manually annotated dataset encompassing 33,606 Bangla news tweets and Facebook comments. We also investigate zero- and few-shot in-context learning with several language models, including Flan-T5, GPT-4, and BLOOMZ, offering a comparative analysis against fine-tuned models. Our findings suggest that monolingual transformer-based models consistently outperform other models, even in zero- and few-shot scenarios. To foster continued exploration, we intend to make this dataset and our research tools publicly available to the broader research community.
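To make the zero-shot versus few-shot distinction concrete, the following is a minimal sketch of how such prompts might be assembled. It is not the authors' prompt template: the function name `build_prompt`, the instruction wording, and the three-way label set are all assumptions introduced here for illustration, since this excerpt does not specify them.

```python
from typing import Sequence, Tuple

# Assumed label set; this excerpt does not list the dataset's actual labels.
LABELS = ("positive", "neutral", "negative")


def build_prompt(text: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a sentiment-classification prompt (hypothetical template).

    With no examples the prompt is zero-shot; each (text, label) pair
    added to `examples` becomes one in-context demonstration (few-shot).
    """
    blocks = [
        "Classify the sentiment of the Bangla text as one of: "
        + ", ".join(LABELS)
        + "."
    ]
    for ex_text, ex_label in examples:
        if ex_label not in LABELS:
            raise ValueError(f"unknown label: {ex_label}")
        blocks.append(f"Text: {ex_text}\nSentiment: {ex_label}")
    # The query comes last, with the label left open for the model to fill in.
    blocks.append(f"Text: {text}\nSentiment:")
    return "\n\n".join(blocks)
```

Calling `build_prompt(post)` yields a zero-shot prompt, while `build_prompt(post, demos)` with a handful of labeled pairs yields a few-shot one. The resulting string would then be sent to whichever LLM is being evaluated.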

Paper Structure (34 sections, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Performance comparisons with baselines (random and majority), fine-tuned models and LLMs (GPT and Bloomz).
  • Figure 2: The distribution of sentence length (number of words) associated with each sentiment label.