Bengali Fake Reviews: A Benchmark Dataset and Detection System

G. M. Shahariar, Md. Tanvir Rouf Shawon, Faisal Muhammad Shah, Mohammad Shafiul Alam, Md. Shahriar Mahbub

TL;DR

This work introduces the Bengali Fake Review Detection (BFRD) dataset, the first publicly available resource for identifying fake reviews in Bengali. It focuses on food-related content and handles code-mixed input through a translation and back-transliteration pipeline. The authors systematically evaluate a range of deep learning and pre-trained transformer models, culminating in a weighted ensemble of BanglaBERT variants that achieves a weighted F1-score of 0.9843 on 13390 reviews (1339 actual fake, 5356 augmented fake, and 6695 non-fake). They also use data augmentation (nlpaug, bnaug) to manage class imbalance, apply LIME for model interpretability, and carry out a misclassification analysis to categorize error types. The dataset and methods are of practical value for real-time Bengali fake-review detection, with potential deployment across Bengali-language review platforms and guidance for future cross-domain expansion.

Abstract

The proliferation of fake reviews on various online platforms has created a major concern for both consumers and businesses. Such reviews can deceive customers and damage the reputation of products or services, making it crucial to identify them. Although the detection of fake reviews has been extensively studied in the English language, detecting fake reviews in non-English languages such as Bengali is still a relatively unexplored research area. This paper introduces the Bengali Fake Review Detection (BFRD) dataset, the first publicly available dataset for identifying fake reviews in Bengali. The dataset consists of 7710 non-fake and 1339 fake food-related reviews collected from social media posts. To convert non-Bengali words in a review, a unique pipeline has been proposed that translates English words to their corresponding Bengali meaning and also back-transliterates Romanized Bengali to Bengali. We have conducted rigorous experimentation using multiple deep learning and pre-trained transformer language models to develop a reliable detection system. Finally, we propose a weighted ensemble model that combines four pre-trained transformers: BanglaBERT, BanglaBERT Base, BanglaBERT Large, and BanglaBERT Generator. According to the experimental results, the proposed ensemble model obtained a weighted F1-score of 0.9843 on 13390 reviews, including 1339 actual fake reviews and 5356 augmented fake reviews generated with the nlpaug library. The remaining 6695 reviews were randomly selected from the 7710 non-fake instances. The model achieved a 0.9558 weighted F1-score when the fake reviews were augmented using the bnaug library.
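
The abstract does not spell out how the ensemble weights are chosen or how the model outputs are combined. As a rough illustration only, the sketch below assumes weighted soft voting over per-class probabilities and computes a support-weighted F1-score by hand. The function names, the two toy "models", and the weights are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def weighted_ensemble(probs_per_model: Sequence[Sequence[Sequence[float]]],
                      weights: Sequence[float]) -> List[int]:
    """Weighted soft voting (an assumed scheme, not necessarily the paper's).

    probs_per_model[m][i][c] is model m's probability that sample i
    belongs to class c. Weights are normalized to sum to 1.
    """
    if len(probs_per_model) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    norm = [w / total for w in weights]
    n_samples = len(probs_per_model[0])
    n_classes = len(probs_per_model[0][0])
    predictions = []
    for i in range(n_samples):
        # Weighted average of the models' probabilities for each class.
        combined = [
            sum(norm[m] * probs_per_model[m][i][c] for m in range(len(norm)))
            for c in range(n_classes)
        ]
        # Predict the class with the highest combined probability.
        predictions.append(max(range(n_classes), key=combined.__getitem__))
    return predictions


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-class F1, averaged with weights proportional to class support."""
    score = 0.0
    for label in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        score += f1 * (tp + fn) / len(y_true)
    return score


# Toy probabilities [P(non-fake), P(fake)] from two hypothetical models.
model_a = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
model_b = [[0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]

preds = weighted_ensemble([model_a, model_b], weights=[1.0, 3.0])
print(preds)  # [0, 1, 1]
print(weighted_f1([0, 1, 1], preds))
```

In this toy case the two models disagree on the third review, and the larger weight on the second model decides the outcome. In practice the weights would be tuned on held-out data, for example in proportion to each model's validation score.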

Paper Structure

This paper contains 45 sections, 7 equations, 9 figures, 29 tables, 2 algorithms.

Figures (9)

  • Figure 1: Dataset development pipeline
  • Figure 2: Data annotation procedure along with pre-defined guidelines
  • Figure 3: Class wise ratio of number of reviews with respect to the review length
  • Figure 4: Schematic diagram of Bengali fake review detection system
  • Figure 5: Augmentation pipeline using nlpaug and bnaug
  • ...and 4 more figures