Bengali Fake Reviews: A Benchmark Dataset and Detection System

G. M. Shahariar; Md. Tanvir Rouf Shawon; Faisal Muhammad Shah; Mohammad Shafiul Alam; Md. Shahriar Mahbub

TL;DR

This work introduces the Bengali Fake Review Detection (BFRD) dataset, the first publicly available resource for identifying fake reviews in Bengali. The dataset covers food-related reviews and handles code-mixed input through a pipeline that translates English words and back-transliterates Romanized Bengali. The authors evaluate a range of deep learning and pre-trained transformer models, culminating in a weighted ensemble of four BanglaBERT variants that achieves a weighted F1-score of 0.9843 on 13390 reviews (1339 fake, 5356 augmented fake, and 6695 non-fake). They use data augmentation (nlpaug, bnaug) to address class imbalance, apply LIME for model interpretability, and categorize error types through a misclassification analysis. The dataset and methods are relevant to real-time Bengali fake-review detection, with potential deployment on Bengali-language review platforms and guidance for future cross-domain extensions.

Abstract

The proliferation of fake reviews on online platforms has become a major concern for both consumers and businesses. Such reviews can deceive customers and damage the reputation of products or services, making it crucial to identify them. Although fake review detection has been studied extensively in English, detecting fake reviews in non-English languages such as Bengali remains relatively unexplored. This paper introduces the Bengali Fake Review Detection (BFRD) dataset, the first publicly available dataset for identifying fake reviews in Bengali. The dataset consists of 7710 non-fake and 1339 fake food-related reviews collected from social media posts. To convert non-Bengali words in a review, a unique pipeline is proposed that translates English words to their corresponding Bengali meaning and back-transliterates Romanized Bengali to Bengali. We conducted rigorous experiments with multiple deep learning and pre-trained transformer language models to develop a reliable detection system. Finally, we propose a weighted ensemble model that combines four pre-trained transformers: BanglaBERT, BanglaBERT Base, BanglaBERT Large, and BanglaBERT Generator. According to the experimental results, the proposed ensemble model obtained a weighted F1-score of 0.9843 on 13390 reviews: 1339 actual fake reviews, 5356 augmented fake reviews generated with the nlpaug library, and 6695 reviews randomly selected from the 7710 non-fake instances. The model achieved a weighted F1-score of 0.9558 when the fake reviews were augmented with the bnaug library instead.
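
The weighted ensemble and the weighted F1-score mentioned in the abstract can be illustrated with a minimal sketch. It assumes the ensemble takes a weighted average of each model's class probabilities (soft voting); the abstract does not state how the weights are chosen or applied, so the function names, weights, and probabilities below are hypothetical and are not the authors' implementation.

```python
def weighted_ensemble(probs_per_model, weights):
    """Weighted average of class probabilities (assumed soft-voting scheme).

    probs_per_model[m][i][c] is the probability that model m assigns
    to class c for review i; weights[m] is that model's weight.
    """
    total = sum(weights)
    n_reviews = len(probs_per_model[0])
    n_classes = len(probs_per_model[0][0])
    combined = []
    for i in range(n_reviews):
        row = [
            sum(w * p[i][c] for w, p in zip(weights, probs_per_model)) / total
            for c in range(n_classes)
        ]
        combined.append(row)
    return combined


def predict(combined):
    """Pick the class with the highest combined probability for each review."""
    return [max(range(len(row)), key=row.__getitem__) for row in combined]


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    score = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = sum(t == label for t in y_true)
        score += f1 * support / len(y_true)
    return score


if __name__ == "__main__":
    # Hypothetical outputs of two models for two reviews (class 0 = non-fake, 1 = fake).
    model_a = [[0.9, 0.1], [0.4, 0.6]]
    model_b = [[0.2, 0.8], [0.3, 0.7]]
    combined = weighted_ensemble([model_a, model_b], weights=[3, 1])
    print(predict(combined))
    print(weighted_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```

Because the weighted F1-score scales each class's F1 by its share of the evaluation set, the balanced 6695/6695 split described above gives the fake and non-fake classes equal influence on the reported score.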

Paper Structure (45 sections, 7 equations, 9 figures, 29 tables, 2 algorithms)

Introduction
Related Works
Traditional Approaches
Deep Learning Approaches
Problem Definition
Corpus Creation
Data Collection
Data Pre-processing
Data Annotation
Dataset Statistics
Methodology
Text Augmentation
Pre-processing
Detection Methods
CNN and LSTM based models
...and 30 more sections

Figures (9)

Figure 1: Dataset development pipeline
Figure 2: Data annotation procedure along with pre-defined guidelines
Figure 3: Class wise ratio of number of reviews with respect to the review length
Figure 4: Schematic diagram of Bengali fake review detection system
Figure 5: Augmentation pipeline using nlpaug and bnaug
...and 4 more figures
