A Multi-Embedding Convergence Network on Siamese Architecture for Fake Reviews

Sankarshan Dasgupta; James Buckley

TL;DR

The paper tackles fake review detection to safeguard consumer decisions in online marketplaces. It proposes a multi-embedding convergence network within a Siamese LSTM framework that fuses Word2Vec and MiniLM BERT representations and refines decisions with a fuzzy logic classifier. The approach is evaluated on a dataset of $40432$ reviews labeled fake or real. The model achieves a validation accuracy of $0.84$ and a training accuracy of $0.91$. The fuzzy decision stage raises final accuracy to $0.88$, producing $35791$ predictions from $40432$ samples. The similarity between branches is defined as $\text{Similarity} = \frac{1 + \cos(f(x_1), f(x_2))}{2}$ and the cosine distance as $\text{CosineDistance}(A,B) = 1 - \frac{ \hat{A} \cdot \hat{B} }{ \|\hat{A}\| \|\hat{B}\| }$. This work demonstrates that combining contextual and semantic embeddings within a Siamese framework, followed by uncertainty-aware fuzzy classification, can robustly detect fake reviews, and that the approach can scale to larger corpora and additional embeddings.
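The two formulas in the TL;DR share the same cosine term. Since $\cos = 1 - \text{CosineDistance}$, it follows that $\text{Similarity} = 1 - \text{CosineDistance}/2$. The Python sketch below checks this numerically. The shared encoder `f` is a hypothetical fixed linear projection standing in for one LSTM branch of the Siamese network. It is not the paper's encoder, but it does illustrate the weight sharing that defines a Siamese setup: both inputs pass through the same weights.

```python
import math

# Hypothetical shared encoder: a fixed 2x3 linear projection standing in
# for one LSTM branch. Both Siamese inputs go through the SAME weights.
WEIGHTS = [
    [0.5, -0.2, 0.1],
    [0.3, 0.8, -0.4],
]


def f(x):
    """Encode an input vector with the shared weights."""
    return [sum(w * v for w, v in zip(row, x)) for row in WEIGHTS]


def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return sum(p * q for p, q in zip(a, b)) / (norm_a * norm_b)


def similarity(x1, x2):
    """Similarity = (1 + cos(f(x1), f(x2))) / 2, which lies in [0, 1]."""
    return (1 + cosine(f(x1), f(x2))) / 2


def cosine_distance(a, b):
    """CosineDistance(A, B) = 1 - (A . B) / (|A| |B|), which lies in [0, 2]."""
    return 1 - cosine(a, b)


x1 = [1.0, 0.0, 2.0]
x2 = [0.5, 1.0, 1.5]
s = similarity(x1, x2)
d = cosine_distance(f(x1), f(x2))
print("similarity:", round(s, 4), "distance:", round(d, 4))
```

Identical inputs give a similarity of 1 and a distance of 0. Because this toy encoder is linear, an input and its negation encode to opposite vectors and get a similarity of 0.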

Abstract

In this new digital era, access to real-world events is increasingly moving to web-based platforms. This is most visible on e-commerce websites, where physical verification is rarely possible. As a result, we depend on verification in the virtual world to inform our decisions. One such decision-making process relies heavily on reading reviews, which play an important part in the transaction process. Finding genuine reviews can be tedious work for the user, and fake reviews heavily distort the transaction record of a product. This article presents an implementation of a Siamese network for detecting fake reviews. The fake reviews dataset, consisting of 40K reviews, is preprocessed with several techniques. The cleaned text is converted into vectors using two embeddings: MiniLM BERT, which captures contextual relationships, and Word2Vec, which captures semantic relationships. The embeddings are then trained in a Siamese network with LSTM layers connected to a fuzzy logic stage for decision-making. The results show that fake reviews can be detected with high accuracy using a Siamese network for prediction and verification.
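The abstract describes the text passing through preprocessing and then two embeddings, Word2Vec for semantic relationships and MiniLM BERT for contextual relationships, before the Siamese stage. A minimal Python sketch of that fusion step follows, under stated assumptions. The preprocessing rules, the mean-pooling of Word2Vec vectors, and fusion by simple concatenation are illustrative choices, not the paper's confirmed method. The two-dimensional `WORD2VEC` table is a hypothetical toy lookup, and the MiniLM sentence vector is passed in by the caller rather than computed by a real encoder.

```python
import re

# Hypothetical toy Word2Vec table (2-d vectors for readability).
WORD2VEC = {
    "great": [0.9, 0.1],
    "food": [0.2, 0.7],
    "service": [0.3, 0.6],
}
# Illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "and", "was", "a", "is"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def word2vec_mean(tokens, dim=2):
    """Average the vectors of known tokens; zeros if none are known."""
    vecs = [WORD2VEC[t] for t in tokens if t in WORD2VEC]
    if not vecs:
        return [0.0] * dim
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def fuse(text, minilm_vector):
    """Concatenate the semantic (Word2Vec) and contextual (MiniLM) views."""
    return word2vec_mean(preprocess(text)) + list(minilm_vector)


# The MiniLM vector here is a made-up placeholder, not real model output.
fused = fuse("The food was GREAT!!", [0.1, -0.3, 0.5])
print(fused)
```

In a real pipeline the word vectors would come from a trained Word2Vec model (for example via gensim) and the sentence vector from a MiniLM sentence encoder. Concatenation is the simplest way to keep both views side by side; weighted sums or learned projections are common alternatives.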

Paper Structure

This paper contains 10 sections, 3 equations, 4 figures, and 1 table.

Introduction
Related Works
Methodology
Understanding the dataset
Preprocess the corpora
Word embedding generation
Siamese training with LSTM
Fuzzy classifier decision output
Results and Discussion
Future Works

Figures (4)

Figure 1: Our network architecture for the fake review dataset
Figure 2: Word cloud of the most common words in the Yelp dataset
Figure 3: Word embeddings. Left: Word2Vec architecture; Right: MiniLM BERT Transformer
Figure 4: Fuzzy output for fake review detection on the Siamese network
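The fuzzy decision stage named in Figure 4 and in the abstract can be illustrated with a small max-membership classifier. This Python sketch assumes the fuzzy stage receives a single score in [0, 1], read here as the network's estimated probability that a review is fake. The three triangular fuzzy sets `real`, `uncertain` and `fake`, their breakpoints, and the max-membership decision rule are hypothetical choices; the paper's actual membership functions and rule base are not specified here.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside the open interval (a, c), 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets over a fake-probability score in [0, 1].
# The outer sets extend past [0, 1] so that 0 and 1 get full membership.
FUZZY_SETS = {
    "real": (-0.5, 0.0, 0.5),
    "uncertain": (0.25, 0.5, 0.75),
    "fake": (0.5, 1.0, 1.5),
}


def fuzzy_decide(score):
    """Return the label with the highest membership, plus all memberships."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    memberships = {
        label: triangular(score, *params) for label, params in FUZZY_SETS.items()
    }
    label = max(memberships, key=memberships.get)
    return label, memberships


for score in (0.1, 0.6, 0.9):
    label, degrees = fuzzy_decide(score)
    print(score, label, degrees)
```

The `uncertain` band is what makes this sketch uncertainty-aware: scores near 0.5 are flagged instead of being forced into a hard label. Centroid defuzzification or a richer rule base would be natural refinements.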
