A Multi-Embedding Convergence Network on Siamese Architecture for Fake Reviews
Sankarshan Dasgupta, James Buckley
TL;DR
The paper tackles fake review detection to safeguard consumer decisions in online marketplaces. It proposes a multi-embedding convergence network within a Siamese LSTM framework that fuses Word2Vec and MiniLM BERT representations and refines decisions with a fuzzy logic classifier, evaluated on a dataset of $40432$ reviews with fake/real labels. The model achieves a training accuracy of $0.91$ and a validation accuracy of $0.84$; the fuzzy decision stage raises final accuracy to $0.88$, producing $35791$ predictions from $40432$ samples. The similarity between branches is defined as $\text{Similarity} = \frac{1 + \cos(f(x_1), f(x_2))}{2}$ and the cosine distance as $\text{CosineDistance}(A,B) = 1 - \frac{ \hat{A} \cdot \hat{B} }{ \|\hat{A}\| \|\hat{B}\| }$. This work demonstrates that combining contextual and semantic embeddings within a Siamese framework, followed by uncertainty-aware fuzzy classification, can robustly detect fake reviews and is scalable to larger corpora and additional embeddings.
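The two formulas above can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names `cosine`, `similarity`, and `cosine_distance` are hypothetical, and the inputs stand in for the branch outputs $f(x_1)$ and $f(x_2)$, which in the paper would come from the Siamese LSTM branches.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity(u, v):
    """Similarity = (1 + cos(u, v)) / 2, which maps [-1, 1] onto [0, 1]."""
    return (1.0 + cosine(u, v)) / 2.0


def cosine_distance(a, b):
    """CosineDistance = 1 - cos(a, b), which lies in [0, 2]."""
    return 1.0 - cosine(a, b)
```

Under these definitions the two quantities carry the same information: the cosine distance equals $2(1 - \text{Similarity})$, so identical directions give a similarity of 1 and a distance of 0, while opposite directions give a similarity of 0 and a distance of 2.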
Abstract
In this digital era, access to real-world events is moving towards web-based platforms. This is most visible on e-commerce websites, where physical verification is limited. With this development, we depend on verification in the virtual world to guide our decisions. One such decision-making process relies heavily on reading reviews, which play an important part in the transaction. Finding a genuine review can be very tedious work for the user, while fake reviews heavily distort a product's transaction record. This article presents an implementation of a Siamese network for detecting fake reviews. The fake reviews dataset, consisting of 40K reviews, is preprocessed with several techniques. The cleaned data is then embedded with MiniLM BERT to capture contextual relationships and with Word2Vec to capture semantic relationships. The resulting embeddings are used to train a Siamese network with LSTM layers, which is connected to a fuzzy logic stage for decision-making. The results show that a Siamese network can detect fake reviews with high accuracy for both prediction and verification.
