Enhanced Review Detection and Recognition: A Platform-Agnostic Approach with Application to Online Commerce
Priyabrata Karmakar, John Hawkins
TL;DR
The paper addresses the automated detection and extraction of online reviews across diverse platforms, with the aim of combating credibility issues and enabling large-scale analysis. It introduces PIORDR, a platform-agnostic pipeline that uses object detection (YOLOv8) to locate review areas and OCR to transcribe their text, avoiding brittle HTML scraping. Three analytics modules extend the pipeline: sentiment inconsistency analysis, multilingual extraction and translation, and fake-review detection via a large language model. The pipeline demonstrates strong performance on known platforms and reasonable generalization to unseen sites. Although results degrade somewhat on unfamiliar platforms, the approach offers a scalable solution for cross-platform review analysis and veracity filtering. Future work focuses on improving generalization through distillation, few-shot, and zero-shot learning.
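The detect-then-transcribe flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `detect` and `transcribe` callables are hypothetical stand-ins for a YOLOv8 detector and an OCR engine, and the image is a toy nested list rather than a real screenshot.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
Image = Sequence[Sequence[int]]   # toy stand-in: rows of pixel values


@dataclass
class ExtractedReview:
    box: Box
    text: str


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the rectangular region described by `box` out of the image."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def extract_reviews(
    image: Image,
    detect: Callable[[Image], List[Box]],           # hypothetical detector wrapper
    transcribe: Callable[[List[List[int]]], str],   # hypothetical OCR wrapper
) -> List[ExtractedReview]:
    """Locate review regions, then transcribe each one in reading order."""
    reviews = []
    # Sort boxes top-to-bottom, then left-to-right, to approximate reading order.
    for box in sorted(detect(image), key=lambda b: (b[1], b[0])):
        text = transcribe(crop(image, box)).strip()
        if text:  # skip regions where OCR returned nothing
            reviews.append(ExtractedReview(box, text))
    return reviews
```

Because the pipeline only consumes pixels, the same function applies to any site that can be rendered to an image, which is the property that removes the dependence on site-specific HTML.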
Abstract
Online commerce relies heavily on user-generated reviews to give shoppers unbiased information about products they have not physically seen. The importance of reviews has attracted multiple exploitative online behaviours, which creates a need for methods that monitor and detect reviews. We present a machine learning methodology for review detection and extraction, and demonstrate that it generalises to websites that were not contained in the training data. This method promises to drive applications for automatic detection and evaluation of reviews, regardless of their source. Furthermore, we showcase the versatility of our method by implementing and discussing three key applications for analysing reviews: Sentiment Inconsistency Analysis, which detects and filters out unreliable reviews based on inconsistencies between ratings and comments; Multi-language support, which enables the extraction and translation of reviews from various languages without relying on HTML scraping; and Fake review detection, which integrates a trained NLP model to distinguish between genuine and fake reviews.
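The idea behind Sentiment Inconsistency Analysis can be illustrated with a small sketch that flags a review when its star rating and its comment point in opposite directions. The word lists, the 5-star scale, and the function names are assumptions made for illustration only; a toy lexicon stands in for whatever sentiment model is actually used, and it will misread negations such as "not good".

```python
import re

# Assumed toy lexicons; a real system would use a trained sentiment model.
POSITIVE = {"great", "excellent", "love", "good", "perfect", "recommend"}
NEGATIVE = {"bad", "terrible", "broken", "poor", "awful", "refund"}


def comment_polarity(comment: str) -> int:
    """Return +1, -1, or 0 from counts of positive and negative words."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def rating_polarity(stars: float, scale: int = 5) -> int:
    """Return +1 above the scale midpoint, -1 below it, 0 at the midpoint."""
    midpoint = (scale + 1) / 2
    return (stars > midpoint) - (stars < midpoint)


def is_inconsistent(stars: float, comment: str) -> bool:
    """Flag reviews whose rating and comment have opposite polarity."""
    r, c = rating_polarity(stars), comment_polarity(comment)
    # Neutral ratings or neutral comments are never flagged.
    return r != 0 and c != 0 and r != c
```

For example, `is_inconsistent(5, "Terrible, arrived broken")` returns `True`, while `is_inconsistent(5, "Great product, love it")` returns `False`. Treating neutral cases as consistent is a conservative choice that avoids filtering out reviews on weak evidence.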
