Were You Helpful -- Predicting Helpful Votes from Amazon Reviews
Emin Kirimlioglu, Harrison Kung, Dominic Orlando
TL;DR
The paper predicts which Amazon product reviews will be deemed helpful, focusing on metadata signals rather than textual sentiment. It frames a binary classification task (whether a review receives at least one helpful vote) and finds that three metadata features (the reviewer's average helpful votes, the number of images in the review, and the review timestamp) provide the strongest predictive power, outperforming TextBlob-based sentiment features. Across models, non-sequential MLP architectures achieve the highest accuracy, with the best result, $0.9691$, obtained by MLP-64-deep trained with AdamW; sequential models underperform. The findings suggest that user behavior and presentation cues are more indicative of perceived helpfulness than linguistic content, with practical implications for ranking and surfacing helpful reviews on e-commerce platforms.
Abstract
This project investigates factors that influence the perceived helpfulness of Amazon product reviews using machine learning techniques. After extensive feature analysis and correlation testing, we identified key metadata characteristics that serve as strong predictors of review helpfulness. While we initially explored natural language processing approaches using TextBlob for sentiment analysis, our final model focuses on metadata features that showed stronger correlations with helpfulness, including the number of images per review, the reviewer's historical helpful votes, and temporal aspects of the review. The data pipeline preprocesses and standardizes these features before model training. Through systematic evaluation of different feature combinations, we found that the metadata features that pass our selection threshold provide, in combination, reliable signals for predicting how helpful other Amazon users will find a review. This insight suggests that contextual and user-behavioral factors may be more indicative of review helpfulness than the linguistic content itself.
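The pipeline described above (binary labeling, threshold-based feature selection, and standardization) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that the threshold is applied to the absolute Pearson correlation between each feature and the label, and that standardization means z-scoring. The column names, the toy values, and the cutoff of 0.3 are all hypothetical, chosen only to make the example run.

```python
from statistics import mean, pstdev


def binary_label(helpful_votes):
    """Return 1 if the review received at least one helpful vote, else 0."""
    return 1 if helpful_votes >= 1 else 0


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select_features(columns, labels, threshold):
    """Keep features whose absolute correlation with the label meets the threshold.

    Assumption: the source mentions "a threshold" without details; a
    correlation cutoff is one plausible reading, not a confirmed one.
    """
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, labels)) >= threshold
    ]


def standardize(values):
    """Z-score a feature column: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up toy data: six reviews with hypothetical column names.
helpful_votes = [0, 3, 0, 1, 5, 0]
labels = [binary_label(v) for v in helpful_votes]

columns = {
    "reviewer_avg_helpful": [0.1, 2.0, 0.0, 1.5, 3.0, 0.2],
    "num_images": [0, 2, 0, 1, 3, 0],
    "sentiment_polarity": [0.5, 0.4, 0.4, 0.5, 0.6, 0.6],
}

THRESHOLD = 0.3  # hypothetical cutoff, not taken from the paper
selected = select_features(columns, labels, THRESHOLD)
model_inputs = {name: standardize(columns[name]) for name in selected}
```

In this toy data the two metadata columns track the label closely while the sentiment column does not, so only the metadata columns are kept. The standardized columns in `model_inputs` would then be the input to a classifier such as the MLPs mentioned in the summary.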
