Preserving Individuality while Following the Crowd: Understanding the Role of User Taste and Crowd Wisdom in Online Product Rating Prediction

Liang Wang, Shubham Jain, Yingtong Dou, Junpeng Wang, Chin-Chia Michael Yeh, Yujie Fan, Prince Aboagye, Yan Zheng, Xin Dai, Zhongfang Zhuang, Uday Singh Saini, Wei Zhang

TL;DR

This work proposes a unique and practical approach that emphasizes historical ratings at both the user and product levels, encapsulated using a continuously updated dynamic tree representation, and develops an efficient data processing strategy that makes this approach highly scalable and easily deployable.

Abstract

Numerous algorithms have been developed for online product rating prediction, but the specific influence of user and product information in determining the final prediction score remains largely unexplored. Existing research often relies on narrowly defined data settings, which overlooks real-world challenges such as the cold-start problem, cross-category information utilization, and scalability and deployment issues. To delve deeper into these aspects, and particularly to uncover the roles of individual user taste and collective wisdom, we propose a unique and practical approach that emphasizes historical ratings at both the user and product levels, encapsulated using a continuously updated dynamic tree representation. This representation effectively captures the temporal dynamics of users and products, leverages user information across product categories, and provides a natural solution to the cold-start problem. Furthermore, we have developed an efficient data processing strategy that makes this approach highly scalable and easily deployable. Comprehensive experiments in real industry settings demonstrate the effectiveness of our approach. Notably, our findings reveal that individual taste dominates over collective wisdom in online product rating prediction, a perspective that contrasts with the commonly observed wisdom of the crowd phenomenon in other domains. This dominance of individual user taste is consistent across various model types, including the boosting tree model, recurrent neural network (RNN), and transformer-based architectures. This observation holds true across the overall population, within individual product categories, and in cold-start scenarios. Our findings underscore the significance of individual user tastes in the context of online product rating prediction and the robustness of our approach across different model architectures.

Paper Structure (21 sections, 9 figures, 4 tables, 1 algorithm)

Figures (9)

  • Figure 1: "Product Tree 1" and "Product Tree 2" encapsulate historical product ratings, reflecting the influence of crowd wisdom, whereas the "User Tree" captures historical user ratings, reflecting individual taste. The key distinction between the two product trees is that "Product Tree 2" considers user ratings across all product categories when ratings for the current product are not available. The trees are updated continuously at each time point $t$, with look-back windows of varying length, to capture the temporal dynamics of both users and products.
  • Figure 2: Time needed with and without daily aggregation.
  • Figure 3: AUC values of individual trees with varying look-back window sizes.
  • Figure 4: This figure compares the performance of two modeling approaches: one using a single model for all categories (solid lines) and another employing 29 separate models, each for a specific product category (dotted lines). The four different colors represent four distinct settings: $\mathbf{S1}$, $\mathbf{S2}$, $\mathbf{S3}$, and $\mathbf{S4}$. The suffix "_s" signifies the single model approach, while the suffix "_m" indicates the multiple model approach.
  • Figure 5: This figure displays the AUC values of LightGBM models under four different settings, $\mathbf{S1}$, $\mathbf{S2}$, $\mathbf{S3}$, and $\mathbf{S4}$, across 29 individual product categories. The values in parentheses are the AUC values computed over all product categories combined.
  • ...and 4 more figures
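
The look-back idea in the Figure 1 caption can be illustrated with a small sketch. This is not the authors' implementation: the `Rating` record, the function names, the integer day index, and the choice of mean and count as summary statistics are all assumptions made for illustration. In particular, the fallback used for the second product feature (all ratings in the window, regardless of product) is only one possible reading of "user ratings across all product categories".

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rating:
    # Hypothetical record layout; the paper's actual schema is not given here.
    user: str
    product: str
    category: str
    day: int      # integer day index (assumed time unit)
    score: float


def window_stats(ratings, t, window):
    """Mean and count of ratings with t - window <= day < t (strictly before t)."""
    scores = [r.score for r in ratings if t - window <= r.day < t]
    return (mean(scores) if scores else None, len(scores))


def tree_features(history, user, product, t, windows=(7, 30)):
    """Summarize user-level and product-level history at time t.

    For each look-back window this returns:
      user_*      stats over the user's own past ratings (individual taste)
      product1_*  stats over the product's past ratings (crowd wisdom)
      product2_*  same as product1_*, but falls back to all ratings in the
                  window when the product has none (assumed cold-start rule)
    """
    by_user = [r for r in history if r.user == user]
    by_product = [r for r in history if r.product == product]
    feats = {}
    for w in windows:
        feats[f"user_{w}d"] = window_stats(by_user, t, w)
        product_stats = window_stats(by_product, t, w)
        feats[f"product1_{w}d"] = product_stats
        if product_stats[1] == 0:
            # No ratings for this product in the window: use every rating.
            product_stats = window_stats(history, t, w)
        feats[f"product2_{w}d"] = product_stats
    return feats


history = [
    Rating("u1", "p1", "books", 1, 5.0),
    Rating("u1", "p2", "toys", 3, 3.0),
    Rating("u2", "p1", "books", 4, 4.0),
]
warm = tree_features(history, "u1", "p1", t=5, windows=(3, 7))
cold = tree_features(history, "u1", "p9", t=5, windows=(3,))
```

Because only ratings strictly before $t$ are used, recomputing these features at each time point avoids leaking the rating being predicted. In the `cold` call the product has no history, so the first product feature is empty while the second still carries a value from the fallback.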