A hierarchy tree data structure for behavior-based user segment representation

Yang Liu, Xuejiao Kang, Sathya Iyer, Idris Malik, Ruixuan Li, Juan Wang, Xinchen Lu, Xiangxue Zhao, Dayong Wang, Menghan Liu, Isaac Liu, Feng Liang, Yinzhe Yu

TL;DR

This study represents the first list-wise learning-to-rank framework for tree-based recommendation that effectively integrates diverse user categorical attributes while preserving real-world semantic interpretability at a large industrial scale.

Abstract

User attributes are essential in multiple stages of modern recommendation systems and are particularly important for mitigating the cold-start problem and improving the experience of new or infrequent users. We propose Behavior-based User Segmentation (BUS), a novel tree-based data structure that hierarchically segments the user universe by users' categorical attributes, according to their product-specific engagement behaviors. During BUS tree construction, we use Normalized Discounted Cumulative Gain (NDCG) as the objective function to maximize the behavioral representativeness of marginal users relative to active users in the same segment. The constructed BUS tree is then processed and aggregated across its leaf and internal nodes to generate popular social content and behavioral patterns for each node in the tree. To further mitigate bias and improve fairness, we use the social graph to derive each user's connection-based BUS segments; combining the behavioral patterns extracted from the user's own segment with those from the connection-based segments yields the connection-aware BUS-based recommendation. Our offline analysis shows that BUS-based retrieval significantly outperforms traditional user cohort-based aggregation on ranking quality. We have deployed our data structure and machine learning algorithm and tested it on production traffic serving billions of users daily, achieving statistically significant improvements in online product metrics for use cases including music ranking and email notifications. To the best of our knowledge, our study represents the first list-wise learning-to-rank framework for tree-based recommendation that effectively integrates diverse user categorical attributes while preserving real-world semantic interpretability at a large industrial scale.
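
To make the NDCG objective mentioned in the abstract concrete, the following is a minimal sketch of the standard linear-gain NDCG, scoring how well a segment-level ranking (for example, one aggregated from active users) matches the items a marginal user actually engaged with. The function names, item IDs, and relevance values are hypothetical, and the abstract does not specify how gains are defined or how scores are aggregated across users, so this should be read as an illustration of the metric rather than the authors' implementation.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: the gain at rank i (0-based) is divided by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(ranked_items, relevance, k=None):
    """NDCG of a ranked list against per-item relevance.

    ranked_items: item IDs in ranked order (assumed here to come from a segment).
    relevance:    dict mapping item ID -> non-negative relevance for one user
                  (assumed here to be that user's observed engagement).
    k:            optional cutoff; None scores the full list.
    """
    gains = [relevance.get(item, 0.0) for item in ranked_items]
    ideal = sorted(relevance.values(), reverse=True)
    if k is not None:
        gains, ideal = gains[:k], ideal[:k]
    ideal_dcg = dcg(ideal)
    # A user with no relevant items contributes a score of 0 by convention.
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: the segment's ranking versus one marginal user's engagement.
segment_ranking = ["song_a", "song_b", "song_c"]
marginal_user_engagement = {"song_b": 1.0}
score = ndcg(segment_ranking, marginal_user_engagement)
```

In this example the only relevant item sits at rank 2, so the score is 1 / log2(3), roughly 0.63; a ranking that placed `song_b` first would score 1.0. A split that raises this score for the marginal users in a segment makes the segment's ranking more representative of them.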

Paper Structure

This paper contains 15 sections, 4 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: User cohort size. (A) User distribution. (B) Cohort distribution.
  • Figure 2: BUS tree construction: (A) The regress operator generates a regress node, shown as a square; (B) Example of user attribute selection in BUS tree construction
  • Figure 3: BUS-based recommendation. (A) A constructed BUS tree undergoes regress node removal and leaf node unnesting. (B) Connection-aware BUS-based recommendation, where content is retrieved from the user's own segment and connection segments.
  • Figure 4: System overview of BUS-based recommendation.
  • Figure 5: Offline evaluation. (A) Overall loss in the BUS tree construction. (B) The number of segments and regress operators in the BUS tree construction.
  • ...and 1 more figure