Density-based User Representation using Gaussian Process Regression for Multi-interest Personalized Retrieval

Haolun Wu, Ofer Meshi, Masrour Zoghi, Fernando Diaz, Xue Liu, Craig Boutilier, Maryam Karimzadehgan

TL;DR

Density-based User Representation using Gaussian Process Regression for Multi-interest Personalized Retrieval introduces density-based user representations (DURs), which model each user with a per-user Gaussian process (GP) that captures dynamic interests spread across multiple regions and quantifies its uncertainty about them. The approach scores items with the GP posterior and retrieves the Top-N via Thompson sampling or UCB, using item embeddings learned with a category-aware pre-training objective. Offline results on Amazon, MovieLens, and Taobao show improved coverage and relevance, while online simulations show that uncertainty-guided exploration can improve the discovery of niche interests. The work highlights effective low-dimensional representations for scalable two-stage retrieval pipelines and offers a principled framework for uncertainty-aware candidate generation.
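To make "score with the GP posterior, then retrieve the Top-N via UCB or Thompson sampling" concrete, here is a minimal sketch, not the authors' implementation. It assumes a zero-mean GP with an RBF kernel over fixed item embeddings, a label of 1 for every item the user interacted with, and random toy embeddings; the function names (`gp_posterior`, `top_n_ucb`, `top_n_thompson`), the exploration weight `beta`, and all hyperparameters are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def gp_posterior(x_train, y_train, x_test, noise=0.1, lengthscale=1.0):
    """Zero-mean GP posterior mean (m,) and covariance (m, m) at x_test."""
    k_tt = rbf_kernel(x_train, x_train, lengthscale) + noise**2 * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test, lengthscale)
    k_ss = rbf_kernel(x_test, x_test, lengthscale)
    chol = np.linalg.cholesky(k_tt)  # k_tt = L @ L.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_ts)
    return k_ts.T @ alpha, k_ss - v.T @ v


def top_n_ucb(mean, cov, n, beta=1.0, exclude=()):
    """Rank items by the upper confidence bound mean + beta * std."""
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    scores = mean + beta * std
    scores[np.asarray(exclude, dtype=int)] = -np.inf  # skip already-seen items
    return np.argsort(-scores)[:n]


def top_n_thompson(mean, cov, n, rng, exclude=()):
    """Rank items by one joint draw from the GP posterior (Thompson sampling)."""
    sample = rng.multivariate_normal(mean, cov, check_valid="ignore", method="svd")
    sample[np.asarray(exclude, dtype=int)] = -np.inf  # skip already-seen items
    return np.argsort(-sample)[:n]


rng = np.random.default_rng(0)
item_emb = rng.normal(size=(200, 8))  # hypothetical 8-d item embeddings
history = np.array([3, 17, 42, 99])   # hypothetical ids of interacted items
labels = np.ones(len(history))        # assumption: label 1 per interaction
mean, cov = gp_posterior(item_emb[history], labels, item_emb, lengthscale=3.0)
ucb_items = top_n_ucb(mean, cov, n=10, exclude=history)
ts_items = top_n_thompson(mean, cov, n=10, rng=rng, exclude=history)
print(ucb_items, ts_items)
```

A larger `beta` pushes UCB toward high-variance items far from the user's history (exploration). Thompson sampling produces a similar effect stochastically, because each call ranks items by a different posterior draw.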

Abstract

Accurate modeling of the diverse and dynamic interests of users remains a significant challenge in the design of personalized recommender systems. Existing user modeling methods, like single-point and multi-point representations, have limitations with respect to accuracy, diversity, and adaptability. To overcome these deficiencies, we introduce density-based user representations (DURs), a novel method that leverages Gaussian process regression (GPR) for effective multi-interest recommendation and retrieval. Our approach, GPR4DUR, exploits DURs to capture user interest variability without manual tuning, incorporates uncertainty-awareness, and scales well to large numbers of users. Experiments using real-world offline datasets confirm the adaptability and efficiency of GPR4DUR, while online experiments with simulated users demonstrate its ability to address the exploration-exploitation trade-off by effectively utilizing model uncertainty.
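For reference, the uncertainty a GPR-based user model provides comes from the posterior variance. The standard zero-mean GP regression posterior is shown below, assuming the user's interacted item embeddings $\mathbf{X}$ with labels $\mathbf{y}$, a kernel $k$ with Gram matrix $\mathbf{K} = k(\mathbf{X}, \mathbf{X})$, observation noise $\sigma_n^2$, and a hypothetical exploration weight $\beta$. This is the textbook form, not necessarily the paper's exact notation or modeling choices.

```latex
\begin{aligned}
\mathbf{k}_* &= k(\mathbf{X}, \mathbf{x}_*), \\
\mu(\mathbf{x}_*) &= \mathbf{k}_*^\top \left(\mathbf{K} + \sigma_n^2 \mathbf{I}\right)^{-1} \mathbf{y}, \\
\sigma^2(\mathbf{x}_*) &= k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^\top \left(\mathbf{K} + \sigma_n^2 \mathbf{I}\right)^{-1} \mathbf{k}_*, \\
\mathrm{UCB}(\mathbf{x}_*) &= \mu(\mathbf{x}_*) + \beta\, \sigma(\mathbf{x}_*).
\end{aligned}
```

Items close to many interacted items get a high mean $\mu$; items far from all of them get a large $\sigma$, which a UCB score or a Thompson-sampling draw can trade off against $\mu$.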

Paper Structure (38 sections, 14 equations, 6 figures, 10 tables)

Figures (6)

  • Figure 1: t-SNE visualization of the prediction scores between a selected user and all items in the MovieLens dataset. Each score is the inner product of the user embedding and the item embedding. The triangles ($\blacktriangle$) mark the 20 items the user interacted with most recently. Embeddings in this toy example are obtained with Matrix Factorization (MF). As shown, only the density-based method (bottom row) captures the user's interests well while also expressing uncertainty.
  • Figure 2: The architecture of GPR4DUR: an example of a movie recommendation for a single user.
  • Figure 3: Comparison of methods across user groups on MovieLens. Best viewed in color.
  • Figure 4: Robustness comparison across different dimension sizes on Amazon. Best viewed in color.
  • Figure 5: Illustration of Gaussian Process Regression in 1D. The true function is shown in red, observations are marked with black crosses, and the dashed lines represent two samples from the GP posterior. The dash-dot line represents the posterior mean, while the shaded region indicates the 95% confidence interval, showcasing the uncertainty associated with the GP predictions.
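The quantities drawn in Figure 5 can be computed with a few lines of NumPy. This is a minimal sketch, not the paper's code: the true function (`np.sin`), the observation locations, the RBF lengthscale, and the noise level are hypothetical choices, and plotting is omitted. `mean`, `lower`/`upper`, and `samples` correspond to the dash-dot posterior mean, the shaded 95% band, and the two dashed posterior samples.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0):
    """1-D squared-exponential kernel between vectors a (n,) and b (m,)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def gp_1d(x_obs, y_obs, x_grid, noise=0.05, lengthscale=1.0):
    """Zero-mean GP posterior mean and covariance on x_grid."""
    k_oo = rbf(x_obs, x_obs, lengthscale) + noise**2 * np.eye(len(x_obs))
    k_og = rbf(x_obs, x_grid, lengthscale)
    k_gg = rbf(x_grid, x_grid, lengthscale)
    mean = k_og.T @ np.linalg.solve(k_oo, y_obs)
    cov = k_gg - k_og.T @ np.linalg.solve(k_oo, k_og)
    return mean, cov


true_fn = np.sin                                # hypothetical "true function"
x_obs = np.array([-4.0, -2.5, 0.0, 1.5, 3.0])   # hypothetical observation points
y_obs = true_fn(x_obs)
x_grid = np.linspace(-5.0, 5.0, 101)

mean, cov = gp_1d(x_obs, y_obs, x_grid)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
lower, upper = mean - 1.96 * std, mean + 1.96 * std  # 95% confidence band

rng = np.random.default_rng(0)
# Two joint draws from the posterior; SVD tolerates the near-singular covariance.
samples = rng.multivariate_normal(mean, cov, size=2, check_valid="ignore", method="svd")
```

The band narrows to roughly the noise level at the observations and widens back toward the prior standard deviation away from them, which is the uncertainty pattern the caption describes.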