Improved Diversity-Promoting Collaborative Metric Learning for Recommendation

Shilong Bao, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao, Qingming Huang

TL;DR

This work tackles the bias that emerges in traditional Collaborative Metric Learning when users have multiple, unevenly distributed interests. It introduces Diversity-Promoting Collaborative Metric Learning (DPCML), which uses multiple representations per user and a Diversity Control Regularization Scheme to balance diversity with generalization. The paper connects HarS-based negative sampling to OPAUC optimization, and then offers a Differentiable Hardness-aware Sampling (DiHarS) strategy to maximize OPAUC within a practical false-positive range, improving Top-N recommendations. A theoretical generalization bound shows DPCML can generalize better than standard CML, and empirical results on six benchmarks demonstrate improved accuracy and diversification, with notable gains from APA and DiHarS. The approach provides a scalable, diversity-aware alternative within the CML framework, with potential extensions to joint accessibility and content-enhanced settings.

Abstract

Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and collaborative filtering. Following the convention of RS, existing practices use a single representation for each user in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests. Under this setting, a single user representation might induce preference bias, especially when the item category distribution is imbalanced. To address this issue, we propose a novel method called Diversity-Promoting Collaborative Metric Learning (DPCML), which aims to account for the commonly ignored minority interests of the user. The key idea behind DPCML is to introduce a set of multiple representations for each user in the system, where a user's preference toward an item is aggregated by taking the minimum item-user distance over the user's embedding set. Specifically, we instantiate two effective assignment strategies to determine a proper number of vectors for each user. Meanwhile, a Diversity Control Regularization Scheme (DCRS) is developed to better accommodate the multi-vector representation strategy. Theoretically, we show that DPCML could induce a smaller generalization error than traditional CML. Furthermore, we notice that CML-based approaches usually require negative sampling to reduce the heavy computational burden caused by their pairwise objective. In this paper, we reveal the fundamental limitation of the widely adopted hard-aware sampling from the One-Way Partial AUC (OPAUC) perspective and then develop an effective sampling alternative for the CML-based paradigm. Finally, comprehensive experiments over a range of benchmark datasets speak to the efficacy of DPCML. Code is available at https://github.com/statusrank/LibCML.
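
The minimum-distance aggregation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`sq_dist`, `min_distance_score`, `top_n`), the use of squared Euclidean distance, and the toy 2-D embeddings are all assumptions made for illustration. It only shows how scoring an item by its distance to the closest of a user's vectors lets two separated interest clusters both rank highly, whereas a single averaged vector may favor an item that lies between them.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_distance_score(user_vecs: Sequence[Vector], item_vec: Vector) -> float:
    """Score an item by its distance to the closest user vector (lower is better)."""
    return min(sq_dist(u, item_vec) for u in user_vecs)


def top_n(user_vecs: Sequence[Vector], items: Sequence[Vector], n: int) -> List[int]:
    """Return indices of the n items with the smallest min-distance score.

    sorted() is stable, so ties keep their original index order.
    """
    ranked = sorted(range(len(items)),
                    key=lambda j: min_distance_score(user_vecs, items[j]))
    return ranked[:n]


# Hypothetical toy data: two interest clusters and one item in between.
items = [
    (0.0, 1.0),    # item 0: near the first interest
    (10.0, 9.0),   # item 1: near the second interest
    (5.0, 5.0),    # item 2: between the two interests
    (20.0, 20.0),  # item 3: far from everything
]
multi_vector_user = [(0.0, 0.0), (10.0, 10.0)]  # one vector per interest
single_vector_user = [(5.0, 5.0)]               # average of the two vectors

multi_scores = [min_distance_score(multi_vector_user, it) for it in items]
multi_top2 = top_n(multi_vector_user, items, 2)
single_top1 = top_n(single_vector_user, items, 1)
```

In this toy setup the multi-vector user ranks items 0 and 1 (one per interest) first, while the single averaged vector ranks the in-between item 2 first. How many vectors each user receives and how the vectors are kept apart are handled in the paper by the assignment strategies and DCRS, which this sketch does not model.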

Paper Structure

This paper contains 72 sections, 14 theorems, 115 equations, 14 figures, 18 tables, 1 algorithm.

Key Result

Lemma 1

[DBLP:conf/iclr/LongS20, DBLP:conf/icml/LiL21, DBLP:journals/jc/Zhou02] The covering number of the hypothesis class $\mathcal{H}_R$ admits an upper bound that depends on $d$, the dimension of the embedding space.

Figures (14)

  • Figure 1: An illustration of the benefit of DPCML when a user has multiple diverse preferences. Taking movies as an example, we assume that Sci-Fi is the majority interest of the user, Horror is the minority interest, and Cartoon is an irrelevant movie type. If the item embeddings are distributed as depicted in the figure, no single user embedding can capture both interests simultaneously.
  • Figure 2: Motivating visualizations on the MovieLens-1M and MovieLens-10M datasets, where (a) and (b) show the statistics of users' preference diversity and (c) and (d) show the item category distributions, respectively.
  • Figure 3: The relationship between the users' interest diversity and their interaction lengths.
  • Figure 4: Diversity vs. performance on Steam-200k.
  • Figure 5: Ablation studies for sampling parameters on Steam-200k dataset.
  • ...and 9 more figures

Theorems & Definitions (31)

  • Definition 1: Preference Diversity
  • Remark 1
  • Definition 2: $\epsilon$-Covering
  • Definition 3: Covering Number
  • Lemma 1
  • Theorem 1: Generalization Upper Bound of DPCML
  • Corollary 1
  • Remark 2
  • Proposition 1: Equivalent Reformulation of Generic HarS-based CML Framework
  • Remark 3
  • ...and 21 more