Improved Diversity-Promoting Collaborative Metric Learning for Recommendation
Shilong Bao, Qianqian Xu, Zhiyong Yang, Yuan He, Xiaochun Cao, Qingming Huang
TL;DR
This work tackles the preference bias that emerges in traditional Collaborative Metric Learning (CML) when users have multiple, unevenly distributed interests. It introduces Diversity-Promoting Collaborative Metric Learning (DPCML), which uses multiple representations per user and a Diversity Control Regularization Scheme (DCRS) to balance diversity with generalization. The paper connects negative sampling based on hardness-aware sampling (HarS) to One-Way Partial AUC (OPAUC) optimization, and then offers a Differentiable Hardness-aware Sampling (DiHarS) strategy that maximizes OPAUC within a practical false-positive range, improving Top-N recommendations. A theoretical generalization bound shows that DPCML can generalize better than standard CML, and empirical results on six benchmarks demonstrate improved accuracy and diversification, with notable gains from APA and DiHarS. The approach provides a scalable, diversity-aware alternative within the CML framework, with potential extensions to joint accessibility and content-enhanced settings.
Abstract
Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and collaborative filtering. Following the convention of RS, existing practices use a single, unique representation for each user in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests. Under this setting, the unique user representation might induce preference bias, especially when the item category distribution is imbalanced. To address this issue, we propose a novel method called \textit{Diversity-Promoting Collaborative Metric Learning} (DPCML), which aims to account for the commonly ignored minority interests of the user. The key idea behind DPCML is to introduce a set of multiple representations for each user in the system, where a user's preference toward an item is aggregated by taking the minimum item-user distance among the user's embedding set. Specifically, we instantiate two effective assignment strategies to explore a proper quantity of vectors for each user. Meanwhile, a \textit{Diversity Control Regularization Scheme} (DCRS) is developed to better accommodate the multi-vector representation strategy. Theoretically, we show that DPCML could induce a smaller generalization error than traditional CML. Furthermore, we notice that CML-based approaches usually require \textit{negative sampling} to reduce the heavy computational burden caused by the pairwise objective therein. In this paper, we reveal the fundamental limitation of the widely adopted hardness-aware sampling from the One-Way Partial AUC (OPAUC) perspective and then develop an effective sampling alternative for the CML-based paradigm. Finally, comprehensive experiments over a range of benchmark datasets speak to the efficacy of DPCML. Code is available at \url{https://github.com/statusrank/LibCML}.
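
The minimum-distance aggregation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the LibCML implementation: the function names `min_distance_scores` and `top_n_items`, the use of squared Euclidean distance, and the toy vectors are assumptions made only for illustration.

```python
import numpy as np


def min_distance_scores(user_vecs, item_vecs):
    """Score items for one user that has several representations.

    user_vecs: array of shape (C, d), the user's C embedding vectors.
    item_vecs: array of shape (N, d), the N item embeddings.
    Returns an array of shape (N,) holding, for each item, the minimum
    squared Euclidean distance to any of the user's vectors
    (the distance choice is an assumption of this sketch).
    """
    diffs = user_vecs[:, None, :] - item_vecs[None, :, :]  # (C, N, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)                 # (C, N)
    return sq_dists.min(axis=0)                            # (N,)


def top_n_items(user_vecs, item_vecs, n):
    """Return indices of the n items with the smallest aggregated distance."""
    scores = min_distance_scores(user_vecs, item_vecs)
    return np.argsort(scores, kind="stable")[:n]


# Hypothetical toy data: one user with two interest vectors, four items.
user = np.array([[0.0, 0.0], [10.0, 10.0]])
items = np.array([[0.0, 1.0], [10.0, 9.0], [5.0, 5.0], [0.0, 3.0]])
print(min_distance_scores(user, items))
print(top_n_items(user, items, 2))
```

In this toy setup, items 0 and 1 each lie close to a different user vector, so both rank ahead of item 2, which sits midway between the two interests. A single averaged user vector at (5, 5) would instead rank item 2 first, which is the kind of preference bias the multi-vector design is meant to reduce.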
