MultiCBR: Multi-view Contrastive Learning for Bundle Recommendation
Yunshan Ma, Yingzhi He, Xiang Wang, Yinwei Wei, Xiaoyu Du, Yuyangzi Fu, Tat-Seng Chua
TL;DR
MultiCBR advances bundle recommendation by introducing a three-view (UB, UI, BI) representation learning framework and an early fusion–late contrast strategy. By fusing the three view-specific embeddings before applying self-supervised contrast, it captures both cross-view and ego-view user preferences while using only two contrastive losses, improving efficiency. Empirical results on three benchmark datasets show consistent gains over state-of-the-art methods, with particularly large improvements on highly BI-sparse data like iFashion. The work also provides extensive ablations and model studies that validate the BI view's role in addressing BI sparsity and the effectiveness of the fusion design for multi-view collaboration.
Abstract
Bundle recommendation seeks to recommend a bundle of related items to users, improving both the user experience and the platform's profits. Existing bundle recommendation models have progressed from capturing only user-bundle interactions to modeling multiple relations among users, bundles, and items. CrossCBR, in particular, incorporates cross-view contrastive learning into a two-view preference learning framework, significantly improving SOTA performance. It has, however, two limitations: 1) the two-view formulation does not fully exploit all the heterogeneous relations among users, bundles, and items; and 2) the "early contrast and late fusion" framework is less effective at capturing user preference and is difficult to generalize to multiple views. In this paper, we present MultiCBR, a novel Multi-view Contrastive learning framework for Bundle Recommendation. First, we devise a multi-view representation learning framework capable of capturing all the user-bundle, user-item, and bundle-item relations, in particular making better use of the bundle-item affiliations to enhance the representations of sparse bundles. Second, we adopt an "early fusion and late contrast" design that first fuses the multi-view representations and then performs self-supervised contrastive learning. Compared with existing approaches, our framework reverses the order of fusion and contrast, which brings two advantages: 1) it models both cross-view and ego-view preferences, yielding enhanced user preference modeling; and 2) instead of requiring a quadratic number of cross-view contrastive losses, it requires only two self-supervised contrastive losses, resulting in minimal extra cost. Experimental results on three public datasets indicate that our method outperforms SOTA methods.
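The "early fusion and late contrast" ordering can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and the following are all assumptions made for illustration: the weighted-sum fusion, the Gaussian-noise augmentation, the InfoNCE form of the contrastive loss, the reading of the two losses as one on the user side and one on the bundle side, and every function name and hyperparameter. The random arrays stand in for view-specific embeddings that a real model would learn from the UB, UI, and BI graphs.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Row-wise L2 normalization so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def early_fusion(views, weights):
    # Assumed fusion rule: weighted sum of view-specific embeddings.
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * v for wi, v in zip(w, views))

def perturb(x, rng, scale=0.1):
    # Assumed augmentation: additive Gaussian noise on the fused embedding.
    return x + scale * rng.standard_normal(x.shape)

def info_nce(a, b, temperature=0.2):
    # InfoNCE: row i of `a` should match row i of `b` against all other rows.
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
n_users, n_bundles, dim = 8, 6, 16

# Placeholder embeddings for the three views (UB, UI, BI).
user_views = [rng.standard_normal((n_users, dim)) for _ in range(3)]
bundle_views = [rng.standard_normal((n_bundles, dim)) for _ in range(3)]

# Early fusion: merge the three views into one embedding per user / bundle.
user_fused = early_fusion(user_views, [1.0, 1.0, 1.0])
bundle_fused = early_fusion(bundle_views, [1.0, 1.0, 1.0])

# Late contrast: one loss on the user side, one on the bundle side.
loss_user = info_nce(perturb(user_fused, rng), perturb(user_fused, rng))
loss_bundle = info_nce(perturb(bundle_fused, rng), perturb(bundle_fused, rng))
total_contrastive_loss = loss_user + loss_bundle

# For comparison: contrasting every pair of views on both sides would need
# V * (V - 1) / 2 losses per side, which grows quadratically with V.
n_views = 3
pairwise_losses = 2 * (n_views * (n_views - 1) // 2)
```

Under this counting, three views would need six pairwise cross-view losses, while the fused design keeps the number at two however many views are added.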
