Image Fusion for Cross-Domain Sequential Recommendation
Wangyu Wu, Siqi Song, Xianglin Qiu, Xiaowei Huang, Fei Ma, Jimin Xiao
TL;DR
This work tackles Cross-Domain Sequential Recommendation (CDSR) by addressing domain bias and the underutilization of visual item representations. It introduces IFCDSR, which fuses image embeddings from a frozen CLIP model with learnable item ID embeddings and processes three cross-domain sequences through a multi-attention architecture to capture both intra-domain and cross-domain user preferences. The approach jointly learns single-domain and cross-domain interests, and predictions are made by combining ID-based and image-based similarities into a fused probability. Experiments on re-partitioned Amazon CDSR datasets show IFCDSR achieving state-of-the-art results, demonstrating the practical benefit of incorporating visual signals into cross-domain recommendation systems.
Abstract
Cross-Domain Sequential Recommendation (CDSR) aims to predict future user interactions based on historical interactions across multiple domains. The key challenge in CDSR is effectively capturing cross-domain user preferences by fully leveraging both intra-sequence and inter-sequence item interactions. In this paper, we propose a novel method, Image Fusion for Cross-Domain Sequential Recommendation (IFCDSR), which incorporates item image information to better capture visual preferences. Our approach uses a frozen CLIP model to generate image embeddings, enriching the original item embeddings with visual data from both intra-sequence and inter-sequence interactions. Additionally, we employ a multi-attention layer to capture cross-domain interests, enabling joint learning of single-domain and cross-domain user preferences. To validate the effectiveness of IFCDSR, we re-partitioned four e-commerce datasets and conducted extensive experiments. Results demonstrate that IFCDSR significantly outperforms existing methods.
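
As a rough illustration of how ID-based and image-based similarities might be combined into a single next-item probability, the following minimal sketch uses NumPy only. It is not the authors' implementation: the mean-pooled user representation stands in for the attention layers, the random `img_emb` matrix stands in for frozen CLIP image embeddings, and the convex combination with weight `alpha` is an assumed fusion rule. The names `fused_next_item_probs`, `id_emb`, `img_emb`, and `alpha` are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def fused_next_item_probs(seq, id_emb, img_emb, alpha=0.5):
    """Return a fused next-item distribution over all items.

    seq     : 1-D array of item indices the user interacted with
    id_emb  : (n_items, d_id) learnable ID embeddings
    img_emb : (n_items, d_img) frozen image embeddings (CLIP stand-in)
    alpha   : assumed fusion weight between the two similarity branches
    """
    # Assumption: mean pooling replaces the paper's attention encoder.
    u_id = id_emb[seq].mean(axis=0)
    u_img = img_emb[seq].mean(axis=0)

    # Dot-product similarity of the user vector to every candidate item,
    # normalized into a probability distribution per branch.
    p_id = softmax(id_emb @ u_id)
    p_img = softmax(img_emb @ u_img)

    # Assumed fusion rule: convex combination of the two distributions.
    return alpha * p_id + (1.0 - alpha) * p_img


rng = np.random.default_rng(0)
n_items, d_id, d_img = 6, 8, 16

id_emb = rng.normal(size=(n_items, d_id))
img_emb = rng.normal(size=(n_items, d_img))
# Unit-normalize the image vectors (an illustrative choice).
img_emb /= np.linalg.norm(img_emb, axis=1, keepdims=True)

probs = fused_next_item_probs(np.array([0, 2, 5]), id_emb, img_emb, alpha=0.7)
print(probs)
```

Because each branch yields a valid distribution and `alpha` lies in [0, 1], the fused vector is itself a distribution over items. Setting `alpha=1.0` recovers a purely ID-based prediction, which gives a simple way to compare against a variant without image information.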
