LD4MRec: Simplifying and Powering Diffusion Model for Multimedia Recommendation
Jiarui Zhu, Jun Hou, Penghang Yu, Zhiyi Tan, Bing-Kun Bao
TL;DR
This work addresses noise in observed user behaviors for multimedia recommendation by proposing LD4MRec, a Light Diffusion model whose forward-free inference is efficient enough for real-time use. A Conditional neural Network (C-Net) guides generation using two signals, collaborative signals and personalized modality preference signals, with semi-supervised soft reconstruction to distill stable user preferences. Experiments on three real-world datasets show superior predictive performance and significantly lower inference time than prior diffusion-based approaches. The approach is therefore more robust to noisy data and more practical to deploy in real-time recommender systems.
Abstract
Multimedia recommendation aims to predict users' future behaviors based on observed behaviors and item content information. However, the inherent noise in observed behaviors easily leads to suboptimal recommendation performance. Recently, the diffusion model's ability to generate information from noise has emerged as a promising solution to this issue, prompting us to explore its application in multimedia recommendation. Nonetheless, two challenges must be addressed: (1) the diffusion model must be simplified to meet the efficiency requirements of real-time recommender systems, and (2) the generated behaviors must align with user preferences. To address these challenges, we propose a Light Diffusion model for Multimedia Recommendation (LD4MRec). LD4MRec substantially reduces computational complexity by employing a forward-free inference strategy, which directly predicts future behaviors from observed noisy behaviors. Meanwhile, to ensure alignment between generated behaviors and user preferences, we propose a novel Conditional neural Network (C-Net). C-Net achieves guided generation by leveraging two key signals, collaborative signals and personalized modality preference signals, thereby improving the semantic consistency between generated behaviors and user preferences. Experiments conducted on three real-world datasets demonstrate the effectiveness of LD4MRec.
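The forward-free inference idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_denoiser`, its `weight` parameter, and the example vectors are hypothetical stand-ins, and the condition vector `cond` only plays the role that the C-Net signals would play in a trained model. The point is structural: at inference time no noise is added to the observed behaviors; they are passed directly to the conditional denoiser, and the output scores are ranked.

```python
import numpy as np

def toy_denoiser(x_t, cond, weight):
    """Hypothetical stand-in for a trained conditional denoiser.

    Blends the noisy behavior vector with a condition vector; a real
    model would be a learned network, not a fixed linear mix.
    """
    return weight * x_t + (1.0 - weight) * cond

def forward_free_inference(x_observed, cond, weight=0.5):
    # Forward-free: no noise is injected at inference time. The observed
    # behaviors are treated as the already-noisy input to the denoiser.
    return toy_denoiser(x_observed, cond, weight)

def recommend_top_k(scores, x_observed, k):
    # Mask items the user has already interacted with, then return the
    # indices of the k highest remaining scores.
    masked = np.where(x_observed > 0, -np.inf, scores)
    return np.argsort(-masked)[:k]

# Assumed toy data: one user, five items (1.0 = observed interaction).
x_observed = np.array([1.0, 0.0, 0.0, 1.0, 0.0])
# Assumed condition vector standing in for collaborative and modality signals.
cond = np.array([0.9, 0.2, 0.8, 0.7, 0.1])

scores = forward_free_inference(x_observed, cond)
top2 = recommend_top_k(scores, x_observed, k=2)
print(top2.tolist())
```

In a conventional diffusion pipeline, inference would first corrupt the input over several forward steps and then run the reverse chain; skipping the forward pass is what the abstract credits for the reduced computational cost.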
