Modality Reliability Guided Multimodal Recommendation
Xue Dong, Xuemeng Song, Na Zheng, Sicheng Zhao, Guiguang Ding
TL;DR
This work tackles the degradation of multimodal recommendations caused by unreliable modality data. It introduces MARGO, a modality reliability guided framework that learns modality weights under supervision derived from modality-level differences in positive versus negative item predictions within the BPR objective, complemented by a dynamic weight calibration loss and a two-stage training procedure. The approach yields consistent improvements over state-of-the-art baselines on three Amazon datasets, with insights into modality contributions and weight behavior. The findings indicate that reliability-guided fusion can mitigate modality noise and enhance personalization in multimodal recommender systems.
Abstract
Multimodal recommendation suffers from performance degradation: uni-modal recommendation sometimes achieves better performance. A possible reason is that unreliable item modality data hurts the fusion result. Several existing studies have introduced weights for different modalities to reduce the contribution of unreliable modality data to the predicted final user rating. However, they fail to provide appropriate supervision for learning the modality weights, making the learned weights imprecise. Therefore, we propose a modality reliability guided multimodal recommendation framework that uniquely learns the modality weights under the supervision of modality reliability. Since no explicit label is provided for modality reliability, we identify it automatically through the Bayesian Personalized Ranking (BPR) recommendation objective. In particular, we define a modality reliability vector, which serves as the supervision label, from the difference between modality-specific user ratings of positive and negative items, where a larger difference indicates higher reliability of the modality because the BPR objective is better satisfied. Furthermore, to enhance the effectiveness of the supervision, we calculate a confidence level for the modality reliability vector, which dynamically adjusts the supervision strength and eliminates harmful supervision. Extensive experiments on three real-world datasets show the effectiveness of the proposed method.
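The reliability supervision described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract only states that reliability comes from the per-modality difference between positive-item and negative-item ratings, and that a confidence level scales the supervision. The softmax normalization, the sigmoid-based confidence, the cross-entropy calibration term, and all function names below are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def reliability_vector(pos_scores, neg_scores):
    """Per-modality reliability from (positive - negative) score differences.

    A larger difference means the modality satisfies the BPR ordering better.
    Normalizing with softmax is an assumption of this sketch.
    """
    diffs = [p - n for p, n in zip(pos_scores, neg_scores)]
    return softmax(diffs), diffs


def confidence(diffs):
    """Hypothetical confidence level for the reliability vector.

    Returns 0 when no modality ranks the positive item above the negative one
    (the supervision would be harmful), else the sigmoid of the best difference.
    """
    best = max(diffs)
    if best <= 0:
        return 0.0
    return 1.0 / (1.0 + math.exp(-best))


def calibration_loss(weights, pos_scores, neg_scores):
    """Confidence-scaled cross-entropy between reliability and learned weights.

    `weights` are the learned modality weights (positive, summing to 1).
    """
    target, diffs = reliability_vector(pos_scores, neg_scores)
    ce = -sum(t * math.log(w) for t, w in zip(target, weights))
    return confidence(diffs) * ce


# Two modalities (e.g. visual, textual): the first ranks the positive item
# higher, the second ranks it lower, so the first is treated as more reliable.
target, diffs = reliability_vector([2.0, 0.5], [0.5, 1.0])
loss = calibration_loss([0.5, 0.5], [2.0, 0.5], [0.5, 1.0])
```

In this sketch, a training triple in which every modality violates the BPR ordering receives zero confidence, so it contributes nothing to the weight supervision. This is one simple way to realize the "eliminates harmful supervision" behavior; the paper's actual formulation may differ.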
