TruthSR: Trustworthy Sequential Recommender Systems via User-generated Multimodal Content
Meng Yan, Haibin Huang, Ying Liu, Juan Zhao, Xiyue Gao, Cai Xu, Ziyu Guan, Wei Zhao
TL;DR
TruthSR tackles noise in user-generated multimodal content for sequential recommendation by explicitly modeling cross-modal consistency and complementarity while quantifying prediction uncertainty. It introduces a four-component architecture (embedding, modality perception, multi-view sequence, and trustworthy decision modules), together with a Dirichlet/evidence-based fusion of two views (ID and multi-modal) via $\alpha_{v,i}=e_{v,i}+1$ and $\mathcal{M}=\mathcal{M}_1\oplus\mathcal{M}_2$, plus a text-image alignment loss $\mathcal{L}_{tv}$. Key contributions include explicit mining of user multi-modal sequential preferences, a cross-modal alignment mechanism, and an adaptive trust framework that estimates both belief and uncertainty through a bound-preserving loss. Empirical results on four real-world datasets show that TruthSR outperforms state-of-the-art baselines and remains robust to noisy UGC, highlighting its practical impact for reliable and personalized recommendations.
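The evidence-based fusion mentioned above can be illustrated with a small sketch. It assumes the standard subjective-logic mapping from evidence to opinions (belief $b_{v,i}=e_{v,i}/S_v$ and uncertainty $u_v=K/S_v$, with $S_v=\sum_i \alpha_{v,i}$ over $K$ candidate items) and assumes that $\oplus$ is the reduced Dempster combination rule commonly used in evidential multi-view learning. The summary does not spell out either detail, so the function names and the exact rule here are illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Tuple

# An opinion is (belief mass per candidate item, overall uncertainty mass).
Opinion = Tuple[List[float], float]


def opinion_from_evidence(evidence: List[float]) -> Opinion:
    """Map non-negative evidence e_i of one view to a subjective-logic opinion."""
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]  # alpha_{v,i} = e_{v,i} + 1
    strength = sum(alpha)                # Dirichlet strength S_v
    belief = [e / strength for e in evidence]
    uncertainty = k / strength           # shrinks as total evidence grows
    return belief, uncertainty


def combine(m1: Opinion, m2: Opinion) -> Opinion:
    """Fuse two opinions with the reduced Dempster rule (assumed form of M1 (+) M2)."""
    b1, u1 = m1
    b2, u2 = m2
    # Conflict: belief mass the two views place on different items.
    agreement = sum(x * y for x, y in zip(b1, b2))
    conflict = sum(b1) * sum(b2) - agreement
    scale = 1.0 / (1.0 - conflict)
    belief = [scale * (x * y + x * u2 + y * u1) for x, y in zip(b1, b2)]
    uncertainty = scale * u1 * u2
    return belief, uncertainty


# Hypothetical evidence over three candidate items from the two views.
id_view = opinion_from_evidence([4.0, 1.0, 0.0])
mm_view = opinion_from_evidence([2.0, 2.0, 0.0])
fused_belief, fused_uncertainty = combine(id_view, mm_view)
```

Under this rule the fused belief and uncertainty still sum to one, and the fused uncertainty is never larger than that of either input view, which is the behavior one would want when a confident view is combined with a noisy one.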
Abstract
Sequential recommender systems explore users' preferences and behavioral patterns from their historically generated data. Recently, researchers have sought to improve sequential recommendation by utilizing massive user-generated multi-modal content, such as reviews and images. Such content inevitably contains noise. Some studies attempt to reduce noise interference by suppressing cross-modal inconsistent information. However, doing so may limit the model's ability to capture personalized user preferences. In addition, it is almost impossible to entirely eliminate noise in diverse user-generated multi-modal content. To solve these problems, we propose a trustworthy sequential recommendation method via noisy user-generated multi-modal content. Specifically, we explicitly capture the consistency and complementarity of user-generated multi-modal content to mitigate noise interference. We also model users' multi-modal sequential preferences. In addition, we design a trustworthy decision mechanism that integrates the subjective user perspective and the objective item perspective to dynamically evaluate the uncertainty of prediction results. Experimental evaluation on four widely used datasets demonstrates the superior performance of our model compared to state-of-the-art methods. The code is released at https://github.com/FairyMeng/TrustSR.
