TruthSR: Trustworthy Sequential Recommender Systems via User-generated Multimodal Content

Meng Yan, Haibin Huang, Ying Liu, Juan Zhao, Xiyue Gao, Cai Xu, Ziyu Guan, Wei Zhao

TL;DR

TruthSR tackles noise in user-generated multimodal content for sequential recommendation by explicitly modeling cross-modal consistency and complementarity while quantifying prediction uncertainty. It introduces a four-component architecture (embedding, modality perception, multi-view sequence, and trustworthy decision modules) together with a Dirichlet/evidence-based fusion of two views (ID and multi-modal) via $\alpha_{v,i}=e_{v,i}+1$ and $\mathcal{M}=\mathcal{M}_1\oplus\mathcal{M}_2$, plus a text-image alignment loss $\mathcal{L}_{tv}$. Key contributions include explicit mining of user multi-modal sequential preferences, a cross-modal alignment mechanism, and an adaptive trust framework that estimates both belief and uncertainty through a bound-preserving loss. Empirical results on four real-world datasets show that TruthSR outperforms state-of-the-art baselines and remains robust to noisy UGC, highlighting its practical impact for reliable and personalized recommendations.
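The evidence-based fusion mentioned above can be illustrated with a small sketch. It assumes the standard subjective-logic mapping, where evidence $e_{v,i}$ gives Dirichlet parameters $\alpha_{v,i}=e_{v,i}+1$, belief masses $b_{v,i}=e_{v,i}/S_v$, and uncertainty $u_v=K/S_v$ with $S_v=\sum_i \alpha_{v,i}$, and it assumes that $\oplus$ is the reduced Dempster combination rule commonly used in evidential multi-view learning. The summary does not spell out either detail, so the function names and the exact combination rule here are illustrative assumptions, not the authors' implementation.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-item evidence to (belief masses, uncertainty).

    Assumes the subjective-logic convention: alpha_i = e_i + 1,
    b_i = e_i / S, u = K / S, where S = sum(alpha) and K = number of items.
    """
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]          # alpha_{v,i} = e_{v,i} + 1
    strength = sum(alpha)                        # Dirichlet strength S_v
    belief = [e / strength for e in evidence]    # belief mass per item
    uncertainty = k / strength                   # leftover mass
    return belief, uncertainty


def combine(opinion_1, opinion_2):
    """Fuse two opinions with the reduced Dempster rule (assumed form of ⊕)."""
    b1, u1 = opinion_1
    b2, u2 = opinion_2
    agree = sum(x * y for x, y in zip(b1, b2))
    # Conflict: belief mass the two views assign to *different* items.
    conflict = sum(b1) * sum(b2) - agree
    scale = 1.0 / (1.0 - conflict)
    belief = [scale * (x * y + x * u2 + y * u1) for x, y in zip(b1, b2)]
    uncertainty = scale * u1 * u2
    return belief, uncertainty


# Hypothetical evidence over three candidate items from the two views.
id_view = opinion_from_evidence([8.0, 1.0, 1.0])   # ID view
mm_view = opinion_from_evidence([6.0, 0.0, 2.0])   # multi-modal view
fused_belief, fused_uncertainty = combine(id_view, mm_view)
print(fused_belief, fused_uncertainty)
```

Under these assumptions, belief and uncertainty always sum to one, and a view with no evidence contributes an uncertainty of one, so it leaves the other view's opinion unchanged after fusion.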

Abstract

Sequential recommender systems explore users' preferences and behavioral patterns from their historically generated data. Recently, researchers have sought to improve sequential recommendation by utilizing massive user-generated multi-modal content, such as reviews and images. This content inevitably contains noise. Some studies attempt to reduce noise interference by suppressing cross-modal inconsistent information; however, doing so may constrain the capture of personalized user preferences. In addition, it is almost impossible to entirely eliminate noise in diverse user-generated multi-modal content. To solve these problems, we propose a trustworthy sequential recommendation method via noisy user-generated multi-modal content. Specifically, we explicitly capture the consistency and complementarity of user-generated multi-modal content to mitigate noise interference. We also model the user's multi-modal sequential preferences. In addition, we design a trustworthy decision mechanism that integrates the subjective user perspective and the objective item perspective to dynamically evaluate the uncertainty of prediction results. Experimental evaluation on four widely used datasets demonstrates the superior performance of our model compared to state-of-the-art methods. The code is released at https://github.com/FairyMeng/TrustSR.

Paper Structure

This paper contains 23 sections, 17 equations, 3 figures, and 4 tables.

Figures (3)

  • Figure 1: Illustration of interference information in UGC. The user review of item 2 is missing; the user review of item 3 is noisy and irrelevant. The interference information affects the selection of the next item.
  • Figure 2: Illustration of TruthSR. (a) shows that the model consists of an embedding module, a modality perception module, a multi-view sequence module, and a trustworthy decision module. (b) shows noise reduction by capturing the consistency of the user's multi-modal preferences. (c) shows the generation of trusted recommendation results and their corresponding confidence levels.
  • Figure 3: Impact of hyper-parameters.