Sequences as Nodes for Contrastive Multimodal Graph Recommendation

Bucher Sahyouni, Matthew Vowels, Liqun Chen, Simon Hadfield

TL;DR

MuSICRec tackles cold-start and data sparsity by unifying collaborative, sequential, and multimodal signals within a multi-view graph framework. It introduces a sequence-item (SI) graph as an organic alternative view, replacing artificial data augmentation, and uses ID-guided gating to calibrate cross-modal information, with cross-view contrastive objectives to align the views. The model integrates a user-item (UI) graph, an SI graph, and a frozen multimodal item-item graph, so that signals can be propagated and fused across all three. Empirically, MuSICRec outperforms state-of-the-art baselines on the Amazon Baby, Sports, and Electronics datasets, with the largest gains for users with short histories, indicating improved robustness to sparsity and cold-start conditions.
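As a rough illustration of what a cross-view contrastive objective can look like, the sketch below computes an InfoNCE-style loss between two sets of embeddings for the same nodes, one per view. The choice of InfoNCE, the temperature value, the function name `info_nce`, and the random toy embeddings are all assumptions made for illustration; the summary above does not specify the exact loss MuSICRec uses.

```python
import numpy as np

def info_nce(view_a, view_b, temperature=0.2):
    """InfoNCE-style loss: row i of view_a should match row i of view_b.

    Hypothetical helper for illustration; not taken from the MuSICRec code.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Subtract the row-wise max for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; every other row acts as a negative.
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
ui_view = rng.normal(size=(16, 8))                       # toy embeddings from one view
si_aligned = ui_view + 0.01 * rng.normal(size=(16, 8))   # a second view that nearly agrees
si_random = rng.normal(size=(16, 8))                     # an unrelated second view

loss_aligned = info_nce(ui_view, si_aligned)
loss_random = info_nce(ui_view, si_random)
print(loss_aligned < loss_random)
```

The loss is lower when the two views agree on each node, which is the behaviour an alignment objective of this kind is meant to reward.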

Abstract

To tackle cold-start and data sparsity issues in recommender systems, numerous multimodal, sequential, and contrastive techniques have been proposed. While these augmentations can boost recommendation performance, they tend to add noise and disrupt useful semantics. To address this, we propose MuSICRec (Multimodal Sequence-Item Contrastive Recommender), a multi-view graph-based recommender that combines collaborative, sequential, and multimodal signals. We build a sequence-item (SI) view by attention pooling over the user's interacted items to form sequence nodes. We propagate over the SI graph, obtaining a second view organically as an alternative to artificial data augmentation, while simultaneously injecting sequential context signals. Additionally, to mitigate modality noise and align the multimodal information, the contribution of text and visual features is modulated according to an ID-guided gate. We evaluate under a strict leave-two-out split against a broad range of sequential, multimodal, and contrastive baselines. On the Amazon Baby, Sports, and Electronics datasets, MuSICRec outperforms state-of-the-art baselines across all model types. We observe the largest gains for short-history users, mitigating sparsity and cold-start challenges. Our code is available at https://anonymous.4open.science/r/MuSICRec-3CEE/ and will be made publicly available.
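Two of the mechanisms named in the abstract can be sketched in a few lines of NumPy: attention pooling a user's interacted items into a single sequence-node vector, and an ID-guided gate that scales modality features. This is a minimal sketch under stated assumptions, not the authors' implementation: the scaled dot-product scoring, the learnable `query` vector, the sigmoid gate with parameters `W` and `b`, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(item_emb, query):
    """Pool item embeddings (n_items, d) into one sequence-node vector (d,).

    Assumes scaled dot-product scoring against a query vector (hypothetical).
    """
    scores = item_emb @ query / np.sqrt(item_emb.shape[1])
    weights = softmax(scores)
    return weights @ item_emb, weights

def id_guided_gate(id_emb, modal_emb, W, b):
    """Scale modality features element-wise by a gate computed from ID embeddings.

    Assumes a sigmoid gate over a linear map of the ID embedding (hypothetical).
    """
    gate = 1.0 / (1.0 + np.exp(-(id_emb @ W + b)))
    return gate * modal_emb, gate

rng = np.random.default_rng(0)
d = 8
history = rng.normal(size=(5, d))      # embeddings of 5 items a user interacted with
query = rng.normal(size=d)             # stand-in for a learned attention query
seq_node, attn = attention_pool(history, query)

id_emb = rng.normal(size=(5, d))       # ID embeddings of the same items
text_feat = rng.normal(size=(5, d))    # stand-in for projected text features
W = 0.1 * rng.normal(size=(d, d))
b = np.zeros(d)
gated_text, gate = id_guided_gate(id_emb, text_feat, W, b)

print(seq_node.shape, gated_text.shape)
```

Because the gate lies strictly between 0 and 1, it can only attenuate a modality feature, never amplify it, which matches the stated goal of mitigating modality noise.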

Paper Structure (38 sections, 29 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 2: MuSICRec Architecture
  • Figure 3: Sports Ablation
  • Figure 4: $\lambda_u$ Sports Sensitivity
  • Figure 5: $\lambda_i$ Sports Sensitivity
  • Figure 6: Sports User History Depth Analysis
  • ...and 2 more figures